bblean.similarity
Optimized molecular similarity calculators
Functions
| jt_isim_from_sum | iSIM Tanimoto, from sum of rows of a fingerprint array and number of rows |
| jt_isim | Average Tanimoto, using iSIM |
| jt_sim_packed | Tanimoto similarity between packed fingerprints |
| jt_most_dissimilar_packed | Finds two fps in a packed fp array that are the most Tanimoto-dissimilar |
| jt_isim_radius_from_sum | Calculate the Tanimoto radius of a set of fingerprints |
| jt_isim_radius_compl_from_sum | Calculate the complement of the Tanimoto radius of a set of fingerprints |
| jt_isim_diameter_from_sum | Calculate the Tanimoto diameter of a set of fingerprints |
| jt_isim_radius | Calculate the Tanimoto radius of a set of fingerprints |
| jt_isim_radius_compl | Calculate the complement of the Tanimoto radius of a set of fingerprints |
| jt_isim_diameter | Calculate the Tanimoto diameter of a set of fingerprints |
| centroid_from_sum | Calculates the majority vote centroid from a sum of fingerprint values |
| centroid | Calculates the majority vote centroid from a set of fingerprints |
| jt_isim_medoid | Calculate the (Tanimoto) medoid of a set of fingerprints, using iSIM |
| jt_compl_isim | Get all complementary (Tanimoto) similarities of a set of fps, using iSIM |
| jt_stratified_sampling | Sample from a set of fingerprints according to their complementary similarity |
| | Tanimoto similarity matrix between all pairs of packed fps in arr |
- bblean.similarity.jt_isim_from_sum(linear_sum, n_objects)
iSIM Tanimoto, from sum of rows of a fingerprint array and number of rows
iSIM Tanimoto was first proposed in: https://pubs.rsc.org/en/content/articlelanding/2024/dd/d4dd00041b
\(iSIM_{JT}(X)\) is an excellent \(O(N)\) approximation of the average Tanimoto similarity of a set of fingerprints.
It is also equivalent to the complement of the Tanimoto diameter: \(iSIM_{JT}(X) = 1 - D_{JT}(X)\).
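For orientation, the quantity can be written out from the column sums, following the formulation in the paper linked above. The symbols are notation introduced here, not names from the library: \(k_j\) is the \(j\)-th entry of linear_sum, \(N\) is n_objects, and \(F\) is the number of features.

\[
iSIM_{JT}(X) = \frac{\sum_{j=1}^{F} \binom{k_j}{2}}{\sum_{j=1}^{F} \left[ \binom{k_j}{2} + k_j \, (N - k_j) \right]}
\]

The numerator counts, column by column, the pairs of rows that share an on-bit; the extra term in the denominator counts the pairs that disagree on that bit. Both only need the column sums, which is why a single pass over the data suffices.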
- bblean.similarity.jt_isim(fps, input_is_packed=True, n_features=None)
Average Tanimoto, using iSIM
iSIM Tanimoto was first proposed in: https://pubs.rsc.org/en/content/articlelanding/2024/dd/d4dd00041b
\(iSIM_{JT}(X)\) is an excellent \(O(N)\) approximation of the average Tanimoto similarity of a set of fingerprints.
It is also equivalent to the complement of the Tanimoto diameter: \(iSIM_{JT}(X) = 1 - D_{JT}(X)\).
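To make the approximation concrete, the following NumPy-only sketch compares an iSIM-style value computed from column sums with the exact mean over all pairs. It does not call bblean: both helper names are hypothetical, the input is an unpacked 0/1 array, and the formula is the one from the paper linked above as understood here.

```python
import itertools

import numpy as np


def isim_tanimoto_from_sum(linear_sum, n_objects):
    """Hypothetical helper: iSIM Tanimoto from column sums k_j and row count N."""
    k = np.asarray(linear_sum, dtype=np.float64)
    # Pairs of rows sharing an on-bit, summed over columns: sum_j k_j (k_j - 1) / 2
    shared = np.sum(k * (k - 1)) / 2
    # Pairs of rows disagreeing on a bit, summed over columns: sum_j k_j (N - k_j)
    mismatched = np.sum(k * (n_objects - k))
    # Not guarded against an all-zero input (0 / 0) in this sketch.
    return shared / (shared + mismatched)


def mean_pairwise_tanimoto(fps):
    """Exact O(N^2) reference: mean Tanimoto over all unordered pairs of rows."""
    sims = []
    for x, y in itertools.combinations(fps, 2):
        sims.append(np.sum(x & y) / np.sum(x | y))
    return float(np.mean(sims))


# Three small unpacked fingerprints (rows) with four features (columns).
fps = np.array(
    [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 1]],
    dtype=np.uint8,
)
isim = isim_tanimoto_from_sum(fps.sum(axis=0), len(fps))
exact = mean_pairwise_tanimoto(fps)
print(f"iSIM: {isim:.4f}  exact mean: {exact:.4f}")
```

On a toy set this small the two values differ visibly; the sketch only shows how each is computed, not how close they are on realistic data.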
- bblean.similarity.jt_sim_packed(x, y)
Tanimoto similarity between packed fingerprints
Either both inputs are vectors of shape (F,) (a Numpy scalar is returned), or one is a vector of shape (F,) and the other an array of shape (N, F) (a Numpy array of shape (N,) is returned).
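The two shape cases can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: tanimoto_packed is a hypothetical name, and it assumes fingerprints are packed eight bits per uint8 in the layout produced by np.packbits.

```python
import numpy as np


def tanimoto_packed(x, y):
    """Hypothetical helper: Tanimoto similarity between uint8-packed fingerprints."""
    x = np.asarray(x, dtype=np.uint8)
    y = np.asarray(y, dtype=np.uint8)
    # Bitwise AND/OR broadcast (F,) against (N, F); unpackbits exposes the
    # individual bits so they can be counted along the last axis.
    intersection = np.unpackbits(x & y, axis=-1).sum(axis=-1)
    union = np.unpackbits(x | y, axis=-1).sum(axis=-1)
    # Assumes at least one on-bit in every union (no zero-division guard).
    return intersection / union


bits_a = np.array([1, 1, 0, 0, 1, 0, 0, 0, 1, 0], dtype=np.uint8)
bits_b = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 1], dtype=np.uint8)
a = np.packbits(bits_a)                                # shape (2,)
batch = np.packbits(np.stack([bits_a, bits_b]), axis=-1)  # shape (2, 2)

single = tanimoto_packed(a, batch[1])  # vector vs vector: NumPy scalar
many = tanimoto_packed(a, batch)       # vector vs array: shape (N,)
```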
- bblean.similarity.jt_most_dissimilar_packed(Y, n_features=None)
Finds two fps in a packed fp array that are the most Tanimoto-dissimilar
This is not guaranteed to find the most dissimilar pair of fps; it is a robust O(N) approximation that does not affect final cluster quality. First, find the centroid of Y. Then find fp_1, the most dissimilar molecule to the centroid. Finally, find fp_2, the most dissimilar molecule to fp_1.
- Returns:
fp_1 (int) – index of the first fingerprint
fp_2 (int) – index of the second fingerprint
sims_fp_1 (np.ndarray) – Tanimoto similarities of Y to fp_1
sims_fp_2 (np.ndarray) – Tanimoto similarities of Y to fp_2
- Return type:
tuple[integer, integer, ndarray[tuple[Any, …], dtype[float64]], ndarray[tuple[Any, …], dtype[float64]]]
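The three steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: it works on an unpacked 0/1 array, the function names are hypothetical, the centroid uses a majority vote with ties resolved to 1, and an empty union is treated as similarity 0.

```python
import numpy as np


def tanimoto_to_all(fps, query):
    """Tanimoto similarity of every row of an unpacked 0/1 array to one query row."""
    intersection = np.sum(fps & query, axis=1)
    union = np.sum(fps | query, axis=1)
    # Assumption of this sketch: an empty union counts as similarity 0.
    return intersection / np.maximum(union, 1)


def most_dissimilar_pair(fps):
    """Hypothetical O(N) heuristic: centroid -> fp_1 -> fp_2."""
    n = len(fps)
    # Step 1: majority-vote centroid (ties resolved to 1, an assumption).
    centroid = (fps.sum(axis=0) >= 0.5 * n).astype(fps.dtype)
    # Step 2: fp_1 is the row least similar to the centroid.
    fp_1 = int(np.argmin(tanimoto_to_all(fps, centroid)))
    # Step 3: fp_2 is the row least similar to fp_1.
    sims_fp_1 = tanimoto_to_all(fps, fps[fp_1])
    fp_2 = int(np.argmin(sims_fp_1))
    sims_fp_2 = tanimoto_to_all(fps, fps[fp_2])
    return fp_1, fp_2, sims_fp_1, sims_fp_2


Y = np.array(
    [[1, 1, 1, 0, 0, 0],
     [1, 1, 1, 1, 0, 0],
     [1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1]],
    dtype=np.uint8,
)
fp_1, fp_2, sims_fp_1, sims_fp_2 = most_dissimilar_pair(Y)
```

Each step is a single pass over the rows, which is where the O(N) cost comes from; an exhaustive search over all pairs would be O(N^2).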
- bblean.similarity.jt_isim_radius_from_sum(ls, n)
Calculate the Tanimoto radius of a set of fingerprints
- bblean.similarity.jt_isim_radius_compl_from_sum(ls, n)
Calculate the complement of the Tanimoto radius of a set of fingerprints
- bblean.similarity.jt_isim_diameter_from_sum(ls, n)
Calculate the Tanimoto diameter of a set of fingerprints.
Equivalent to
1 - jt_isim_from_sum(ls, n)
- bblean.similarity.jt_isim_radius(arr, input_is_packed=True, n_features=None)
Calculate the Tanimoto radius of a set of fingerprints
- bblean.similarity.jt_isim_radius_compl(arr, input_is_packed=True, n_features=None)
Calculate the complement of the Tanimoto radius of a set of fingerprints
- bblean.similarity.jt_isim_diameter(arr, input_is_packed=True, n_features=None)
Calculate the Tanimoto diameter of a set of fingerprints
- bblean.similarity.centroid_from_sum(linear_sum, n_samples, *, pack=True)
Calculates the majority vote centroid from a sum of fingerprint values
The majority vote centroid is a good approximation of the Tanimoto centroid.
- bblean.similarity.centroid(fps, input_is_packed=True, n_features=None, *, pack=True)
Calculates the majority vote centroid from a set of fingerprints
The majority vote centroid is a good approximation of the Tanimoto centroid.
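A majority vote over bit columns can be sketched in a few lines of NumPy. The helper name is hypothetical, and two details are assumptions of this sketch rather than documented behavior: a tie (exactly half of the rows) sets the bit, and packing uses np.packbits.

```python
import numpy as np


def majority_vote_centroid(linear_sum, n_samples, pack=True):
    """Hypothetical helper: bit j is on when at least half of the rows have it on."""
    # Tie handling (>= rather than >) is an assumption of this sketch.
    bits = (np.asarray(linear_sum) >= 0.5 * n_samples).astype(np.uint8)
    return np.packbits(bits, axis=-1) if pack else bits


fps = np.array(
    [[1, 1, 0],
     [1, 0, 0],
     [1, 1, 1]],
    dtype=np.uint8,
)
linear_sum = fps.sum(axis=0)  # column sums: how many rows have each bit on
unpacked = majority_vote_centroid(linear_sum, len(fps), pack=False)
packed = majority_vote_centroid(linear_sum, len(fps))
```

Because only the column sums and the row count are needed, the same helper serves both the "from sum" and the "from fingerprints" variants.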
- bblean.similarity.jt_isim_medoid(fps, input_is_packed=True, n_features=None, pack=True)
Calculate the (Tanimoto) medoid of a set of fingerprints, using iSIM
Returns both the index of the medoid in the input array and the medoid itself.
Note
For arrays of size 2 and 1, returns the first and the only fingerprint, respectively. Raises ValueError for arrays of size 0.
- bblean.similarity.jt_compl_isim(fps, input_is_packed=True, n_features=None)
Get all complementary (Tanimoto) similarities of a set of fps, using iSIM
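As a rough illustration, the complementary similarity of a fingerprint can be read as the iSIM value of the set with that fingerprint removed. The sketch below follows that reading using plain NumPy on an unpacked 0/1 array; the function name is hypothetical and the formula is the iSIM Tanimoto expression from the paper linked earlier, so treat the details as assumptions.

```python
import numpy as np


def complementary_isim(fps):
    """Hypothetical helper: iSIM Tanimoto of the set with each row left out in turn."""
    fps = np.asarray(fps, dtype=np.int64)
    n = len(fps)
    # Row i of k holds the column sums of the set without fingerprint i.
    k = fps.sum(axis=0) - fps
    # Same iSIM terms as for the full set, with N replaced by n - 1.
    shared = (k * (k - 1)).sum(axis=1) / 2
    mismatched = (k * (n - 1 - k)).sum(axis=1)
    # Needs at least 3 rows; smaller inputs would divide 0 by 0 here.
    return shared / (shared + mismatched)


fps = np.array(
    [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 1]],
    dtype=np.uint8,
)
comp_sims = complementary_isim(fps)
most_central = int(np.argmin(comp_sims))
```

In the iSIM paper's terminology, the fingerprint with the lowest complementary similarity is the most central one, since removing it leaves the least self-similar set. Whether jt_isim_medoid selects exactly this index is an assumption here.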
- bblean.similarity.jt_stratified_sampling(fps, n_samples, input_is_packed=True, n_features=None)
Sample from a set of fingerprints according to their complementary similarity
Given a group of fingerprints, calculate all complementary similarities, order the fingerprints by them, and sample the first element from each consecutive group of length
num_fps // n_samples + 1.
Note
This is not true statistical stratified sampling: it is not random, and the strata are not homogeneous. It is meant as a reliable, deterministic method to obtain a representative sample from a set of fingerprints.
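The grouping rule described above can be sketched with an argsort and a strided slice. The helper below is hypothetical and takes precomputed complementary similarities as input; ascending order and a stable sort are assumptions of this sketch.

```python
import numpy as np


def stratified_indices(comp_sims, n_samples):
    """Hypothetical helper: first index of each consecutive group after sorting."""
    comp_sims = np.asarray(comp_sims)
    # Group length as stated in the description: num_fps // n_samples + 1.
    step = len(comp_sims) // n_samples + 1
    # Stable sort keeps the result deterministic when values tie.
    order = np.argsort(comp_sims, kind="stable")
    # Taking every step-th entry picks the first element of each group.
    return order[::step]


picked = stratified_indices([0.5, 0.1, 0.9, 0.3, 0.7], n_samples=2)
```

With this group length the slice can return fewer than n_samples indices for some input sizes (for example 10 fingerprints and n_samples=5 give a step of 3 and four picks); how the library handles that case is not covered by this sketch.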