bblean.similarity
Optimized molecular similarity calculators
Functions
jt_isim_from_sum | iSIM Tanimoto, from sum of rows of a fingerprint array and number of rows
jt_isim | Average Tanimoto, using iSIM
jt_sim_packed | Tanimoto similarity between a matrix of packed fps and a single packed fp
jt_most_dissimilar_packed | Finds two fps in a packed fp array that are the most Tanimoto-dissimilar
jt_isim_radius_from_sum | Calculate the Tanimoto radius of a set of fingerprints
jt_isim_radius_compl_from_sum | Calculate the complement of the Tanimoto radius of a set of fingerprints
jt_isim_diameter_from_sum | Calculate the Tanimoto diameter of a set of fingerprints.
jt_isim_radius | Calculate the Tanimoto radius of a set of fingerprints
jt_isim_radius_compl | Calculate the complement of the Tanimoto radius of a set of fingerprints
jt_isim_diameter | Calculate the Tanimoto diameter of a set of fingerprints
centroid_from_sum | Calculates the majority vote centroid from a sum of fingerprint values
centroid | Calculates the majority vote centroid from a set of fingerprints
jt_isim_medoid | Calculate the (Tanimoto) medoid of a set of fingerprints, using iSIM
Get all complementary (Tanimoto) similarities of a set of fps, using iSIM
Tanimoto similarity matrix between all pairs of packed fps in arr
- bblean.similarity.jt_isim_from_sum(linear_sum, n_objects)
iSIM Tanimoto, from sum of rows of a fingerprint array and number of rows
iSIM Tanimoto was first proposed in: https://pubs.rsc.org/en/content/articlelanding/2024/dd/d4dd00041b
\(iSIM_{JT}(X)\) is an excellent \(O(N)\) approximation of the average Tanimoto similarity of a set of fingerprints.
It is also equivalent to the complement of the Tanimoto diameter: \(iSIM_{JT}(X) = 1 - D_{JT}(X)\).
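To make the "sum of rows" input concrete, here is a dependency-free sketch of the iSIM Tanimoto formula as it is commonly stated for binary fingerprints (pairs sharing an on-bit divided by pairs sharing or differing on an on-bit, accumulated per column). The helper name `isim_tanimoto_from_sum` is hypothetical. This illustrates the idea only and is not bblean's implementation, which may differ in input types and edge-case handling.

```python
def isim_tanimoto_from_sum(linear_sum, n_objects):
    """iSIM Tanimoto from per-bit column sums (illustrative sketch)."""
    # Pairs of fingerprints that both have bit q on: k_q * (k_q - 1) / 2
    common_on = sum(k * (k - 1) / 2 for k in linear_sum)
    # Pairs in which exactly one fingerprint has bit q on: k_q * (N - k_q)
    mismatches = sum(k * (n_objects - k) for k in linear_sum)
    # Note: undefined (division by zero) when no bit is on anywhere
    return common_on / (common_on + mismatches)


# Three unpacked 4-bit fingerprints, one per row
fps = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
]
linear_sum = [sum(col) for col in zip(*fps)]  # column sums: [3, 2, 2, 0]
isim = isim_tanimoto_from_sum(linear_sum, len(fps))  # 5 / 9
diameter = 1 - isim  # complement, as stated above
```

Only the column sums and the row count are needed, which is why the cost is linear in the number of fingerprints. For exactly two fingerprints the expression reduces to their ordinary Tanimoto similarity.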
- bblean.similarity.jt_isim(fps, input_is_packed=True, n_features=None)
Average Tanimoto, using iSIM
iSIM Tanimoto was first proposed in: https://pubs.rsc.org/en/content/articlelanding/2024/dd/d4dd00041b
\(iSIM_{JT}(X)\) is an excellent \(O(N)\) approximation of the average Tanimoto similarity of a set of fingerprints.
It is also equivalent to the complement of the Tanimoto diameter: \(iSIM_{JT}(X) = 1 - D_{JT}(X)\).
- bblean.similarity.jt_sim_packed(arr, vec)
Tanimoto similarity between a matrix of packed fps and a single packed fp
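As an illustration of what a packed Tanimoto computation involves, the sketch below works on fingerprints packed 8 bits per byte and counts set bits of the bitwise AND and OR. It assumes that byte layout, uses plain `bytes` objects rather than arrays, and returns 0.0 when both fingerprints are empty. The names `popcount` and `tanimoto_packed` are hypothetical, and none of these choices are confirmed to match `jt_sim_packed`.

```python
def popcount(data):
    """Number of set bits in a bytes object."""
    return sum(bin(byte).count("1") for byte in data)


def tanimoto_packed(rows, vec):
    """Tanimoto similarity of each packed row against one packed fingerprint."""
    sims = []
    for row in rows:
        inter = popcount(bytes(a & b for a, b in zip(row, vec)))
        union = popcount(bytes(a | b for a, b in zip(row, vec)))
        # Assumption: two all-zero fingerprints get similarity 0.0
        sims.append(inter / union if union else 0.0)
    return sims


arr = [bytes([0b11110000]), bytes([0b11000000]), bytes([0b00001111])]
vec = bytes([0b11110000])
sims = tanimoto_packed(arr, vec)
```

Here the first row is identical to `vec`, the second shares two of four on-bits, and the third shares none.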
- bblean.similarity.jt_most_dissimilar_packed(Y, n_features=None)
Finds two fps in a packed fp array that are the most Tanimoto-dissimilar
This is not guaranteed to find the most dissimilar pair of fps; it is a robust \(O(N)\) approximation that does not affect final cluster quality. It first finds the centroid of Y, then finds fp_1, the molecule most dissimilar to the centroid, and finally finds fp_2, the molecule most dissimilar to fp_1.
- Returns:
fp_1 (int) – index of the first fingerprint
fp_2 (int) – index of the second fingerprint
sims_fp_1 (np.ndarray) – Tanimoto similarities of Y to fp_1
sims_fp_2 (np.ndarray) – Tanimoto similarities of Y to fp_2
- Return type:
tuple[integer, integer, ndarray[tuple[Any, …], dtype[float64]], ndarray[tuple[Any, …], dtype[float64]]]
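The three steps described for this function (centroid, farthest from the centroid, farthest from that one) can be sketched as follows. The sketch uses unpacked bit lists and a majority-vote centroid with ties set to 1. Both are assumptions made for readability, and the helper names are hypothetical rather than part of bblean.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two unpacked binary fingerprints."""
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return inter / union if union else 0.0


def most_dissimilar(fps):
    """Approximate most-dissimilar pair in three linear passes."""
    n = len(fps)
    # Step 1: majority-vote centroid (assumption: ties count as 1)
    centroid = [1 if 2 * sum(col) >= n else 0 for col in zip(*fps)]
    # Step 2: fp_1 is the fingerprint least similar to the centroid
    sims_centroid = [tanimoto(fp, centroid) for fp in fps]
    fp_1 = min(range(n), key=sims_centroid.__getitem__)
    # Step 3: fp_2 is the fingerprint least similar to fp_1
    sims_fp_1 = [tanimoto(fp, fps[fp_1]) for fp in fps]
    fp_2 = min(range(n), key=sims_fp_1.__getitem__)
    sims_fp_2 = [tanimoto(fp, fps[fp_2]) for fp in fps]
    return fp_1, fp_2, sims_fp_1, sims_fp_2


fps = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
fp_1, fp_2, sims_fp_1, sims_fp_2 = most_dissimilar(fps)
```

Each step is a single pass over the fingerprints, so the total cost stays linear, at the price of possibly missing the true most dissimilar pair.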
- bblean.similarity.jt_isim_radius_from_sum(ls, n)
Calculate the Tanimoto radius of a set of fingerprints
- bblean.similarity.jt_isim_radius_compl_from_sum(ls, n)
Calculate the complement of the Tanimoto radius of a set of fingerprints
- bblean.similarity.jt_isim_diameter_from_sum(ls, n)
Calculate the Tanimoto diameter of a set of fingerprints.
Equivalent to
1 - jt_isim_from_sum(ls, n)
- bblean.similarity.jt_isim_radius(arr, input_is_packed=True, n_features=None)
Calculate the Tanimoto radius of a set of fingerprints
- bblean.similarity.jt_isim_radius_compl(arr, input_is_packed=True, n_features=None)
Calculate the complement of the Tanimoto radius of a set of fingerprints
- bblean.similarity.jt_isim_diameter(arr, input_is_packed=True, n_features=None)
Calculate the Tanimoto diameter of a set of fingerprints
- bblean.similarity.centroid_from_sum(linear_sum, n_samples, *, pack=True)
Calculates the majority vote centroid from a sum of fingerprint values
The majority vote centroid is a good approximation of the Tanimoto centroid.
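A majority vote can be sketched in one line: a bit is on in the centroid when at least half of the fingerprints have it on. The tie rule (exactly half counts as on) and the unpacked list output are assumptions of this sketch, and the helper name is hypothetical. The library's actual tie handling and its `pack` option are not reproduced here.

```python
def majority_vote_centroid(linear_sum, n_samples):
    """Unpacked majority-vote centroid from per-bit column sums."""
    # Assumption: a bit set in exactly half of the fingerprints counts as on
    return [1 if 2 * k >= n_samples else 0 for k in linear_sum]


# Column sums of four fingerprints with four bits each
linear_sum = [3, 2, 1, 0]
centroid_bits = majority_vote_centroid(linear_sum, 4)
```

With these sums, the first two bits are on in at least half of the fingerprints and the last two are not.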
- bblean.similarity.centroid(fps, input_is_packed=True, n_features=None, *, pack=True)
Calculates the majority vote centroid from a set of fingerprints
The majority vote centroid is a good approximation of the Tanimoto centroid.
- bblean.similarity.jt_isim_medoid(fps, input_is_packed=True, n_features=None, pack=True)
Calculate the (Tanimoto) medoid of a set of fingerprints, using iSIM
Returns both the index of the medoid in the input array and the medoid itself.
Note
For an array of size 1 or 2, returns the only or the first fingerprint, respectively. Raises ValueError for arrays of size 0.
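One way to picture an iSIM-based medoid is through complementary similarities: remove each fingerprint in turn, compute the iSIM of the rest from the adjusted column sums, and pick the fingerprint whose removal leaves the least similar set. This selection rule is an assumption based on how medoids are usually described in the iSIM literature, not a confirmed description of `jt_isim_medoid`. The helper names are hypothetical and the sketch uses unpacked bit lists.

```python
def isim_from_sum(linear_sum, n):
    """iSIM Tanimoto from column sums (0.0 if no bit is on)."""
    common = sum(k * (k - 1) / 2 for k in linear_sum)
    mismatches = sum(k * (n - k) for k in linear_sum)
    total = common + mismatches
    return common / total if total else 0.0


def isim_medoid(fps):
    """Return (index, fingerprint) of the medoid of unpacked fingerprints."""
    if not fps:
        raise ValueError("cannot compute the medoid of an empty set")
    n = len(fps)
    if n <= 2:
        # Mirrors the note above: first (or only) fingerprint
        return 0, fps[0]
    linear_sum = [sum(col) for col in zip(*fps)]
    # Complementary similarity: iSIM of the set without fingerprint i
    comp_sims = [
        isim_from_sum([k - bit for k, bit in zip(linear_sum, fp)], n - 1)
        for fp in fps
    ]
    # Assumption: the medoid has the lowest complementary similarity
    idx = min(range(n), key=comp_sims.__getitem__)
    return idx, fps[idx]


fps = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]
idx, medoid = isim_medoid(fps)
```

In this example the middle fingerprint overlaps most with the other two, so removing it leaves the least similar pair and it is selected.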