calculation of eSALI #3

Open
albertma1986 opened this issue Jun 8, 2023 · 3 comments

Comments

@albertma1986

Hi,

I read the paper "Exploring activity landscapes with extended similarity: is Tanimoto enough?" (https://onlinelibrary.wiley.com/doi/epdf/10.1002/minf.202300056).

I am trying to relate the code in this repo to the equations in the paper, specifically this one:
[image: equation from the paper]

Is the calculate_counters() function in condensed_version/MultComp.py responsible for getting the Se(M) value?

Sorry if I missed anything in the paper or in the docstrings, but I cannot find a formula for how Se(M) is calculated, or which part of the code is responsible for it.

My task is simple, I am just trying to calculate the eSALI for my dataset, I have the numerical descriptors and the properties of the compounds.

Albert

@ramirandaq
Member

Hi Albert, thanks a lot for your interest in our work and for reaching out to us. The calculate_counters function gives the main ingredients needed to calculate the extended similarity (Se(M)), which can then be used in the eSALI formula. In this file, https://github.com/ramirandaq/MultipleComparisons/blob/master/condensed_version/MultComp.py, we have an updated version of the formula. Please notice two things:
1. Below line 117 there's a sample calculation showing how to get the Se(M) value. Given a set of fingerprints arranged in a matrix (line 121), the first step is to calculate the sum of every column (line 130; this is the most time-demanding step of the whole process). Then one needs to generate a data_sets instance (line 133), where one appends the number of fingerprints (n) to the vector with the sums of the columns. This is the main input needed to calculate the counters.
2. Once the counters are calculated, the code starting at line 144 shows how to calculate several extended similarity indices. First, I strongly recommend calculating only the non-weighted version of the index (starting at line 179). Second, if you want to calculate the extended Tanimoto index, please see line 200 (although in several studies we've seen that the Russell-Rao index can give comparable, if not better, results; see line 204).
More importantly, please let us know if you have any other questions or comments and if we can help with anything. If you want, we could send you a script with a more concise way to perform these calculations (this one is purposely very general, since we use it as a template for all the applications we are exploring in our group). If your dataset is too big, we also have more efficient ways to perform these calculations (although this one, as reported in the paper, already scales as O(N)).
All the best,
Ramon
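
For readers following along, the input preparation described in points 1 and 2 can be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the code in MultComp.py: `build_data_sets` and `count_only_indices` are hypothetical names, and the column classification (a column counts as a "1-coincidence" when `2k - n` exceeds a threshold of `n % 2`, and as a "0-coincidence" when `n - 2k` does) is my assumption about the extended similarity framework. The ratios use plain counts with no weighting, so they are not guaranteed to match the repo's weighted or non-weighted indices.

```python
import numpy as np


def build_data_sets(fingerprints):
    """Sum every column of a binary fingerprint matrix and append n.

    Mirrors the steps described in the comment above: column sums first,
    then the number of fingerprints appended to that vector.
    """
    fingerprints = np.asarray(fingerprints)
    n_fingerprints = fingerprints.shape[0]
    c_total = fingerprints.sum(axis=0)
    return np.array([np.append(c_total, n_fingerprints)])


def count_only_indices(data_sets):
    """Hypothetical, count-only stand-in for the counters step.

    ASSUMPTION: threshold n % 2 and no weighting; the real
    calculate_counters() may differ in both respects.
    """
    total = np.sum(data_sets, axis=0)
    n = int(total[-1])
    c_total = total[:-1]
    c_threshold = n % 2

    a = int(np.sum(2 * c_total - n > c_threshold))  # columns of mostly 1s
    d = int(np.sum(n - 2 * c_total > c_threshold))  # columns of mostly 0s
    dis = int(c_total.size - a - d)                 # everything else
    p = a + d + dis

    return {
        "a": a,
        "d": d,
        "dis": dis,
        "RR": a / p,                                 # Russell-Rao-like ratio
        "JT": a / (a + dis) if (a + dis) else 0.0,   # Tanimoto-like ratio
    }


if __name__ == "__main__":
    fps = [[1, 1, 0, 0],
           [1, 1, 0, 1],
           [1, 0, 0, 0],
           [1, 1, 0, 1]]
    data_sets = build_data_sets(fps)
    print(data_sets)
    print(count_only_indices(data_sets))
```

Only one pass over the matrix is needed to get the column sums, which is consistent with the O(N) scaling mentioned above; everything after that works on a vector whose length is the number of fingerprint bits.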

@albertma1986
Author

Hi Ramon, thanks so much for the explanation.
As far as I understand the extended similarity framework, it could be extended to other similarity (distance) metrics, for instance Euclidean distance if I have a set of compounds, each represented by a latent (non-binary) vector.

I am not a math expert, but I believe it would not make sense to pass such a matrix of latent vectors to the calculate_counters() function (please correct me if I am wrong). Is there an example of calculating an "extended Euclidean distance" (i.e. the denominator, 1 - Se(M), but in the sense of a Euclidean distance) in the formula?
[image: equation from the paper]

Sorry if I am talking nonsense; I am not sure if it is even doable.
Thanks
Albert

@ramirandaq
Member

Hi, no problem! We don't have the extended Euclidean distance in this module. It would be tricky to do this with the Euclidean distance itself, but relatively easy with the square of the Euclidean distance. Basically, instead of using the "RMSD" you would use the "MSD", i.e. without the square root.
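
One possible reading of the "MSD" suggestion, offered as a hedged sketch rather than the maintainers' method: the mean of all pairwise squared Euclidean distances can be obtained from column sums alone, because the sum over pairs of ||x_i - x_j||^2 equals N times the sum of ||x_i||^2 minus ||sum of x_i||^2. The function name `mean_squared_deviation` is hypothetical, and how to normalise this value or plug it into the eSALI formula is not specified in this thread.

```python
import numpy as np


def mean_squared_deviation(vectors):
    """Mean squared Euclidean distance over all pairs of rows.

    Uses only the column sums of x and x**2, so the cost is linear in
    the number of vectors instead of quadratic.
    """
    x = np.asarray(vectors, dtype=float)
    n = x.shape[0]
    if n < 2:
        raise ValueError("need at least two vectors")

    sum_x = x.sum(axis=0)          # column sums
    sum_x2 = (x ** 2).sum(axis=0)  # column sums of squares

    # Sum over all pairs i < j of ||x_i - x_j||^2
    pair_sum = n * sum_x2.sum() - np.dot(sum_x, sum_x)

    # Divide by the number of pairs, n * (n - 1) / 2
    return 2.0 * pair_sum / (n * (n - 1))


def mean_squared_deviation_bruteforce(vectors):
    """Quadratic-time reference, used only to check the shortcut."""
    x = np.asarray(vectors, dtype=float)
    n = x.shape[0]
    dists = [np.sum((x[i] - x[j]) ** 2)
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))


if __name__ == "__main__":
    latent = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
    print(mean_squared_deviation(latent))
    print(mean_squared_deviation_bruteforce(latent))
```

The square root is what makes the plain Euclidean case awkward: the sum of squared distances factorises into column sums, while the sum of the distances themselves does not.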
