Effect of Hydrogens and Kekulization on pKa Prediction #5

arazthexd · 2024-06-12T10:04:11Z

Congratulations on the great publication!

I was trying out your model and code for a project of mine. I wanted a rough estimate of the ratios of the most common protomers of a molecule, and I planned to derive it from the predicted pKa values for each atom. The problem is molecules with more than one ionizable atom, each of which can be in a different protonation state. While trying out QupKake I made some observations that made me doubt whether this is possible with it, so I wanted to share them, hear your thoughts, and ask whether there might be a way to do this.
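To make the goal concrete, here is a minimal sketch of how protomer ratios could be estimated from per-site pKa values with the Henderson-Hasselbalch relation. It assumes the sites ionize independently of each other, which is exactly the assumption that becomes questionable for molecules with several ionizable atoms. The function names, the `SITES` list, and the pH of 7.4 are my own choices for illustration and are not part of QupKake; the pKa values are rounded from the eprosartan output quoted in this post.

```python
from itertools import product

def fraction_ionized(pka, ph, kind):
    """Fraction of a single site that is in its charged form at the given pH."""
    if kind == "acidic":   # HA <-> A- + H+, charged form is the deprotonated one
        return 1.0 / (1.0 + 10 ** (pka - ph))
    if kind == "basic":    # BH+ <-> B + H+, charged form is the protonated one
        return 1.0 / (1.0 + 10 ** (ph - pka))
    raise ValueError(f"unknown site kind: {kind!r}")

def protomer_populations(sites, ph):
    """Population of every ionization pattern, ASSUMING independent sites."""
    fractions = [fraction_ionized(pka, ph, kind) for _, pka, kind in sites]
    populations = {}
    for state in product((False, True), repeat=len(sites)):
        weight = 1.0
        for ionized, f in zip(state, fractions):
            weight *= f if ionized else 1.0 - f
        populations[state] = weight
    return populations

# (atom index, predicted pKa, site kind); hypothetical input format
SITES = [(5, 6.28, "basic"), (17, 3.75, "acidic"), (28, 3.87, "acidic")]

if __name__ == "__main__":
    pops = protomer_populations(SITES, ph=7.4)
    for state, weight in sorted(pops.items(), key=lambda kv: -kv[1]):
        charged = [idx for (idx, _, _), on in zip(SITES, state) if on]
        print(f"charged atoms {charged}: {weight:.4f}")
```

A coupled treatment would need microscopic pKa values for each site in each protonation state of the others, which is why the dependence of the predictions on the input protonation state matters here.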

What raised the doubt is that the predictions differ across protonation states and across SMILES formats (canonical and kekulized) of the same molecule. Here is an example.

Consider the kekulized SMILES for eprosartan: 'CCCCC1=NC=C(\C=C(/CC2=CC=CS2)C(O)=O)N1CC1=CC=C(C=C1)C(O)=O'
When provided with this SMILES, this is what the output looks like:

basic:
idx=5: pka=6.281378, basic
idx=17: pka=6.023213, basic (?!)
acidic:
idx=17: pka=3.745246, acidic
idx=28: pka=3.870438, acidic

In the results above, everything looks reasonable except the basic pKa of atom 17, which should be much lower.

If the SMILES of the same molecule is provided without kekulization ('CCCCc1ncc(/C=C(\Cc2cccs2)C(=O)O)n1Cc1ccc(C(=O)O)cc1'), the result looks as follows:

basic:
idx=5: pka=6.265716, basic
idx=18: pka=6.035862, basic (?!)
idx=27: pka=6.107231, basic (?!)
acidic:
idx=18: pka=3.744408, acidic
idx=27: pka=3.866692, acidic

The predicted pKa values deviate very little from the previous results, but I wonder why a second carboxylic acid is enumerated as basic when the input SMILES changes. I also wanted to ask why you think the model predicts such high basic pKa values for carboxylic acids. I would be grateful for your comments on this.
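When comparing the two outputs, note that the atom indices are not directly comparable, because the two SMILES strings list the carboxyl oxygens in a different order (`C(O)=O` versus `C(=O)O`). The sketch below maps indices to element symbols with a small regular expression. It is only a rough tokenizer for the organic-subset atoms and bracket atoms that occur in these strings, not a full SMILES parser, and `atom_symbols` is a helper name made up for this illustration. It also assumes that the reported `idx` values follow the atom order of the input SMILES.

```python
import re

# Bracket atoms, two-letter halogens, then one-letter aliphatic and aromatic atoms.
ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")

def atom_symbols(smiles):
    """Return atom symbols in the order they appear in the SMILES string."""
    return [m.group(0) for m in ATOM.finditer(smiles)]

KEKULIZED = r"CCCCC1=NC=C(\C=C(/CC2=CC=CS2)C(O)=O)N1CC1=CC=C(C=C1)C(O)=O"
AROMATIC = r"CCCCc1ncc(/C=C(\Cc2cccs2)C(=O)O)n1Cc1ccc(C(=O)O)cc1"

if __name__ == "__main__":
    for name, smiles, indices in (
        ("kekulized", KEKULIZED, (5, 17, 28)),
        ("aromatic", AROMATIC, (5, 18, 27)),
    ):
        symbols = atom_symbols(smiles)
        print(name, len(symbols), "heavy atoms")
        for idx in indices:
            # Show the atom together with its predecessor for context.
            print(f"  idx={idx}: {symbols[idx]} (previous atom: {symbols[idx - 1]})")
```

Under that assumption, idx 17 and 28 in the kekulized input and idx 18 and 27 in the aromatic input all point at carboxyl oxygens, so the two outputs refer to the same two acid groups.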

Now let's consider the same kekulized SMILES but with one of the carboxylic acids already ionized: CCCCC1=NC=C(\C=C(/CC2=CC=CS2)C(O)=O)N1CC1=CC=C(C=C1)C([O-])=O
Here is the result:

basic:
idx=5: pka=6.218657, basic
idx=17: pka=5.955811, basic (?!)
idx=28: pka=4.008273, basic
acidic:
idx=17: pka=3.568614, acidic

The prediction for atom 28 makes a lot of sense and is close to its predicted acidic pKa in the first results. What I found somewhat interesting was the drop in the acidic pKa of atom 17: I expected a rise, because the molecule now carries a net negative charge. Perhaps such a molecule is outside the applicability domain of the model, since I didn't see any already ionized molecules in the training data, but I'm not sure whether this is the case. If it is, it might be reasonable to neutralize already ionized inputs before prediction.
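As a first step toward that idea, already ionized inputs could at least be flagged before prediction. The sketch below reads formal charges from bracket atoms in a SMILES string using a regular expression. It is a rough check written for this discussion, not QupKake functionality, and `charged_atoms` and `net_charge` are hypothetical helper names. Actual neutralization is better left to a cheminformatics toolkit, since simply rewriting the string would mishandle groups such as nitro or quaternary ammonium.

```python
import re

# A bracket atom carrying a charge, e.g. [O-], [NH3+], [Fe+2], [Fe++].
CHARGED = re.compile(r"\[[^\]]*?([+-]+)(\d*)(?::\d+)?\]")

def charged_atoms(smiles):
    """Return (bracket_atom_text, formal_charge) for every charged bracket atom."""
    found = []
    for m in CHARGED.finditer(smiles):
        signs, digits = m.group(1), m.group(2)
        magnitude = int(digits) if digits else len(signs)
        charge = magnitude if signs[0] == "+" else -magnitude
        found.append((m.group(0), charge))
    return found

def net_charge(smiles):
    """Sum of the formal charges written explicitly in the SMILES string."""
    return sum(charge for _, charge in charged_atoms(smiles))

IONIZED = r"CCCCC1=NC=C(\C=C(/CC2=CC=CS2)C(O)=O)N1CC1=CC=C(C=C1)C([O-])=O"

if __name__ == "__main__":
    atoms = charged_atoms(IONIZED)
    if atoms:
        print("input is already ionized:", atoms, "net charge", net_charge(IONIZED))
```

Inputs for which `charged_atoms` returns anything could then be neutralized, or at least reported, before being passed to the predictor.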

Another thing that caught my eye was that the predictions also differed depending on whether the SMILES had explicit or implicit hydrogens, which again should not matter, I think.

arazthexd · 2024-06-12T13:51:43Z

Another example of the effect of protonation states, with a weird prediction, is this molecule:

When given the neutral form of the molecule, the pKa of the carboxylic acid group is predicted to be 3.96457, but when the amine group is protonated and positively charged in the input, the pKa of the carboxylic acid group is predicted to be 5.50159.

As I mentioned, it's understandable that the model was not trained on such data, but this is a trend I'm seeing in almost all examples, and I would like to discuss why it behaves this way.
