Error while calculating 3D descriptors: missing 3D coordinate (RNCS/RNCG/AtomicCharge/Propc/AtomicSurfaceArea) #93

Open
cartilage-ftw opened this issue Jan 4, 2021 · 9 comments

Comments

@cartilage-ftw

Description

I'm trying to calculate a whole bunch of descriptors, including some 3D descriptors using a set of SMILES. It doesn't give me any values for RASA, TASA, TPSA, etc., just an error in place of the values "missing 3D coordinate".

Code

I load a set of SMILES and build a list of the corresponding RDKit Mol objects, molecules_list.
After running

from mordred import Calculator, descriptors

# Names of the descriptors to keep (2D and 3D)
desc_needed = ['SIC0', 'IC0', 'CIC0', 'nRot', 'nN', 'nH', 'nC', 'nS', 'nO', 'nHBDon', 'nHBAcc',
               'GeomDiameter', 'TopoPSA', 'SLogP', 'RASA', 'TASA', 'TPSA', 'RNCS', 'RPCS', 'RPSA']
calc = Calculator(descriptors, ignore_3D=False)
calc.descriptors = [d for d in calc.descriptors if str(d) in desc_needed]
result = calc.pandas(molecules_list)

I get the following output

https://i.imgur.com/vBwU12N.png

The particular text output for RASA is,

missing 3D coordinate (RNCS/RNCG/AtomicCharge/Propc/AtomicSurfaceArea)
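
One way to check for this situation before calling Mordred is to ask RDKit whether each molecule carries a 3D conformer. The sketch below assumes (this is not confirmed at this point in the thread) that the message appears because a Mol parsed from SMILES has no coordinates at all. The helper name has_3d_coordinates is made up for illustration and is not part of RDKit or Mordred.

```python
from rdkit import Chem


def has_3d_coordinates(mol):
    """Return True if the molecule has at least one conformer flagged as 3D.

    Hypothetical helper, not part of RDKit or Mordred.
    """
    return any(conf.Is3D() for conf in mol.GetConformers())


# A molecule parsed from SMILES carries connectivity only, no coordinates.
mol = Chem.MolFromSmiles("C(CO)=O")
print(mol.GetNumConformers())   # number of stored conformers
print(has_3d_coordinates(mol))
```

If the helper returns False for the entries of molecules_list, the 3D descriptors have no geometry to work from.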

In case you need some of my SMILES for reproducing this

C(C(CC(C(CO)O)O)=O)=O
C(CC(C(C(CO)O)O)=O)=O
C(C(C(CC(CO)O)=O)O)=O
C(CC(C=O)O)(C(CO)O)=O
C(C(C(C=O)O)O)(CCO)=O
C(C(C(CC(CO)=O)O)O)=O
C(C(C(C(C(C)=O)O)O)O)=O
C(CC(C(C(C=O)O)O)O)=O
C(CO)=O
C(C(C(CO)O)O)=O
C(CO)(C(C(C(CO)O)O)O)=O
C(C(C1C(C(C(O)O1)O)O)O)O
C(C1C(C(C(C(O)O1)O)O)O)O
C1C(C(C(C(C(O)O1)O)O)O)O

Environment

OS/distribution

Manjaro KDE Plasma
Kernel: 5.8.18-1-MANJARO

conda or pip

Using conda (an environment called my-rdkit-env)

python version

Python 3.7.9

library version

Please execute the command and paste result.

  • conda

    conda list
    # packages in environment at /home/aayush/miniconda3/envs/my-rdkit-env:
    #
    # Name                    Version                   Build  Channel
    _libgcc_mutex             0.1                        main
    argon2-cffi               20.1.0           py37h7b6447c_1    anaconda
    async_generator           1.10             py37h28b3542_0    anaconda
    attrs                     20.2.0                     py_0    anaconda
    backcall                  0.2.0                      py_0    anaconda
    blas                      1.0                         mkl  
    bleach                    3.2.1                      py_0    anaconda
    bzip2                     1.0.8                h7b6447c_0  
    ca-certificates           2020.12.5            ha878542_0    conda-forge
    cairo                     1.14.12              h8948797_3  
    certifi                   2020.12.5        py37h89c1867_0    conda-forge
    cffi                      1.14.3           py37he30daa8_0    anaconda
    chemopy                   1.0                      pypi_0    pypi
    cycler                    0.10.0                   py37_0  
    dbus                      1.13.18              hb2f20db_0  
    decorator                 4.4.2                      py_0    anaconda
    defusedxml                0.6.0                      py_0    anaconda
    entrypoints               0.3                      py37_0    anaconda
    expat                     2.2.10               he6710b0_2  
    fontconfig                2.13.0               h9420a91_0  
    freetype                  2.10.4               h5ab3b9f_0  
    glib                      2.66.1               h92f7085_0  
    gst-plugins-base          1.14.0               hbbd80ab_1  
    gstreamer                 1.14.0               hb31296c_0  
    icu                       58.2                 he6710b0_3  
    importlib-metadata        2.0.0                      py_1    anaconda
    importlib_metadata        2.0.0                         1    anaconda
    intel-openmp              2020.2                      254  
    ipykernel                 5.3.4            py37h5ca1d4c_0    anaconda
    ipython                   7.18.1           py37h5ca1d4c_0    anaconda
    ipython_genutils          0.2.0                    py37_0    anaconda
    ipywidgets                7.5.1                      py_1    anaconda
    jedi                      0.17.2                   py37_0    anaconda
    jinja2                    2.11.2                     py_0    anaconda
    jpeg                      9b                   h024ee3a_2  
    jpype1                    1.1.2            py37hff7bd54_0  
    jsonschema                3.2.0                      py_2    anaconda
    jupyter                   1.0.0                    py37_7    anaconda
    jupyter_client            6.1.7                      py_0    anaconda
    jupyter_console           6.2.0                      py_0    anaconda
    jupyter_core              4.6.3                    py37_0    anaconda
    jupyterlab_pygments       0.1.2                      py_0    anaconda
    kiwisolver                1.3.0            py37h2531618_0  
    lcms2                     2.11                 h396b838_0  
    ld_impl_linux-64          2.33.1               h53a641e_7  
    libboost                  1.73.0              hf484d3e_11  
    libedit                   3.1.20191231         h14c3975_1  
    libffi                    3.3                  he6710b0_2  
    libgcc-ng                 9.1.0                hdf63c60_0  
    libpng                    1.6.37               hbc83047_0  
    libsodium                 1.0.18               h7b6447c_0    anaconda
    libstdcxx-ng              9.1.0                hdf63c60_0  
    libtiff                   4.1.0                h2733197_1  
    libuuid                   1.0.3                h1bed415_2  
    libxcb                    1.14                 h7b6447c_0  
    libxml2                   2.9.10               hb55368b_3  
    lz4-c                     1.9.2                heb0550a_3  
    markupsafe                1.1.1            py37h14c3975_1    anaconda
    matplotlib                3.3.2                h06a4308_0  
    matplotlib-base           3.3.2            py37h817c723_0  
    mistune                   0.8.4           py37h14c3975_1001    anaconda
    mkl                       2020.2                      256  
    mkl-service               2.3.0            py37he8ac12f_0  
    mkl_fft                   1.2.0            py37h23d657b_0  
    mkl_random                1.1.1            py37h0573a6f_0  
    mordred                   1.2.0              pyhe5148d4_0    mordred-descriptor
    nbclient                  0.5.1                      py_0    anaconda
    nbconvert                 6.0.7                    py37_0    anaconda
    nbformat                  5.0.8                      py_0    anaconda
    ncurses                   6.2                  he6710b0_1  
    nest-asyncio              1.4.1                      py_0    anaconda
    networkx                  2.5                        py_0  
    notebook                  6.1.4                    py37_0    anaconda
    numpy                     1.19.2           py37h54aff64_0  
    numpy-base                1.19.2           py37hfa32c7d_0  
    olefile                   0.46                     py37_0  
    openssl                   1.1.1i               h27cfd23_0  
    packaging                 20.4                       py_0    anaconda
    pandas                    1.1.3            py37he6710b0_0  
    pandoc                    2.11                 hb0f4dca_0    anaconda
    pandocfilters             1.4.2                    py37_1    anaconda
    parso                     0.7.0                      py_0    anaconda
    pcre                      8.44                 he6710b0_0  
    pexpect                   4.8.0                    py37_1    anaconda
    pickleshare               0.7.5                 py37_1001    anaconda
    pillow                    8.0.1            py37he98fc37_0  
    pip                       20.3.1           py37h06a4308_0  
    pixman                    0.40.0               h7b6447c_0  
    prometheus_client         0.8.0                      py_0    anaconda
    prompt-toolkit            3.0.8                      py_0    anaconda
    prompt_toolkit            3.0.8                         0    anaconda
    ptyprocess                0.6.0                    py37_0    anaconda
    py-boost                  1.73.0          py37h04863e7_11  
    pycparser                 2.20                       py_2    anaconda
    pygments                  2.7.1                      py_0    anaconda
    pyparsing                 2.4.7                      py_0  
    pyqt                      5.9.2            py37h05f1152_2  
    pyrsistent                0.17.3           py37h7b6447c_0    anaconda
    python                    3.7.9                h7579374_0  
    python-dateutil           2.8.1                      py_0  
    python_abi                3.7                     1_cp37m    conda-forge
    pytz                      2020.4             pyhd3eb1b0_0  
    pyzmq                     19.0.2           py37he6710b0_1    anaconda
    qt                        5.9.7                h5867ecd_1  
    qtconsole                 4.7.7                      py_0    anaconda
    qtpy                      1.9.0                      py_0    anaconda
    rdkit                     2020.09.1.0      py37hd50e099_1    rdkit
    readline                  8.0                  h7b6447c_0  
    send2trash                1.5.0                    py37_0    anaconda
    setuptools                51.0.0           py37h06a4308_2  
    sip                       4.19.8           py37hf484d3e_0  
    six                       1.15.0           py37h06a4308_0  
    sqlite                    3.33.0               h62c20be_0  
    terminado                 0.9.1                    py37_0    anaconda
    testpath                  0.4.4                      py_0    anaconda
    tk                        8.6.10               hbc83047_0  
    tornado                   6.1              py37h27cfd23_0  
    tqdm                      4.55.0             pyhd3eb1b0_0  
    traitlets                 5.0.5                      py_0    anaconda
    typing_extensions         3.7.4.3                    py_0    conda-forge
    wcwidth                   0.2.5                      py_0    anaconda
    webencodings              0.5.1                    py37_1    anaconda
    wheel                     0.36.2             pyhd3eb1b0_0  
    widgetsnbextension        3.5.1                    py37_0    anaconda
    xz                        5.2.5                h7b6447c_0  
    zeromq                    4.3.3                he6710b0_3    anaconda
    zipp                      3.3.1                      py_0    anaconda
    zlib                      1.2.11               h7b6447c_3  
    zstd                      1.4.5                h9ceee32_0
  • pip

    (my-rdkit-env) [aayush@aayush-tuf ~]$ python -m pip list
    Package             Version
    ------------------- -------------------
    argon2-cffi         20.1.0
    async-generator     1.10
    attrs               20.2.0
    backcall            0.2.0
    bleach              3.2.1
    certifi             2020.12.5
    cffi                1.14.3
    chemopy             1.0
    cycler              0.10.0
    decorator           4.4.2
    defusedxml          0.6.0
    entrypoints         0.3
    importlib-metadata  2.0.0
    ipykernel           5.3.4
    ipython             7.18.1
    ipython-genutils    0.2.0
    ipywidgets          7.5.1
    jedi                0.17.2
    Jinja2              2.11.2
    JPype1              1.1.2
    jsonschema          3.2.0
    jupyter             1.0.0
    jupyter-client      6.1.7
    jupyter-console     6.2.0
    jupyter-core        4.6.3
    jupyterlab-pygments 0.1.2
    kiwisolver          1.3.0
    MarkupSafe          1.1.1
    matplotlib          3.3.2
    mistune             0.8.4
    mkl-fft             1.2.0
    mkl-random          1.1.1
    mkl-service         2.3.0
    mordred             1.2.0
    nbclient            0.5.1
    nbconvert           6.0.7
    nbformat            5.0.8
    nest-asyncio        1.4.1
    networkx            2.5
    notebook            6.1.4
    numpy               1.19.2
    olefile             0.46
    packaging           20.4
    pandas              1.1.3
    pandocfilters       1.4.2
    parso               0.7.0
    pexpect             4.8.0
    pickleshare         0.7.5
    Pillow              8.0.1
    pip                 20.3.1
    prometheus-client   0.8.0
    prompt-toolkit      3.0.8
    ptyprocess          0.6.0
    pycparser           2.20
    Pygments            2.7.1
    pyparsing           2.4.7
    pyrsistent          0.17.3
    python-dateutil     2.8.1
    pytz                2020.4
    pyzmq               19.0.2
    qtconsole           4.7.7
    QtPy                1.9.0
    Send2Trash          1.5.0
    setuptools          51.0.0.post20201207
    six                 1.15.0
    terminado           0.9.1
    testpath            0.4.4
    tornado             6.1
    tqdm                4.55.0
    traitlets           5.0.5
    typing-extensions   3.7.4.3
    wcwidth             0.2.5
    webencodings        0.5.1
    wheel               0.36.2
    widgetsnbextension  3.5.1
    zipp                3.3.1
    (my-rdkit-env) [aayush@aayush-tuf ~]$     python -c 'import rdkit; print("rdkit " + rdkit.__version__)'
    rdkit 2020.09.1
@plkx

plkx commented Jan 11, 2021

You have to provide the 3D structures if you want to calculate 3D descriptors with Mordred. I've found DataWarrior to be an exceptionally easy to use free software package for going from smiles to 2D and/or 3D structures for LARGE compound sets.

Paul

@cartilage-ftw
Author

Hey, thank you for your reply, Paul. But how do you provide 3D molecules to Mordred using RDKit?
DataWarrior is great but I'm not sure if it will cover the entire list of descriptors I need to calculate.

@plkx

plkx commented Jan 21, 2021

Put your smiles in a text document (attachment #1).

Open the text document in DWarrior.

Save it (default save, as DWAR file). This is not essential, but makes future work easier, such as adding a name column. (attachment #2, after deleting the .txt extension because github does not allow much).

Generate 2D atom coordinates ( Chemistry → Generate 2D Atom Coordinates…)

Generate 3D structures (Chemistry → Generate Conformers…; Leave Max. Conformer Count = 1; DO NOT CHECK SAVE TO FILE).
In this case, I chose the systematic, low energy bias algorithm; initial torsions from the crystallographic database; and minimized energy using MMFF94s+ forcefield. That took ~3 seconds on my laptop, but can take minutes or hours for large compound sets, e.g. recently took ~20 minutes for an 1800 cmpd set. You can choose not to minimize for fastest results.

Save changes.

Now save it as an SDF (File → Save Special → SD-File). A dialog pops up. Click "save" under "MDL SD-files (.sdf)." In the next dialog: (1) leave Structure Column as is; (2) change SD-file version to version 2 (Mordred may not like version 3 files); (3) change Atom Coordinates from 2D to "3D if available"; (4) choose your compound name column from the dropdown. Keep in mind program limitations (e.g. names with commas will give garbled output from Mordred if you choose csv file output). I used the SMILES you provided as the compound name column.
(attachment #3, after deleting the .txt extension because github does not allow much).
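
To confirm which SD-file version an export actually uses, the records can be inspected with a few lines of Python. This is only a sketch: the helper name sdf_ctab_versions is hypothetical, and it relies on the usual molfile layout in which the counts line is the fourth line of each record and ends in V2000 or V3000. The sample record is made up.

```python
from collections import Counter


def sdf_ctab_versions(sdf_text):
    """Count the connection-table versions (V2000/V3000) of the records in SDF text.

    Hypothetical helper: records are split on the '$$$$' delimiter and the
    version tag is read from the end of the counts line (4th line of a record).
    """
    versions = Counter()
    record = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            record = []          # delimiter: start collecting the next record
            continue
        record.append(line)
        if len(record) == 4:     # the counts line is the 4th line of a record
            tokens = line.split()
            tag = tokens[-1] if tokens else ""
            versions[tag if tag in ("V2000", "V3000") else "unknown"] += 1
    return versions


# Minimal made-up record, just to exercise the helper.
example = "\n".join([
    "water",
    "  hypothetical header",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "$$$$",
])
print(sdf_ctab_versions(example))
```

For a real file, pass the result of open(path).read() to the helper.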

The new SDF contains 3D structures of your SMILES. Note: your SMILES did not specify stereoisomers, but a 3D structure requires such specificity. DataWarrior creates one stereoisomer per SMILES in this case, as indicated by the usual "R" & "S" atom labels.

The attached files include (1) your smiles in a text file (2) the DWar file ready to save as an SDF and (3) a pruned SDF of 3D structures from your smiles. Pruning involved deleting columns such as "minimization energy" & smiles.

This is barely the beginning of what you can do using DataWarrior - I invite you to read the better-than-average (for free software) documentation with DataWarrior for the good stuff.

Good Luck,

Paul

smiles_cartilage-ftw.txt

Remove the .txt extension on this file to get the DWAR file:
smiles_cartilage-ftw.dwar.txt

Remove the .txt extension on this file to get the SDF:
smiles_cartilage-ftw_pruned.sdf.txt

@plkx

plkx commented Jan 21, 2021

The SDF file above is your input file for Mordred, as in

$ python -m mordred smiles_cartilage-ftw_pruned.sdf -o MORD_cartilage-ftw_pruned.csv

You supply the output file name (after -o). In this example, I chose MORD_cartilage-ftw_pruned.csv

@plkx

plkx commented Jan 22, 2021

An SDF serves as the input file for Mordred.

It is entirely possible to generate 3D structures from smiles using RDKit.
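
A minimal sketch of that RDKit route, for readers who want to stay in Python: each SMILES is parsed, given explicit hydrogens, embedded in 3D and relaxed with MMFF, and the resulting molecules are what would be handed to the Mordred Calculator from the original post. The function names smiles_to_3d_mol and mordred_3d_table, and the fixed random seed, are illustrative assumptions and not part of either library.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def smiles_to_3d_mol(smiles, seed=42):
    """Parse SMILES, add hydrogens and embed one 3D conformer.

    Returns None if parsing or embedding fails. The fixed seed only
    makes the (otherwise random) embedding reproducible.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    mol = Chem.AddHs(mol)
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:  # -1 signals failure
        return None
    AllChem.MMFFOptimizeMolecule(mol)  # quick force-field relaxation
    return mol


def mordred_3d_table(mols, wanted):
    """Compute the descriptors named in `wanted`, 3D ones included."""
    # Imported lazily so the embedding part works without mordred installed.
    from mordred import Calculator, descriptors

    calc = Calculator(descriptors, ignore_3D=False)
    calc.descriptors = [d for d in calc.descriptors if str(d) in wanted]
    return calc.pandas(mols)


smiles_list = ["C(CO)=O", "C(C(C(CO)O)O)=O"]  # two SMILES from the issue
molecules_list = [m for m in map(smiles_to_3d_mol, smiles_list) if m is not None]
print(len(molecules_list), "molecules with 3D coordinates")
```

Calling mordred_3d_table(molecules_list, {"RASA", "TASA", "TPSA"}) on such molecules should then give the 3D descriptors real coordinates to work with. Note that a single embedded conformer is an arbitrary geometry, so conformation-dependent values can differ from those obtained with DataWarrior structures.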

Personally, I find the DataWarrior GUI preferable, as it has extensive manipulation and visualization capabilities.

In the case exemplified above, the two tools can be combined: use the rdkit.Chem.EnumerateStereoisomers module to generate all possible stereoisomer SMILES, save them in a text file, and open that file in DataWarrior. Elimination of duplicates is trivial in DataWarrior (Data → Delete Rows → Duplicate Rows…).
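
The enumeration step could look roughly like the sketch below. It uses RDKit's EnumerateStereoisomers with its default options; the helper name stereoisomer_smiles and the output file name stereoisomers.txt are made up for illustration.

```python
from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers


def stereoisomer_smiles(smiles):
    """Return the sorted, de-duplicated isomeric SMILES of all stereoisomers."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: " + smiles)
    isomers = EnumerateStereoisomers(mol)
    return sorted({Chem.MolToSmiles(iso, isomericSmiles=True) for iso in isomers})


# One of the SMILES from the issue; it has two unspecified stereocentres.
isomer_list = stereoisomer_smiles("C(C(C(CO)O)O)=O")
print(len(isomer_list), "stereoisomers")

# One SMILES per line, ready to be opened in DataWarrior.
with open("stereoisomers.txt", "w") as handle:
    handle.write("\n".join(isomer_list) + "\n")
```

Building the list through a set already removes duplicate SMILES strings, so the DataWarrior duplicate-row step is then only a safety net.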

On another note, I do further molecular modeling for QM properties. I have found that DWar provides 3D structures that require less curation/refinement during geometry and energy minimization by semiempirical methods. Sometimes, unminimized 3D structures using torsions from the crystallographic database provide best starting structures. This is especially the case for sets of homologous molecular structures because the initial conformers retain more apparent homolog consistency.

Of course, these are my personal experiences with molecule sets I have studied, versus either Open Babel or RDKit, so your mileage may vary.

Paul

@plkx

plkx commented Jan 22, 2021

Window capture from DataWarrior with all 3D structures.
DWARCapture

@plkx

plkx commented Jan 22, 2021

Oops - I moused over a structure that was not selected, causing the 3D structure above to be for the compound above the highlighted heptacyclo compound. Here is a snip showing the same compound in all panes.
DWAR_Capture

@ky66

ky66 commented Apr 12, 2021

Yeah Paul I tried this and it is missing all the 3D features still. It is supposed to have 1800+ features. Your SDF gives 1614 features only.
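
A quick way to compare such feature counts is to count the header columns of the CSV that Mordred wrote. This is a rough sketch: it assumes the first row is a header and that the identifier column is called name, which may differ in your file. The helper name and the sample values are made up.

```python
import csv
import io


def count_descriptor_columns(csv_file, id_columns=("name",)):
    """Count header columns that are not identifier columns.

    Assumption: the first row is a header and the molecule identifier
    column is called 'name'; adjust id_columns if your file differs.
    """
    header = next(csv.reader(csv_file))
    return sum(1 for column in header if column not in id_columns)


# Tiny in-memory stand-in for a Mordred output file (made-up values).
sample = io.StringIO("name,nC,nO,TopoPSA\nmol1,2,2,37.3\n")
print(count_descriptor_columns(sample))
```

For a real file, pass an open file handle, e.g. open("MORD_cartilage-ftw_pruned.csv", newline="").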

@xdn-github

Hi! Maybe it's too late to answer this, and the original poster may already have solved the problem.
But I would like to write up my solution to help newcomers.

In mordred 1.2.0 you CANNOT get all the descriptors with the command below:

$ python -m mordred [molecular.sdf] -o [molecular_descriptors.csv]
You only get 1614 descriptors, as ky66 mentioned, which excludes the 3D descriptors.

What you should do instead is use a command like this:

$ python -m mordred -3 [molecular.sdf] -o [molecular_descriptors.csv]
Just add "-3"! You then get all the descriptors in [molecular_descriptors.csv].
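
When the command above is launched from a script, building it as an argument list avoids shell quoting problems with file names. A small sketch, with the helper name mordred_command being hypothetical and the file names taken from the placeholders above:

```python
import shlex
import sys


def mordred_command(sdf_path, csv_path, use_3d=True):
    """Build the argument list for the Mordred command-line call.

    Hypothetical helper; it only assembles the command shown above.
    """
    command = [sys.executable, "-m", "mordred"]
    if use_3d:
        command.append("-3")  # include 3D descriptors
    command += [sdf_path, "-o", csv_path]
    return command


cmd = mordred_command("molecular.sdf", "molecular_descriptors.csv")
print(" ".join(shlex.quote(part) for part in cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Using sys.executable keeps the call inside the Python environment that is currently active, which matters when mordred lives in a conda environment.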

A complete sample you can try looks like this:

from rdkit import Chem
from rdkit.Chem import AllChem
import pandas as pd
import os

# Create a 3D SDF file with RDKit (or use DataWarrior, as described earlier in this thread)
smiles = 'CCCC(=O)O[C@@H]1C[C@H]2C=CC[C@@H]21'
mol = Chem.AddHs(Chem.MolFromSmiles(smiles))  # add explicit hydrogens before embedding
AllChem.EmbedMolecule(mol)         # generate 3D coordinates
AllChem.MMFFOptimizeMolecule(mol)  # relax the geometry with the MMFF force field
writer = Chem.SDWriter("test.sdf")  # writes a complete SD record
writer.write(mol)
writer.close()

# Calculate all descriptors, including the 3D ones, from the SDF file
os.system("python -m mordred -3 test.sdf -o test.csv")
# In Jupyter, the line above can be: !python -m mordred -3 test.sdf -o test.csv

# Check the descriptor file, e.g. in Jupyter
df = pd.read_csv('test.csv')
df.head()

Thanks a lot, @ky66! I found this solution in his write-up.
Finally, I think it's OK to close this issue?
