Error while calculating 3D descriptors: missing 3D coordinate (RNCS/RNCG/AtomicCharge/Propc/AtomicSurfaceArea) #93
Comments
You have to provide the 3D structures if you want to calculate 3D descriptors with Mordred. I've found DataWarrior to be an exceptionally easy-to-use free software package for going from SMILES to 2D and/or 3D structures for LARGE compound sets. Paul
Hey, thank you for your reply, Paul. But how do you provide 3D molecules to Mordred using RDKit?
1. Put your SMILES in a text document (attachment #1).
2. Open the text document in DataWarrior.
3. Save it (default save, as a DWAR file). This is not essential, but it makes future work easier, such as adding a name column. (Attachment #2, after deleting the .txt extension, which is there only because GitHub accepts few file types.)
4. Generate 2D atom coordinates (Chemistry → Generate 2D Atom Coordinates…).
5. Generate 3D structures (Chemistry → Generate Conformers…; leave Max. Conformer Count = 1; DO NOT CHECK SAVE TO FILE).
6. Save changes.
7. Now save it as an SDF (File → Save Special → SD-File). A dialog pops up. Click "Save" under "MDL SD-files (.sdf)." In the next dialog: (1) leave Structure Column as is; (2) change SD-file version to version 2 (Mordred may not like version 3 files); (3) change Atom Coordinates from 2D to "3D if available"; (4) choose your compound name column from the dropdown.

Keep in mind program limitations (e.g. names with commas will give garbled output from Mordred if you choose CSV file output). I used the SMILES you provided as the compound name column. The new SDF contains 3D structures of your SMILES.

Note: your SMILES did not specify stereoisomers, but a 3D structure requires such specificity. DataWarrior creates one stereoisomer per SMILES in this case, as indicated by the usual "R" and "S" atom labels.

The attached files include (1) your SMILES in a text file, (2) the DWAR file ready to save as an SDF, and (3) a pruned SDF of 3D structures from your SMILES. Pruning involved deleting columns such as "minimization energy" and SMILES.

This is barely the beginning of what you can do using DataWarrior. I invite you to read the better-than-average (for free software) documentation that comes with DataWarrior for the good stuff.

Good luck, Paul

Remove the .txt extension on this file to get the DWAR file:
Remove the .txt extension on this file to get the SDF:
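Before handing an exported SDF to Mordred, it can help to confirm that the file really carries 3D coordinates. The following is a minimal sketch and not part of DataWarrior or Mordred: the function names are hypothetical, it assumes V2000 mol blocks (the "version 2" export described above), and it treats any non-zero z value as evidence of 3D data. That is only a heuristic, since a genuinely planar molecule lying in the xy plane would be reported as flat.

```python
def split_sdf(text):
    """Split SDF text into records; each record is a list of lines."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # SDF record terminator
            records.append(current)
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # trailing record without $$$$
        records.append(current)
    return records


def has_3d_coordinates(record_lines, tolerance=1e-6):
    """Heuristic: True if any atom in a V2000 mol block has a non-zero z."""
    if len(record_lines) < 4:
        raise ValueError("record is too short to contain a mol block")
    counts_line = record_lines[3]  # three header lines precede the counts line
    if "V2000" not in counts_line:
        raise ValueError("this sketch only handles V2000 mol blocks")
    n_atoms = int(counts_line[0:3])  # atom count sits in columns 1-3
    atom_lines = record_lines[4:4 + n_atoms]
    # x, y and z occupy three fixed-width fields of 10 characters each
    return any(abs(float(line[20:30])) > tolerance for line in atom_lines)
```

Reading the exported file as text and calling `has_3d_coordinates` on each record returned by `split_sdf` gives one True/False flag per molecule; a False flag points to a record that was saved with 2D coordinates only.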
The SDF file above is your input file for Mordred, as in
$ python -m mordred smiles_cartilage-ftw_pruned.sdf -o MORD_cartilage-ftw_pruned.csv
You supply the output file name (after -o). In this example, I chose MORD_cartilage-ftw_pruned.csv.
An SDF serves as the input file for Mordred. It is entirely possible to generate 3D structures from SMILES using RDKit. Personally, I find the DataWarrior GUI preferable, as it has extensive manipulation and visualization capabilities. In the case exemplified above, the two can be combined: use the rdkit.Chem.EnumerateStereoisomers module to generate all possible stereoisomer SMILES, save them in a text file, and import (open) that file in DataWarrior. Elimination of duplicates is trivial in DataWarrior (Data → Delete Rows → Duplicate Rows…).

On another note, I do further molecular modeling for QM properties. I have found that DataWarrior provides 3D structures that require less curation/refinement during geometry and energy minimization by semiempirical methods. Sometimes, unminimized 3D structures using torsions from the crystallographic database provide the best starting structures. This is especially the case for sets of homologous molecular structures, because the initial conformers retain more apparent homolog consistency. Of course, these are my personal experiences with the molecule sets I have studied, versus either Open Babel or RDKit, so your mileage may vary. Paul
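For readers who prefer to stay in RDKit, the stereoisomer enumeration mentioned above can be paired with RDKit's own conformer embedding. This is a rough sketch under stated assumptions rather than a recommended protocol: the helper names `stereoisomers_with_3d` and `write_sdf` are hypothetical, one conformer is embedded per stereoisomer with a fixed random seed, and no force-field minimization is performed.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers


def stereoisomers_with_3d(smiles, seed=42):
    """Return one embedded 3D molecule per enumerated stereoisomer."""
    parent = Chem.MolFromSmiles(smiles)
    if parent is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    embedded = []
    for isomer in EnumerateStereoisomers(parent):
        name = Chem.MolToSmiles(isomer)  # isomeric SMILES used as the title
        mol = Chem.AddHs(isomer)  # explicit hydrogens before embedding
        if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
            continue  # embedding failed for this isomer; skip it
        mol.SetProp("_Name", name)
        embedded.append(mol)
    return embedded


def write_sdf(mols, path):
    """Write the molecules, including their 3D coordinates, to an SD file."""
    writer = Chem.SDWriter(path)
    for mol in mols:
        writer.write(mol)
    writer.close()
```

Calling `write_sdf(stereoisomers_with_3d("CCC(C)O"), "isomers.sdf")` would produce a file that can be opened in DataWarrior or passed to Mordred; the file name here is only an illustration.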
Yeah, Paul, I tried this and it is still missing all the 3D features. It is supposed to have 1800+ features. Your SDF gives only 1614 features.
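A quick way to see what a descriptor table actually contains is to count its columns and look for cells carrying the error text. The sketch below uses only the standard library and rests on assumptions that should be checked against your own file: the table is a CSV with a header row, the first column holds the compound name, and failed 3D descriptors appear as cells containing the phrase "missing 3D coordinate" quoted in this issue. The function name is hypothetical.

```python
import csv

MISSING_3D_TEXT = "missing 3D coordinate"  # phrase quoted in this issue


def summarize_descriptor_csv(path, id_columns=1):
    """Return (number of descriptor columns, columns flagged as missing 3D)."""
    flagged = set()
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        for row in reader:
            for name, cell in zip(header, row):
                if MISSING_3D_TEXT in cell:
                    flagged.add(name)
    descriptor_names = header[id_columns:]  # skip the name column(s)
    return len(descriptor_names), sorted(flagged)
```

If the column count is far below what you expect and no column is flagged, the 3D descriptors were probably never requested; if columns are flagged, they were requested but the molecules lacked 3D coordinates.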
Hi! Maybe it's too late to answer this, and the original poster may already have solved the problem. In mordred 1.2.0 you CANNOT get all the descriptors by using the code below:
What you should do is use code like this:
The whole sample code you can try looks like this:
Thanks a lot, @ky66! I found this solution in his essay.
Description
I'm trying to calculate a whole bunch of descriptors, including some 3D descriptors, from a set of SMILES. Instead of values for RASA, TASA, TPSA, etc., I get the error "missing 3D coordinate" in place of each value.
Code
I am loading a bunch of SMILES and building a list of the corresponding RDKit Mol objects
molecules_list = []
After doing
I get the following output
The particular text output for RASA is,
In case you need some of my SMILES for reproducing this
Environment
OS/distribution
Manjaro KDE Plasma
Kernel: 5.8.18-1-MANJARO
conda or pip
Using conda (an environment called my-rdkit-env)
python version
Python 3.7.9
library version
Please execute the command and paste result.
conda
pip
(my-rdkit-env) [aayush@aayush-tuf ~]$ python -m pip list
Package             Version
------------------- -------------------
argon2-cffi         20.1.0
async-generator     1.10
attrs               20.2.0
backcall            0.2.0
bleach              3.2.1
certifi             2020.12.5
cffi                1.14.3
chemopy             1.0
cycler              0.10.0
decorator           4.4.2
defusedxml          0.6.0
entrypoints         0.3
importlib-metadata  2.0.0
ipykernel           5.3.4
ipython             7.18.1
ipython-genutils    0.2.0
ipywidgets          7.5.1
jedi                0.17.2
Jinja2              2.11.2
JPype1              1.1.2
jsonschema          3.2.0
jupyter             1.0.0
jupyter-client      6.1.7
jupyter-console     6.2.0
jupyter-core        4.6.3
jupyterlab-pygments 0.1.2
kiwisolver          1.3.0
MarkupSafe          1.1.1
matplotlib          3.3.2
mistune             0.8.4
mkl-fft             1.2.0
mkl-random          1.1.1
mkl-service         2.3.0
mordred             1.2.0
nbclient            0.5.1
nbconvert           6.0.7
nbformat            5.0.8
nest-asyncio        1.4.1
networkx            2.5
notebook            6.1.4
numpy               1.19.2
olefile             0.46
packaging           20.4
pandas              1.1.3
pandocfilters       1.4.2
parso               0.7.0
pexpect             4.8.0
pickleshare         0.7.5
Pillow              8.0.1
pip                 20.3.1
prometheus-client   0.8.0
prompt-toolkit      3.0.8
ptyprocess          0.6.0
pycparser           2.20
Pygments            2.7.1
pyparsing           2.4.7
pyrsistent          0.17.3
python-dateutil     2.8.1
pytz                2020.4
pyzmq               19.0.2
qtconsole           4.7.7
QtPy                1.9.0
Send2Trash          1.5.0
setuptools          51.0.0.post20201207
six                 1.15.0
terminado           0.9.1
testpath            0.4.4
tornado             6.1
tqdm                4.55.0
traitlets           5.0.5
typing-extensions   3.7.4.3
wcwidth             0.2.5
webencodings        0.5.1
wheel               0.36.2
widgetsnbextension  3.5.1
zipp                3.3.1