Fragmenstein

Description

NB. The class Monster was formerly called Fragmenstein.

The Fragmenstein class implements the followup placing algorithm. One problem in doing so is mapping the atoms of the hits to the followup. Three modes were tested. The default (recommended) mode does no prior merging of the hits.

The three modes rely heavily on a one-to-one mapping of the coordinates of overlapping atoms between hits within a 2 Å radius (see the Position Mapping code).
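The idea of a one-to-one position mapping between two hits can be pictured with the minimal sketch below. It assumes each hit is just a list of (x, y, z) tuples and pairs atoms greedily, closest pair first, so that no atom is used twice; the function name map_by_position is hypothetical and the actual Position Mapping code may resolve competing pairs differently.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def map_by_position(hit_a: List[Coord], hit_b: List[Coord],
                    cutoff: float = 2.0) -> Dict[int, int]:
    """Hypothetical sketch: map atom indices of hit_a onto hit_b by proximity.

    All pairs closer than `cutoff` Å are collected, then accepted shortest
    first, skipping any atom already used, so the result is one-to-one.
    """
    pairs = []
    for i, a in enumerate(hit_a):
        for j, b in enumerate(hit_b):
            d = math.dist(a, b)
            if d < cutoff:
                pairs.append((d, i, j))
    pairs.sort()  # shortest distances first
    mapping: Dict[int, int] = {}
    used_b = set()
    for _, i, j in pairs:
        if i not in mapping and j not in used_b:
            mapping[i] = j
            used_b.add(j)
    return mapping
```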

The three modes have different strengths, and which is best depends on the dataset. However, the no-merging mode is likely the best for most datasets.

The argument merging_mode (full|partial|none|none_permissive|off) selects which mode is used.

All three of these modes place the atoms of the followup and "project" the missing atoms.

Mapping modes

Full merging

The main documentation for this mode can be found here.

The inspiration hits are merged to create a new merged template, and the followup is mapped onto this template. Because this is much faster than the unmerged mode, a sliding scale of MCS mappings can be afforded: a very strict MCS mapping is done first, then a series of MCS mappings ranging from very lax to more strict are tried until one is found that is a superset of the strictest one.
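The sliding-scale selection can be sketched in pure Python, assuming every MCS mapping has already been computed and is represented as a dict from followup atom index to merged-template atom index. The function names are hypothetical, and the MCS searches themselves (in RDKit, for example, rdFMCS.FindMCS run with different comparison settings) are not shown.

```python
from typing import Dict, Iterable

Mapping = Dict[int, int]  # followup atom index -> merged-template atom index


def is_superset(candidate: Mapping, strict: Mapping) -> bool:
    """True if every pair in the strict mapping also appears in the candidate."""
    return all(candidate.get(k) == v for k, v in strict.items())


def pick_mapping(strict: Mapping, lax_to_strict: Iterable[Mapping]) -> Mapping:
    """Return the first (most lax) mapping that is a superset of the strict one.

    Hypothetical sketch: candidates are assumed ordered from very lax to more
    strict; if none agrees with the strict mapping, the strict one is kept.
    """
    for candidate in lax_to_strict:
        if is_superset(candidate, strict):
            return candidate
    return strict
```

Ordering the candidates from lax to strict means the largest mapping that still agrees with the trusted strict core wins.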

This was the primary method until partial mapping was introduced to overcome the issue of incorrect hits.

Pros

Obedient
Can be used to suggest new followups

Pro/Con

The code that stops the mapping of unconnected hits (fuse) is disabled/buggy, because erroneous hits are a big issue.

Cons

Slower than partial mapping (reason unclear)
Sensitive to too many hits and to incorrect hits (a COVID Moonshot dataset problem)
Ring overlaps can result in really odd rings (∗)

∗ See the work in progress on the ring collapsing code, which aims to fix this.

Partial mapping

The main documentation for this mode can be found here.

As above, but inconsistent hits are excluded. The "inconsistent" hits (dodgies in the code) are those that do not map consistently when considered in trios of hits. This speeds up the code and avoids the pitfalls of incorrect hits and of too many hits.
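What "not mapping consistently in a trio" means can be illustrated with the hypothetical sketch below; it is not the code's actual implementation. Pairwise atom maps between hits are assumed to be dicts keyed by ordered hit-name pairs, and a trio counts as inconsistent when going from A to B to C lands on a different atom than going from A to C directly. How the dodgy hit is then chosen from a bad trio is not shown.

```python
from itertools import combinations
from typing import Dict, List, Tuple

AtomMap = Dict[int, int]  # atom index in the first hit -> atom index in the second


def trio_is_consistent(ab: AtomMap, bc: AtomMap, ac: AtomMap) -> bool:
    """Check that composing A->B and B->C never contradicts A->C."""
    for a_idx, b_idx in ab.items():
        if b_idx in bc and a_idx in ac and bc[b_idx] != ac[a_idx]:
            return False
    return True


def inconsistent_trios(names: List[str],
                       pair_maps: Dict[Tuple[str, str], AtomMap]
                       ) -> List[Tuple[str, str, str]]:
    """List every trio of hits whose pairwise maps contradict each other.

    Assumes pair_maps is keyed by (first, second) following the order of `names`.
    """
    bad = []
    for a, b, c in combinations(names, 3):
        if not trio_is_consistent(pair_maps[(a, b)], pair_maps[(b, c)],
                                  pair_maps[(a, c)]):
            bad.append((a, b, c))
    return bad
```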

Pros

Faster than full
Less prone to too many hits/wrong hits

Cons

Often behaves differently than intended by ignoring good hits

No merging

This method (self-standing code in core.unmerge_mapper.py) tries to map each hit to the compound. The maps must satisfy two conditions: two mapped atoms cannot occupy the same 2 Å overlapping position, and no bond in the map can be over 3 Å long. If the atoms in between are not placed, there is no constraint (which is a problem). This mills through all possible combinations of the MCS mappings of the various inspiration hits, so the time taken grows exponentially with the number of hits.
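The two conditions and the exhaustive combination search can be sketched as below. Everything here is illustrative rather than the code in core.unmerge_mapper.py: hit atoms are identified by hypothetical (hit name, atom index) tuples, the overlapping positions (hit atoms of different hits within 2 Å of each other) are assumed to be precomputed, and a combination is formed by picking one candidate map per hit.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
HitAtom = Tuple[str, int]               # (hit name, atom index within that hit)
FollowupMap = Dict[int, List[HitAtom]]  # followup atom index -> hit atoms mapped to it


def is_valid(fmap: FollowupMap,
             coords: Dict[HitAtom, Coord],
             overlaps: Sequence[Tuple[HitAtom, HitAtom]],
             bonds: Sequence[Tuple[int, int]],
             max_bond: float = 3.0) -> bool:
    """Apply the two conditions to one combined map."""
    owner = {ha: f for f, hit_atoms in fmap.items() for ha in hit_atoms}
    # Condition 1: hit atoms sharing one overlapping (2 Å) position must not
    # be claimed by two different followup atoms.
    for ha, hb in overlaps:
        if ha in owner and hb in owner and owner[ha] != owner[hb]:
            return False
    # Condition 2: no mapped bond may be longer than max_bond Å. Placed atoms
    # connected only through unplaced atoms are not constrained at all.
    for i, j in bonds:
        for ha in fmap.get(i, []):
            for hb in fmap.get(j, []):
                if math.dist(coords[ha], coords[hb]) > max_bond:
                    return False
    return True


def valid_combinations(options: Sequence[Sequence[Dict[int, HitAtom]]],
                       coords: Dict[HitAtom, Coord],
                       overlaps: Sequence[Tuple[HitAtom, HitAtom]],
                       bonds: Sequence[Tuple[int, int]]) -> List[FollowupMap]:
    """options[k] holds the candidate maps (followup index -> hit atom) of hit k.

    itertools.product tries every combination, so the number of checks is the
    product of the candidate counts and grows exponentially with the hits.
    """
    valid = []
    for combo in product(*options):
        fmap: FollowupMap = {}
        for per_hit in combo:
            for f, ha in per_hit.items():
                fmap.setdefault(f, []).append(ha)
        if is_valid(fmap, coords, overlaps, bonds):
            valid.append(fmap)
    return valid
```

With k candidate maps for each of n hits, k**n combinations are checked, which is why this mode is the slowest.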

The (slower) variant none_permissive also allows atoms of different kinds to be mapped, provided the more permissive mapping includes the stricter one.

Pros

Better at mapping
Better at discarding bad inspiration hits

Cons

Slower
Vulnerable to distant erroneous hits

Comparison

For a comparison see three modes compared.

Covalent

If the Chem.Mol has a dummy atom (element symbol * within RDKit and SMILES, but R in a mol file and PDB file) and a molecule with a single atom is passed to the attachment argument, then the covalent linker, if absent in the hits, is anchored to that atom. The default dummy atom can be overridden with Fragmenstein.dummy:Chem.Mol and Fragmenstein.dummy_symbol:str.

Knows its past

The Chem.Mol object will have Chem.Atom objects with the RDKit property _Origin. This is a JSON-stringified list of strings, each a reference hit name and an atom index separated by a dot. fragmenstein.origin_from_mol(mol) will extract these for you.
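Decoding an _Origin value by hand only needs the json module. The sketch below assumes the value has already been fetched as a string (for instance with atom.GetProp('_Origin') in RDKit) and that each entry has the form hitname.index; parse_origin is a hypothetical helper, not part of Fragmenstein. Splitting on the last dot keeps hit names that themselves contain dots intact.

```python
import json
from typing import List, Tuple


def parse_origin(origin_json: str) -> List[Tuple[str, int]]:
    """Turn '["x0104.3", "x0692.12"]' into [("x0104", 3), ("x0692", 12)]."""
    pairs = []
    for entry in json.loads(origin_json):
        name, index = entry.rsplit('.', 1)  # split on the last dot only
        pairs.append((name, int(index)))
    return pairs
```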

There will also be a field, _StDev, which is the average over the three axes of the standard deviation of the positions of the contributing atoms. fragmenstein.origin_from_stdev(mol) extracts them.
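As a worked example of this definition, the sketch below averages the per-axis standard deviation of the positions that contributed to one atom. It assumes the population standard deviation (statistics.pstdev); whether Fragmenstein uses the population or the sample form is not stated here, and average_axis_stdev is a hypothetical name.

```python
from statistics import pstdev
from typing import Sequence, Tuple


def average_axis_stdev(positions: Sequence[Tuple[float, float, float]]) -> float:
    """Mean over x, y and z of the (population) standard deviation of positions."""
    if len(positions) < 2:
        return 0.0  # a single contributing atom has no spread
    per_axis = [pstdev([p[axis] for p in positions]) for axis in range(3)]
    return sum(per_axis) / 3
```

For two contributing atoms at (0, 0, 0) and (2, 0, 0), only the x axis varies (standard deviation 1), so the averaged value is 1/3.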

Projection

Many atoms of the followup may be novel, that is, absent from the hits, and so have to be added.

Placing

The method place_from_map places the followup using those atoms.

To do a constrained embed in RDKit, the reference atoms need to have a good geometry, which the half-placed atoms generally do not have. Consequently, a constrained embed is not possible. Therefore, for side chains that are novel in the followup, an optimised conformer is superposed onto the half-placed followup using the 3-4 atoms that are the closest neighbours within the half-placed structure, and the side-chain positions are copied from there for each bit.
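The superposition step can be sketched in pure Python with Horn's quaternion method, a standard least-squares rigid fit chosen here only to keep the example dependency-free; in practice RDKit's rdMolAlign.AlignMol with an atomMap does the same job. The sketch fits the 3-4 anchor atoms of the conformer onto their counterparts in the half-placed followup and returns a transform that can then be applied to the side-chain atoms. The function names are hypothetical, and the power iteration used for the eigenvector is not robust for degenerate (e.g. collinear) anchors.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _centroid(points: Sequence[Vec]) -> Vec:
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def _dominant_eigenvector(m: List[List[float]], iterations: int = 500) -> List[float]:
    # Shifting by the Frobenius norm makes every eigenvalue non-negative, so
    # power iteration converges to the eigenvector of the largest eigenvalue.
    shift = math.sqrt(sum(x * x for row in m for x in row))
    v = [1.0, 0.5, 0.25, 0.125]
    for _ in range(iterations):
        w = [sum(m[r][c] * v[c] for c in range(4)) + shift * v[r] for r in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def superpose(mobile: Sequence[Vec], target: Sequence[Vec]) -> Callable[[Vec], Vec]:
    """Fit matching anchor atoms (mobile onto target) and return the rigid transform."""
    cm, ct = _centroid(mobile), _centroid(target)
    a = [[p[k] - cm[k] for k in range(3)] for p in mobile]
    b = [[p[k] - ct[k] for k in range(3)] for p in target]
    # s[x][y] = sum over anchors of a_x * b_y
    s = [[sum(ai[x] * bi[y] for ai, bi in zip(a, b)) for y in range(3)] for x in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    q0, qx, qy, qz = _dominant_eigenvector(n)  # unit quaternion of the best rotation
    r = [
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ]

    def transform(p: Vec) -> Vec:
        d = [p[k] - cm[k] for k in range(3)]
        return tuple(sum(r[i][k] * d[k] for k in range(3)) + ct[i] for i in range(3))

    return transform
```

For instance, `move = superpose(conformer_anchors, placed_anchors)` followed by `[move(p) for p in conformer_side_chain]` would give the side-chain positions to copy, where all three variable names are illustrative.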

Imperfect projection

The projection approach is not perfect, but as the projected atoms are not constrained, the problem generally gets fixed without issue during minimisation. This problem is quite apparent in cases where atoms connecting to the sulfur are added.

The projection is done via a single conformer.
