Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed: allow mutiple scissions in the atom mapping process. #701

Merged
merged 24 commits into from
Oct 3, 2023

Conversation

kfir4444
Copy link
Collaborator

This PR presents a new method of performing the scission in atom mapping, based on the atom indices, which are required for atom mapping anyway. This method also simplify the process.
Tests were also added.

@codecov
Copy link

codecov bot commented Sep 22, 2023

Codecov Report

Merging #701 (5bfeaac) into main (5c3aab6) will decrease coverage by 0.01%.
The diff coverage is 93.70%.

@@            Coverage Diff             @@
##             main     #701      +/-   ##
==========================================
- Coverage   73.30%   73.29%   -0.01%     
==========================================
  Files          99       99              
  Lines       26609    26557      -52     
  Branches     5563     5537      -26     
==========================================
- Hits        19506    19466      -40     
+ Misses       5759     5744      -15     
- Partials     1344     1347       +3     
Flag Coverage Δ
unittests 73.29% <93.70%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
arc/mapping/engine_test.py 99.56% <100.00%> (-0.01%) ⬇️
arc/mapping/driver.py 70.27% <71.42%> (-1.29%) ⬇️
arc/reaction_test.py 99.33% <86.66%> (-0.26%) ⬇️
arc/mapping/engine.py 88.25% <90.00%> (+0.26%) ⬆️

... and 3 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Please see some comments

Can you relocate the added tests to right after the respective commit that added the function being tested? This way it is easier to review a function and its related tests and see the relevance of the new tests

@@ -35,7 +36,15 @@ def setUpClass(cls):
"""
cls.maxDiff = None
cls.rmgdb = rmgdb.make_rmg_database_object()
rmgdb.load_families_only(cls.rmgdb)
cls.rmgdb.load(path=rmg_settings['database.directory'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to load through ARC's rmgdb module, so we guarantee that during an actual run ARC's also able to load all families, not only during the tests.

Copy link
Collaborator Author

@kfir4444 kfir4444 Sep 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an issue with that correctly. Looking for replays on that issue so I could proceed with fixing it. This is a temporary patch.

@@ -1654,6 +1654,23 @@ def test_get_atom_map(self):
self.assertIn(atom_map[int(atom.label)], c_symmetry_h_2 if atom.symbol == "C" else h_symmetry2)
self.assertTrue(check_atom_map(rxn=rxn))

# XY_elimination_hydroxyl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have another PR that organizes these tests. We should separate this large test into smaller unit tests, so we know which ones fail, and we can run them individually when debugging.
Please put this added test in a new unit test function

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you point me to the PR?
Also, we should think about performing some of these randomly, they are getting out of hand in terms of computation time..

arc/mapping/engine_test.py Show resolved Hide resolved
arc/mapping/engine_test.py Show resolved Hide resolved
spc = ARCSpecies(label="ethane", smiles="CC")
label_species_atoms([spc])
self.assertTrue(determine_bdes_indices_based_on_atom_labels(spc, (1, 2)))
self.assertFalse(determine_bdes_indices_based_on_atom_labels(spc, (2, 3)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this line a dependent test? if the above is True, then this line must be false

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. If (starting from 1, due to bde convention) atom number 3 (hydrogen atom) was connected to atom number 2 (the second carbon atom) then this would have been true.

arc/mapping/engine.py Outdated Show resolved Hide resolved
species += [H1, H2]
else:
try:
species += candidate.scissors()
Copy link
Member

@alongd alongd Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docsting says we return the species after scission, but looks like we add it to the species list, and the original species is also returned (also above when treating H2). Is this on purpose? If so, need to update the docstring

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The species are returned after all required scissoring is done. If none were requested, then no scissoring are done. It does return them after scission, so I do not understand the confusion.

arc/mapping/engine.py Outdated Show resolved Hide resolved
arc/mapping/engine.py Show resolved Hide resolved
arc/mapping/engine.py Outdated Show resolved Hide resolved
@kfir4444 kfir4444 force-pushed the am_fixup_multi_cut branch 4 times, most recently from f6807cb to bcec4f0 Compare September 26, 2023 11:31
arc/reaction_test.py Fixed Show fixed Hide fixed
@alongd
Copy link
Member

alongd commented Sep 30, 2023

I tried to map the following rxn on this branch
C=CC([O])[CH]C=CCC <=> CCC=C[CH]CC=C[O]

and got an error:

IndexError                                Traceback (most recent call last)
/tmp/ipykernel_225830/1280115450.py in <module>
----> 1 print(rxn1.atom_map)

~/Code/ARC/arc/reaction.py in atom_map(self)
    156                 and all(species.get_xyz(generate=False) is not None for species in self.r_species + self.p_species):
    157             for backend in ["ARC", "QCElemental"]:
--> 158                 _atom_map = map_reaction(rxn=self, backend=backend)
    159                 if _atom_map is not None:
    160                     self._atom_map = _atom_map

~/Code/ARC/arc/mapping/driver.py in map_reaction(rxn, backend, db, flip)
     71             return _map if _map is not None else map_reaction(rxn, backend=backend, db=db, flip=True)
     72         try:
---> 73             _map = map_rxn(rxn, backend=backend, db=db)
     74         except ValueError as e:
     75             return map_reaction(rxn, backend=backend, db=db, flip=True)

~/Code/ARC/arc/mapping/driver.py in map_rxn(rxn, backend, db)
    237     rmg_reactions = get_rmg_reactions_from_arc_reaction(arc_reaction=rxn, backend=backend)
    238     r_label_dict, p_label_dict = get_atom_indices_of_labeled_atoms_in_an_rmg_reaction(arc_reaction=rxn,
--> 239                                                                                       rmg_reaction=rmg_reactions[0])
    240 
    241     # step 2:

IndexError: list index out of range

Can you take a look?

@alongd
Copy link
Member

alongd commented Sep 30, 2023

BTW, I think this is the correct mapping:
image

@kfir4444
Copy link
Collaborator Author

C=CC([O])[CH]C=CCC <=> CCC=C[CH]CC=C[O]

Did it previously work on the main branch? I think I know the issue, and I did not change the function that returns a wrong value.
Also, I this this issue probably stem from the fact that get_rmg_reactions_from_arc_reaction(arc_reaction=rxn, backend=backend) for this reaction returns an empty list.
Also, could you share the full script, and maybe also check that the reaction family is found correctly?

This function verifies that the BDE's are supposed to fit the relevant species. Also, checks that the bond exist between the relevant atoms.
This function uses the new, faster and more accurate method for performing the scissoring for atom mapping. Instead of assigning the BDE's to the relevant species, which causes issues when further scissions are performed, this function uses the atom indices to perform the scission.
This due to issue #675, this function also moved the atom indices when copying.
To facilitate the scission without assigning BDE's to the species. Also much simpler than the older function.
@kfir4444 kfir4444 force-pushed the am_fixup_multi_cut branch from 99c9e12 to 5bfeaac Compare October 2, 2023 11:13
Copy link
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@alongd alongd merged commit 3aceebc into main Oct 3, 2023
6 of 7 checks passed
@alongd alongd deleted the am_fixup_multi_cut branch October 3, 2023 05:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants