SMILES chemistry recognition #9

zbetmen1 · 2021-01-11T13:58:09Z

Hi Jianshu, great work as always 😃. I have two questions:

1. How can I obtain the SMILES data set you trained the network on?
2. How do you deal with SMILES ambiguity in the targets?
   - The same chemical compound can often be encoded in SMILES in many different ways. What I'm wondering is how the images are aligned with the targets. For example, if the network starts decoding from left to right it might produce one sequence of tokens, but if it starts decoding from right to left it might produce a different sequence. The thing is, both sequences of tokens, although different, may be completely correct. This is not the case when decoding LaTeX, where the starting point is clear and images and targets are naturally aligned.
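
To make the ambiguity concrete, here is a minimal sketch in plain Python. It is a toy writer for acyclic molecules with implicit hydrogens and single bonds only, and the helper name `smiles_from` is made up for illustration, so it is not a real SMILES toolkit and not necessarily how the training targets were produced. It emits a different but equally valid string for ethanol depending on which atom the traversal starts from:

```python
def smiles_from(atoms, bonds, start):
    """Write a SMILES-like string for an acyclic molecule by
    depth-first traversal starting at atom index `start`.
    Toy assumption: single bonds, implicit hydrogens, no rings."""
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    def walk(node, parent):
        children = [n for n in neighbors[node] if n != parent]
        out = atoms[node]
        # Every child except the last is written as a parenthesised branch.
        for child in children[:-1]:
            out += "(" + walk(child, node) + ")"
        # The last child continues the main chain.
        if children:
            out += walk(children[-1], node)
        return out

    return walk(start, None)


# Ethanol as a graph: C(0) - C(1) - O(2)
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

# One string per possible starting atom; all describe the same molecule.
variants = [smiles_from(atoms, bonds, s) for s in range(len(atoms))]
print(variants)

# One possible (hypothetical) convention for picking a single target:
# shortest string first, ties broken alphabetically.
target = min(variants, key=lambda s: (len(s), s))
print(target)
```

Even for a three-atom molecule the left-to-right and right-to-left readings (`CCO` vs. `OCC`) differ, which is exactly the alignment problem I am asking about.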
