🐛 Bug: eos5qfo - Predict Error: products #371

Chigoziee · 2022-10-17T20:17:41Z

Describe the bug.

I successfully fetched and served my model 'eos5qfo', but when I tried to test it using the 'eml_canonical.csv' test dataset it was stuck in a loop for a very long time. I restarted my command line and my internet connection, but that did not seem to help.
Fetching the model took a long time, so I assumed it was one of the large models and that predictions would also take a while. I therefore decided to test it on smaller data: I tried "CCCCC" and "CN(C)CCC=C1c2ccccc2CCc3ccccc13", and both gave me an AssertionError in the log.
eos5qfo_predict_log.log

Describe the steps to reproduce the behavior

No response

Expected behavior.

No response

Screenshots.

No response

Operating environment

Windows 10, WSL (Ubuntu 20.04)

Additional context

No response

GemmaTuron · 2022-10-18T09:00:13Z

Hi @Chigoziee

Thanks, this is helpful. The error is due to the input format: this model requires two molecules. You can check the Ersilia Model Hub website to read what the model does: "Utilizes a Weisfeiler-Lehman network (attentive mechanism) to predict the products of an organic reaction given the reactants. The model identifies the reaction centers (set of atoms/bonds that change from reactant to product) and obtains the products directly from a graph-based neural network."
It sounds complex, but the key part here is: "Predict the products of an organic reaction given the reactants".
This means you need to pass TWO molecules or two lists of molecules. Read more about it here: https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/inputs

Can you create a CSV file with two columns (three molecules per column is enough) and test it?

Let me link @mamabear25 here for issue #367, as she is having problems fetching the model.

Chigoziee · 2022-10-18T16:04:01Z

Hi @GemmaTuron, I created a CSV called "test" with two inputs and tested it with the model. This gave me a "KeyError: 'products'" error. I have attached the test input data and the log:
eos5qfo_log.log
test.csv

GemmaTuron · 2022-10-19T17:19:43Z

Hi @Chigoziee !

Thanks for the test.
Two things:

MY FAULT: I gave you the wrong input format. A trick we have in the Hub is to automatically generate a template with the shape that the model needs. The command is: ersilia example eos5qfo -n 1 -f myexample.csv (it will create a file called myexample.csv). You will see SMILES separated by a period (".").
BAD NEWS: the correct input file still fails with the same error.
We will have a look at this.
Please change the issue title to:
"eos5qfo - Predict Error: products"
And mark it in the Excel file.
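
For readers following along, the dot-separated format that the example command produces can also be assembled by hand. The sketch below is only an illustration: the helper names `join_reactants` and `write_reaction_inputs` and the `smiles` column header are assumptions, not something confirmed in this thread, so compare the result against the generated template before relying on it. The reactant pair is taken from the fetch log posted further down.

```python
import csv
import io


def join_reactants(reactants):
    """Join the SMILES of one reaction's reactants with '.' (one string per reaction)."""
    return ".".join(s.strip() for s in reactants)


def write_reaction_inputs(reactions, handle, header="smiles"):
    """Write one dot-separated reactant string per row.

    `header` is a hypothetical column name; check the file produced by
    `ersilia example` for the real one.
    """
    writer = csv.writer(handle)
    writer.writerow([header])
    for reactants in reactions:
        writer.writerow([join_reactants(reactants)])


# One reaction with two reactants (the first pair from the fetch log).
reactions = [
    ["CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O", "C1=CN=CC=C1C(=O)NN"],
]
buffer = io.StringIO()
write_reaction_inputs(reactions, buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch self-contained; passing an open file handle instead would produce a CSV on disk.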

Chigoziee · 2022-10-24T14:33:42Z

Done

DhanshreeA · 2022-12-27T06:53:06Z

I am able to reproduce this error even when generating an example file in the correct input format, as suggested by @GemmaTuron. This is similar to the issue I am facing while trying to run the generate API of the recently incorporated model eos4q1a.

DhanshreeA · 2023-01-04T21:56:27Z

Hi @GemmaTuron and @miquelduranfrigola!

I have been debugging this for a while and here are my findings so far:

By default, the model dumps a JSON file and not a CSV, although in principle ersilia should be able to generate output in any given format.
When testing the model with the output file specified as JSON, it works just fine and I am able to run it with single/multiple list inputs.
The following schema is being saved for this model during fetch time:

{
    "predict": {
        "input": {
            "key": {
                "type": "string"
            },
            "text": {
                "type": "string"
            }
        },
        "output": {}
    }
}

When generating output in CSV, the GenericOutputAdapter in ersilia/io/output.py scans this schema and finds no products key, so the CLI runs into KeyError: 'products'.
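
The failure mode can be reproduced in isolation. This is a minimal sketch, not the actual code in ersilia/io/output.py: `column_type` is a hypothetical helper that assumes the adapter looks up each output key in the saved schema.

```python
# Schema as saved at fetch time (same content as the JSON above): "output" is empty.
schema = {
    "predict": {
        "input": {"key": {"type": "string"}, "text": {"type": "string"}},
        "output": {},
    }
}

# One prediction shaped like the model's response (product list abbreviated).
result = {"output": {"products": ["CC(=O)Oc1ccccc1C(=O)O", "O"]}}


def column_type(schema, api_name, output_key):
    """Hypothetical lookup: fetch the declared type of one output key."""
    return schema[api_name]["output"][output_key]["type"]


try:
    for key in result["output"]:
        column_type(schema, "predict", key)
    error = None
except KeyError as exc:
    error = repr(exc)

print(error)  # KeyError('products')
```

Any code path that trusts the schema to list every key present in the results would fail the same way while "output" stays empty.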

I tried updating the meta in service.py to

        meta = {
            "products": []
        }
        result = {
            "result": R,
            "meta": meta
        }

However, that seems to have no effect on the generated schema.

Moreover, from the fetch logs, I see errors about data type consistency

01:41:44 | DEBUG    | Server is ready. Trying to get URL
01:41:44 | DEBUG    | URL found: http://127.0.0.1:33999/
01:41:44 | DEBUG    | Iterating over APIs
01:41:44 | DEBUG    | Running API: predict
01:41:44 | DEBUG    | [['CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'C1=CN=CC=C1C(=O)NN'], ['CC(=O)OC1=CC=CC=C1C(=O)O', 'CC(C)CC1=CC=C(C=C1)C(C)C(=O)O', 'CC1(OC2C(OC(C2O1)(C#N)C3=CC=C4N3N=CN=C4N)CO)C']]
01:41:44 | DEBUG    | API: predict
01:41:44 | DEBUG    | MODEL ID: eos5qfo
01:41:44 | DEBUG    | SERVICE URL: http://127.0.0.1:33999/
01:41:44 | DEBUG    | Reading card from eos5qfo
01:41:44 | DEBUG    | Reading shape from eos5qfo
01:41:44 | DEBUG    | Input Shape: List
01:41:44 | DEBUG    | Input type is: compound
01:41:44 | DEBUG    | Input shape is: List
01:41:44 | DEBUG    | Importing module: .types.compound
01:41:44 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
01:41:44 | DEBUG    | InputShapeList shape: List
01:41:44 | DEBUG    | API eos5qfo:predict initialized at URL http://127.0.0.1:33999/
01:41:44 | DEBUG    | Schema not yet available
01:41:44 | INFO     | No empty output available
01:41:44 | DEBUG    | Meta: None
01:41:44 | DEBUG    | Posting to predict
01:41:44 | DEBUG    | Batch size 100
01:41:44 | DEBUG    | Schema not yet available
01:41:50 | DEBUG    | Status code: 200
01:41:50 | DEBUG    | Schema not yet available
01:41:50 | DEBUG    | Done with unique posting
01:41:50 | DEBUG    | Metadata needs to be calculated
01:41:50 | DEBUG    | [{'input': {'key': 'f714e0b82911ab6ae92add25326be299', 'input': ['CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'C1=CN=CC=C1C(=O)NN'], 'text': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O.C1=CN=CC=C1C(=O)NN'}, 'output': {'products': ['CC1=C2C3OC(=O)C(C)C3C(O)CC2(C)C=CC1=NNC(=O)c1ccncc1', 'O']}}, {'input': {'key': '4497e531f523ced31965305c4f0929bc', 'input': ['CC(=O)OC1=CC=CC=C1C(=O)O', 'CC(C)CC1=CC=C(C=C1)C(C)C(=O)O', 'CC1(OC2C(OC(C2O1)(C#N)C3=CC=C4N3N=CN=C4N)CO)C'], 'text': 'CC(=O)OC1=CC=CC=C1C(=O)O.CC(C)CC1=CC=C(C=C1)C(C)C(=O)O.CC1(OC2C(OC(C2O1)(C#N)C3=CC=C4N3N=CN=C4N)CO)C'}, 'output': {'products': ['CC(=O)Oc1ccccc1C(=O)O', 'CC(C)Cc1ccc(C(C)C(=O)O)cc1', 'CC1(C)OC2CCOC(C#N)(c3ccc4c(N)ncnn34)C2O1', 'O']}}]
01:41:50 | ERROR    | Input data types are not consistent
01:41:50 | ERROR    | Output data types are not consistent
01:41:50 | DEBUG    | Latest meta: {'products': []}
01:41:50 | DEBUG    | Schema: {'input': {'key': {'type': 'string'}, 'text': {'type': 'string'}}, 'output': {}}
01:41:50 | DEBUG    | {'input': {'key': {'type': 'string'}, 'text': {'type': 'string'}}, 'output': {}}
01:41:50 | DEBUG    | API schema saved at /home/dee/eos/dest/eos5qfo/api_schema.json
01:41:50 | INFO     | Fetching eos5qfo done successfully: 0:01:21.756536

My guess is that this loop is not being executed at all, possibly because the data type inconsistency error prevents the output_schema_ dict from being populated.

DhanshreeA · 2023-01-04T21:59:33Z

If there's no strong reason to dump the output of this model as JSON, we could just use CSVs, as with other models, and see if that works. What do you think, @miquelduranfrigola?

miquelduranfrigola · 2023-01-16T15:32:00Z

Hi @DhanshreeA, apologies for a slow response here. Thanks for going deeply into the code. 🤗

I have been thinking of robust ways of serializing any JSON output into a CSV table. One option would be to consider the JSONL format. In that case, our output adapter would:

Try to convert the JSON content to multiple columns in a CSV file.
In case this doesn't work, simply store a one-column CSV file with the JSON string as the value in that column.

Obviously, we would need to be careful when the length of the output is variable. Perhaps we can think of a flexible-width CSV output, where the number of columns equals the maximum length across the outputs?
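
To make the proposal concrete, here is a rough sketch of the two-step fallback with flexible-width columns. It is not the Ersilia output adapter: `results_to_rows` and the `products_0`, `products_1`, ... column names are invented for illustration, and it assumes each result is a dict holding a flat list under a single output key.

```python
import csv
import io
import json


def results_to_rows(results, key="products"):
    """Turn per-input outputs into a CSV header and rows.

    Step 1: spread each list over columns, padding to the longest list.
    Step 2 (fallback): if that is not possible, store the raw JSON string
    of each output in a single column.
    """
    try:
        lists = [r["output"][key] for r in results]
        if not all(isinstance(v, list) for v in lists):
            raise TypeError("output is not a list")
    except (KeyError, TypeError):
        header = ["output"]
        rows = [[json.dumps(r.get("output"))] for r in results]
        return header, rows
    width = max((len(v) for v in lists), default=0)
    header = [f"{key}_{i}" for i in range(width)]
    rows = [v + [""] * (width - len(v)) for v in lists]
    return header, rows


# Placeholder product names; the list lengths (2 and 4) mirror the fetch log.
results = [
    {"output": {"products": ["P1", "O"]}},
    {"output": {"products": ["P2", "P3", "P4", "O"]}},
]
header, rows = results_to_rows(results)

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
print(buffer.getvalue())
```

Padding short rows with empty strings keeps every row the same width, which most CSV readers expect; the single-column fallback trades readability for never losing data.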

Please let me know your thoughts and perhaps we can open a Feature Request to tackle this issue beyond this model?

carcablop · 2023-01-17T00:09:44Z

Hello @miquelduranfrigola and @DhanshreeA
A similar problem occurs with the eos526j model (#374 (comment)) and in https://github.com/ersilia-os/ersilia/issues/377. I would like to work with you on the output adapter and try what @miquelduranfrigola is proposing, so that we also find a solution for the eos526j model. In that case the size of the output is variable: a single input (molecule) breaks down into different precursors.

miquelduranfrigola · 2023-01-17T22:09:28Z

As reflected above, we are now moving this discussion to #560. As soon as that is finished, we will come back to this model and hopefully complete it.

ana42742 · 2023-03-09T21:50:35Z

Hey everyone! I have been trying to fetch this model for hours, but nothing seems to work. I am on a macOS ARM machine. It seems that the numpy version required by this model is 1.19.2, which only works with Python 3.7.0. Unfortunately, conda installs for Python 3.7 are not available (conda is recommended in the docs). Could someone please help me with this? #612 (comment)

GemmaTuron · 2023-03-24T14:38:00Z

Hi @ana42742 !

Sorry, I missed that comment here. We are aware of the issues with MacBook systems; they have just recently stopped supporting py3.7.
We are deciding how to tackle this.

AhmedYusuff · 2023-03-25T14:14:34Z

Hello @GemmaTuron.

The model was fetched and served, and it ran predictions successfully, on Google Colab.

Since the model requires TWO molecules to give a prediction, I tested it using an input file generated by running ersilia example eos5qfo -n 1 -f myexample.csv, as suggested.
The product predictions for these SMILES were successful.

rnewout.csv

I also tested the model with the "eml_canonical" SMILES file without editing the format. The prediction was also successful and I got the products of the molecules.

newout.csv

GemmaTuron · 2023-03-27T06:01:31Z

Hi @AhmedYusuff !

That's great, we can then close off this issue!

Chigoziee added the bug (Something isn't working) label Oct 17, 2022

GemmaTuron added the help wanted (Extra attention is needed) and model-bug labels Oct 19, 2022

Chigoziee changed the title ~~🐛 Bug: 'Model predict' caught in an endless loop~~ 🐛 Bug: eos5qfo - Predict Error: products Oct 20, 2022

DhanshreeA self-assigned this Dec 15, 2022

DhanshreeA mentioned this issue Dec 27, 2022

🦠 Model Request: CReM - chemically reasonable mutations framework for structure generation #505

Closed

miquelduranfrigola mentioned this issue Jan 17, 2023

Output adapter serialize JSON to CSV #560

Closed

GemmaTuron mentioned this issue Mar 9, 2023

✍️ Contribution period: Ananya Srivastava #612

Closed

13 tasks

GemmaTuron mentioned this issue Mar 24, 2023

✍️ Contribution period: <Ahmed Yusuf> #615

Closed

13 tasks

GemmaTuron closed this as completed Mar 27, 2023
