/dataSource API call returns wrong results about TAX IDs #653

Open
marco-brandizi opened this issue Jun 30, 2022 · 4 comments
Labels
bug project:client Related to the client/front-end war. project:web service Knetminer web service (ws), including search, JSON out, data export, config reading.

Comments

@marco-brandizi
Member

marco-brandizi commented Jun 30, 2022

This API call is going to be retired and replaced by the new /dataset-info API. In the meantime, I need to clarify the following snippet, in order to provide a decent new implementation for legacy clients:

    dataService.getTaxIds ().forEach( taxID -> {
       summaryJSON.put("speciesTaxid", taxID);
    });

If I get the original author's intention right, this should yield a JSON fragment with one speciesTaxid per available species, something like:

    {
      ...
      speciesTaxid: 123,
      speciesTaxid: 456,
      ...
    }

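However, the code above doesn't produce this output at all: summaryJSON is a Map, so its speciesTaxid entry is overwritten at each iteration of that loop-like forEach call. In other words, the output reports only one of the available species, namely whichever tax ID comes last in the iteration order of getTaxIds(), which is essentially arbitrary. Even the intended output would be problematic, since JSON object member names should be unique (RFC 8259) and many JSON parsers keep only the last of the duplicated values.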
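To make the overwrite concrete, here is a minimal, self-contained Java sketch. It assumes (not verified against the Knetminer code) that summaryJSON behaves like a plain java.util.Map and that getTaxIds() returns a Set of String IDs; the class, the method names and the speciesTaxids key are hypothetical. legacySummary() mirrors the quoted loop, while listSummary() shows one possible alternative that keeps all the IDs under a single key, as a JSON array would.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TaxIdOverwriteDemo {

  // Mirrors the loop quoted above: every put() replaces the previous value,
  // so only the last tax ID in iteration order survives.
  static Map<String, Object> legacySummary(Set<String> taxIds) {
    Map<String, Object> summaryJSON = new HashMap<>();
    taxIds.forEach(taxID -> summaryJSON.put("speciesTaxid", taxID));
    return summaryJSON;
  }

  // One possible alternative: keep every tax ID as a list under one key.
  // "speciesTaxids" is a hypothetical key name, not part of any Knetminer API.
  static Map<String, Object> listSummary(Set<String> taxIds) {
    Map<String, Object> summaryJSON = new HashMap<>();
    summaryJSON.put("speciesTaxids", List.copyOf(taxIds));
    return summaryJSON;
  }

  public static void main(String[] args) {
    // LinkedHashSet keeps insertion order, so the outcome is predictable here
    Set<String> taxIds = new LinkedHashSet<>(List.of("123", "456"));
    System.out.println(legacySummary(taxIds)); // {speciesTaxid=456}
    System.out.println(listSummary(taxIds));   // {speciesTaxids=[123, 456]}
  }
}
```

With an insertion-ordered set the surviving value is predictably the last one; with an unordered collection such as a HashSet, which ID survives depends on the set's internal ordering, which is why the legacy output looks random.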

I don't plan to fix this gross and irritating bug; rather, the goal of this ticket is to investigate which clients use this API, understand how they use it and see how they can be changed to switch to the new /dataset-info (which has a different structure, but returns the species details correctly).

As a secondary goal, if we really need to keep this interface alive (e.g., due to external clients), we need to understand whether this can reasonably be done without returning plainly wrong data (e.g., is the first species listed in a configuration the most significant one, a default, or similar?).
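If the interface has to stay, one option is to make the single reported tax ID deterministic. The sketch below assumes a hypothetical policy, not something decided in this thread: an explicitly configured default tax ID wins, otherwise the first ID in configuration order is used. The ordered List parameter, the nullable defaultTaxId and all the names are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class LegacyDataSourceSketch {

  // Hypothetical policy: a configured default wins, otherwise the first tax ID
  // in configuration order. Both choices are reproducible, unlike the
  // last-write-wins loop over an unordered collection.
  static Optional<String> legacyTaxId(String defaultTaxId, List<String> configuredTaxIds) {
    return Optional.ofNullable(defaultTaxId)
        .or(() -> configuredTaxIds.stream().findFirst());
  }

  // Legacy-style summary with a single speciesTaxid entry (omitted if no species)
  static Map<String, Object> legacySummary(String defaultTaxId, List<String> configuredTaxIds) {
    Map<String, Object> summary = new LinkedHashMap<>();
    legacyTaxId(defaultTaxId, configuredTaxIds)
        .ifPresent(taxId -> summary.put("speciesTaxid", taxId));
    return summary;
  }

  public static void main(String[] args) {
    List<String> configured = List.of("123", "456");
    System.out.println(legacySummary(null, configured));  // {speciesTaxid=123}
    System.out.println(legacySummary("456", configured)); // {speciesTaxid=456}
  }
}
```

Requiring an ordered List rather than a Set is the key design choice here: it ties "the first species" to the configuration itself instead of to a collection's internal ordering.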

The speciesName included in this old API's output has a similar problem: do clients expect the Latin or the common name here?

@marco-brandizi marco-brandizi added bug project:web service Knetminer web service (ws), including search, JSON out, data export, config reading. project:client Related to the client/front-end war. labels Jun 30, 2022
@KeywanHP
Member

KeywanHP commented Jul 4, 2022

I am not aware of any external clients using the /dataSource API. It is used internally to retrieve basic dataset information when networks are saved to knetspace.

We are free to refactor and improve it as we like. The API describes a Knetminer dataset: which species it includes (IDs and Latin names), which tax IDs have been indexed, creator and release information, etc. All this information can be generated in our automated ETL pipeline, along with the OXL (in future, Neo4j) and graph index files.
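Purely as an illustration of the information listed above, a dataset description could be modelled along these lines. The record and field names are hypothetical and do not reflect the actual /dataset-info schema; carrying both the Latin and the common name for each species would also sidestep the speciesName ambiguity raised earlier. The sketch needs Java 16+ (records, Stream.toList()).

```java
import java.util.List;

public class DatasetDescriptionSketch {

  // Hypothetical per-species entry: both names are kept, so clients never
  // have to guess which one a single "speciesName" field holds.
  record Species(String taxId, String latinName, String commonName) {}

  // Hypothetical dataset description: species plus creator/release info
  record DatasetDescription(List<Species> species, String creator, String release) {

    // All tax IDs, in the order the species are listed
    List<String> taxIds() {
      return species.stream().map(Species::taxId).toList();
    }
  }

  public static void main(String[] args) {
    DatasetDescription ds = new DatasetDescription(
        List.of(
            new Species("3702", "Arabidopsis thaliana", "thale cress"),
            new Species("4565", "Triticum aestivum", "bread wheat")),
        "example-creator",  // placeholder values, not real release metadata
        "example-release");
    System.out.println(ds.taxIds()); // [3702, 4565]
  }
}
```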

@Arnedeklerk
Member

Moved to 5.7

@marco-brandizi
Member Author

@KeywanHP This has been here for a long time. I propose removing the call to /dataSource in 5.7, since we're progressively moving to /dataset-info.

@KeywanHP
Member

Agreed, OK to remove /dataSource.

marco-brandizi added a commit that referenced this issue Sep 15, 2023