
SwissProt entity completion issue #96

Closed
wshayes opened this issue Jun 12, 2018 · 0 comments

wshayes commented Jun 12, 2018

Child of #84

Reported by Natalie: For SwissProt, I don’t seem to be able to use the label (e.g., “H4_HUMAN”) to autocomplete – no results are obtained. I see the label is listed as an alt_id in the resource file, and the associated gene symbol (in the case of H4, one of many) is used as the label. Maybe this is part of the issue?

{
  "term": {
    "namespace": "SP",
    "namespace_value": "P62805",
    "src_id": "P62805",
    "id": "SP:P62805",
    "label": "HIST1H4A",
    "name": "HIST1H4A",
    "description": "Histone H4",
    "species_id": "TAX:9606",
    "species_label": "human",
    "entity_types": ["Gene", "RNA", "Protein"],
    "synonyms": ["H4/A", "H4FA"],
    "equivalences": ["EG:121504", "EG:554313", "EG:8294", "EG:8359", "EG:8360", "EG:8361", "EG:8362", "EG:8363", "EG:8364", "EG:8365", "EG:8366", "EG:8367", "EG:8368", "EG:8370"],
    "alt_ids": ["SP:A2VCL0", "SP:P02304", "SP:P02305", "SP:Q6DRA9", "SP:Q6FGB8", "SP:Q6NWP7", "SP:H4_HUMAN"]
  }
}
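One plausible reading of the record above is that only `label`, `name`, and `synonyms` are fed to the completion index, so `SP:H4_HUMAN` sitting in `alt_ids` never becomes a completion input. A minimal sketch of the idea (the `completion_inputs` helper and its `include_alt_ids` flag are hypothetical, not part of the project's code; the term dict is abridged from the record above):

```python
# Hypothetical helper: collect autocomplete inputs for a term record.
# The include_alt_ids flag is an assumption for illustration, not an
# actual option in the resource pipeline.
def completion_inputs(term, include_alt_ids=False):
    inputs = {term["label"], term["name"], *term["synonyms"]}
    if include_alt_ids:
        # alt_ids are namespaced (e.g. "SP:H4_HUMAN"); index the bare value
        # so a user typing the SwissProt entry name gets a hit.
        inputs.update(alt.split(":", 1)[1] for alt in term["alt_ids"])
    return sorted(inputs)

# Abridged version of the SP:P62805 record shown above.
term = {
    "label": "HIST1H4A",
    "name": "HIST1H4A",
    "synonyms": ["H4/A", "H4FA"],
    "alt_ids": ["SP:A2VCL0", "SP:H4_HUMAN"],
}
```

With the default behavior, "H4_HUMAN" is absent from the inputs, which matches the symptom Natalie reported; including `alt_ids` makes it completable.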

Also:
IL-6 is being split into two tokens when completing on it in Elasticsearch. We need to review the tokenization scheme for term completion.

@wshayes wshayes self-assigned this Jun 12, 2018
@ghost ghost removed the ready label Aug 24, 2018
wshayes added a commit that referenced this issue Sep 4, 2018
Fixed two issues with equivalencing: slow resolution caused by traversing up to 10 steps for equivalences, and alt_ids not being included in equivalencing.

Also fixed dashes and underscores being used to tokenize text for term completion; tokenization now splits only on colons and spaces.
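The revised tokenization rule can be illustrated with a small stand-in (a plain regex tokenizer, not the actual Elasticsearch analyzer configuration): splitting only on colons and whitespace keeps hyphenated and underscored symbols such as IL-6 and H4_HUMAN as single tokens.

```python
import re

# Sketch of the revised completion tokenization: split only on colons
# and whitespace, so dashes and underscores no longer break symbols
# like "IL-6" or "H4_HUMAN" into multiple tokens.
def tokenize(text):
    return [tok for tok in re.split(r"[:\s]+", text) if tok]
```

Under the old scheme a standard tokenizer would split "IL-6" into "IL" and "6"; here it survives as one token, while namespaced ids like "SP:P62805" still split on the colon.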

Also shifted api routes into a separate file.
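The equivalencing fix described in the commit message can be sketched as a bounded breadth-first traversal over equivalence edges. The graph shape, helper name, and step limit below are assumptions for illustration, not the project's actual implementation:

```python
from collections import deque

# Hypothetical equivalence graph: term id -> directly equivalent ids
# (including ids reachable via alt_ids, per the fix above). The
# max_steps cap bounds traversal depth so equivalencing stays fast
# instead of walking up to 10 steps.
def equivalents(graph, start, max_steps=3):
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_steps:
            continue
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen - {start}
```

Lowering `max_steps` trades completeness of the equivalence closure for speed, which is the tradeoff the commit message alludes to.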