Regarding provenance and getProperties() #15

lucas-ubm opened this issue Feb 2, 2021 · 5 comments

Comments

@lucas-ubm

When I use the getProperties() method to obtain provenance information about a given mapper object, provenance is reported for only one data source. For example, for a 'Homo sapiens' mapper I only get provenance information for the 'Ensembl' data source.
However, the BridgeDb web service returns a list containing the provenance information of all data sources for a given organism.

I wonder whether this is the intended behavior, or whether the R package should display the information for all data sources.

Code to replicate:

library(BridgeDbR)
location <- getDatabase('Homo sapiens')
mapper <- loadDatabase(location)
getProperties(mapper)
@Chris-Evelo

I am confused. I would expect the Homo sapiens gene product database to have only one source for the mapping, and that source is indeed Ensembl (although the provenance should include the date and version). Which other data sources does the web service show, then?

@lucas-ubm
Author

It returns a list with information about all data sources for the given organism (because the only input to the call is the organism). For Homo sapiens:

DATASOURCENAME Ensembl
BUILDDATE 20180509
SERIES Homo sapiens genes and proteins
DATATYPE GeneProduct
DATASOURCEVERSION 91
SCHEMAVERSION 3

DATASOURCENAME HMDB-CHEBI-WIKIDATA
BUILDDATE 20201104
DATATYPE Metabolite
SERIES standard_metabolite
DATASOURCEVERSION HMDB4.0.20190116-CHEBI193-WIKIDATA20201104
SCHEMAVERSION 3

DATASOURCENAME EBI-RHEA
BUILDDATE 20190522
SERIES standard-interaction
DATATYPE Interaction
DATASOURCEVERSION 1.0.0
SCHEMAVERSION 3

DATASOURCENAME Wikidata
BUILDDATE 20200527
SERIES humancorona
DATATYPE GeneProduct
DATASOURCEVERSION 1.0.0
SCHEMAVERSION 3

DATASOURCENAME Wikidata
BUILDDATE 20200510
SERIES complexes
DATATYPE Complex
DATASOURCEVERSION 1.0.0
SCHEMAVERSION 3

DATASOURCENAME Wikidata
BUILDDATE 20200510
SERIES publications
DATATYPE Article
DATASOURCEVERSION 1.0.0
SCHEMAVERSION 3
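To see which of these properties belong to which data source, the listing can be split into one record per source. The sketch below assumes the format shown in this comment (one key/value pair per line, whitespace-separated, with each record starting at a DATASOURCENAME line); it is not part of BridgeDb or BridgeDbR, and `group_properties` and `gene_product_sources` are hypothetical helper names.

```python
# Two of the records listed above, in the assumed "KEY VALUE" line format.
RAW = """\
DATASOURCENAME Ensembl
BUILDDATE 20180509
SERIES Homo sapiens genes and proteins
DATATYPE GeneProduct
DATASOURCEVERSION 91
SCHEMAVERSION 3
DATASOURCENAME EBI-RHEA
BUILDDATE 20190522
SERIES standard-interaction
DATATYPE Interaction
DATASOURCEVERSION 1.0.0
SCHEMAVERSION 3
"""


def group_properties(text: str) -> list[dict[str, str]]:
    """Split a flat property listing into one dict per data source (assumed format)."""
    records: list[dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # First token is the key; the rest is the value, which may contain spaces
        # (e.g. "Homo sapiens genes and proteins"). split(None, 1) accepts tabs too.
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        # Assumption: every new data source begins with a DATASOURCENAME line.
        if key == "DATASOURCENAME" or not records:
            records.append({})
        records[-1][key] = value
    return records


def gene_product_sources(records: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep only records whose DATATYPE is GeneProduct."""
    return [r for r in records if r.get("DATATYPE") == "GeneProduct"]


if __name__ == "__main__":
    for rec in group_properties(RAW):
        print(rec["DATASOURCENAME"], rec["DATATYPE"], rec["DATASOURCEVERSION"])
```

Filtering on DATATYPE is one way to recover the single gene product source a user would expect for the human mapper. In the full listing above, the Wikidata "humancorona" series is also typed GeneProduct, so this filter alone would still return two records.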

@Chris-Evelo

That is indeed a bit odd. It seems to return information from other loaded databases, at least for the reaction and metabolite databases. I think:

  1. You should be able to query which databases are loaded. (Can you?)
  2. For each of these, you should be able to ask for its provenance.
  3. When a new database is loaded, it should somehow read its relevant provenance from the database itself and make it available for step 2. (I can imagine that future databases will have different types of provenance.)
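The three steps above can be sketched as a small registry. This is a minimal illustration of the proposed design, not an existing BridgeDb or BridgeDbR API: `LoadedDatabase`, `MapperRegistry` and their methods are hypothetical names, and the property values are taken from the web service listing earlier in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class LoadedDatabase:
    """Hypothetical stand-in for one loaded mapping database."""
    name: str
    # Step 3: provenance is read from the database itself when it is loaded,
    # stored as free-form key/value pairs so new provenance fields need no code change.
    properties: dict[str, str] = field(default_factory=dict)


class MapperRegistry:
    """Hypothetical registry of loaded databases (not part of BridgeDb)."""

    def __init__(self) -> None:
        self._databases: dict[str, LoadedDatabase] = {}

    def load(self, db: LoadedDatabase) -> None:
        self._databases[db.name] = db

    def loaded(self) -> list[str]:
        # Step 1: report which databases are currently loaded.
        return sorted(self._databases)

    def provenance(self, name: str) -> dict[str, str]:
        # Step 2: provenance of one database only, not of everything loaded.
        return dict(self._databases[name].properties)


if __name__ == "__main__":
    reg = MapperRegistry()
    reg.load(LoadedDatabase("Homo sapiens", {"DATASOURCENAME": "Ensembl", "DATASOURCEVERSION": "91"}))
    reg.load(LoadedDatabase("metabolites", {"DATASOURCENAME": "HMDB-CHEBI-WIKIDATA"}))
    for name in reg.loaded():
        print(name, reg.provenance(name))
```

Keeping provenance per database, rather than merging it into one flat list, avoids exactly the confusion described in this thread, where gene product, metabolite and reaction sources appear together.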

@lucas-ubm
Author

This is what the web service returns if you ask for the properties of Homo sapiens, i.e. by calling https://webservice.bridgedb.org/Human/properties, which is the closest thing we currently have to provenance.

@Chris-Evelo

Yes, that is what I meant. When you ask about the human gene product database, it also returns information about the metabolite and reaction databases. That is bound to confuse users.
