
Which system codes for ChEMBL? #7

Open
Tracked by #9
stain opened this issue Sep 9, 2015 · 6 comments

stain commented Sep 9, 2015

In commit ab02add1bee33b47e45bfdee7f89190681e9bcf2 @egonw added to org.bridgedb.bio/resources/org/bridgedb/bio/datasources.txt:

ChEMBL compound Cl  http://www.ebi.ac.uk/chembl/    https://www.ebi.ac.uk/chembl/compound/inspect/$id   CHEMBL308052    metabolite      1   urn:miriam:chembl.compound  ^CHEMBL\d+$ ChEMBL compound

Using system code Cl here clashes with the equivalent entry in org.bridgedb.rdf, which uses ChEMBLCompound - what was the reason for going with Cl?
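To make the fields of that row easier to discuss, here is a minimal Python sketch that splits it into named columns and checks the example identifier against the row's own regular expression. The column names are my reading of the row, not an official schema, and I am assuming the file is tab-separated.

```python
import re
from typing import NamedTuple


class DataSourceRow(NamedTuple):
    # Field names are assumptions inferred from the row quoted above.
    name: str
    system_code: str
    main_url: str
    url_pattern: str
    example_id: str
    entity_type: str
    primary: str
    miriam_urn: str
    id_regex: str
    full_name: str


def parse_row(line: str) -> DataSourceRow:
    """Split one tab-separated row into named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(DataSourceRow._fields):
        raise ValueError(f"expected {len(DataSourceRow._fields)} columns, got {len(fields)}")
    return DataSourceRow(*fields)


# The row from the commit, rebuilt with explicit tabs.
ROW = "\t".join([
    "ChEMBL compound", "Cl", "http://www.ebi.ac.uk/chembl/",
    "https://www.ebi.ac.uk/chembl/compound/inspect/$id", "CHEMBL308052",
    "metabolite", "1", "urn:miriam:chembl.compound", r"^CHEMBL\d+$",
    "ChEMBL compound",
])

row = parse_row(ROW)
# The example identifier should satisfy the row's own pattern.
example_ok = re.fullmatch(row.id_regex, row.example_id) is not None
# Substitute the identifier into the URL pattern.
example_url = row.url_pattern.replace("$id", row.example_id)
```

The second column is the system code under discussion; everything else in the row is independent of which code is chosen.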

See both IdentifiersOrgDataSource.ttl and IdentifiersOrgDataSource.txt.

This (luckily) causes the IdentifersOrgReaderTest test to fail with:

Caused by: java.lang.IllegalArgumentException: System code does not match for DataSource ChEMBL compound was Cl so it can not be changed to ChEMBLCompound
    at org.bridgedb.DataSource.findOrRegister(DataSource.java:640)
    at org.bridgedb.DataSource.register(DataSource.java:620)
    at org.bridgedb.rdf.BridgeDBRdfHandler.readDataSource(BridgeDBRdfHandler.java:131)
    at org.bridgedb.rdf.BridgeDBRdfHandler.getDataSource(BridgeDBRdfHandler.java:121)
    at org.bridgedb.rdf.BridgeDBRdfHandler.readAllDataSources(BridgeDBRdfHandler.java:113)
    at org.bridgedb.rdf.BridgeDBRdfHandler.doParseRdfInputStream(BridgeDBRdfHandler.java:92)
    ... 33 more
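For anyone not familiar with why this fails: the message suggests that a data source's system code cannot be changed once it has been registered under a given name. The following Python sketch imitates that kind of check. It is a simulation under that assumption, not the actual `DataSource.findOrRegister` code, and the class and method names are hypothetical.

```python
class DataSourceRegistry:
    """Toy registry: one system code per data source name (hypothetical)."""

    def __init__(self):
        self._code_by_name = {}

    def register(self, system_code, full_name):
        existing = self._code_by_name.get(full_name)
        # Refuse to silently replace a code that was registered earlier.
        if existing is not None and existing != system_code:
            raise ValueError(
                f"System code does not match for DataSource {full_name} "
                f"was {existing} so it can not be changed to {system_code}"
            )
        self._code_by_name[full_name] = system_code
        return system_code


registry = DataSourceRegistry()
# First registration, as if read from datasources.txt.
registry.register("Cl", "ChEMBL compound")
try:
    # Second registration, as if read from IdentifiersOrgDataSource.ttl.
    registry.register("ChEMBLCompound", "ChEMBL compound")
    clash = None
except ValueError as err:
    clash = str(err)
```

Whichever file is loaded first wins, so the two modules have to agree on the code before both can be read in the same JVM.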

The system codes used for ChEMBL within IdentifiersOrgDataSource.txt are not ideal:

  • ChEMBLCompound
  • ChemblId
  • ChemblMolecule
  • chembl.target
  • ChemblTarget (!)
  • Chembl16TargetComponent

These are very long, include a (wrong) version number, contain duplicates, and are inconsistent.
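These complaints can be checked mechanically. Below is a rough Python sketch that flags long codes, embedded digits, and codes that collide once case and punctuation are ignored. The length limit of 4 is an arbitrary assumption of mine, not a BridgeDb rule.

```python
import re
from collections import defaultdict

CODES = [
    "ChEMBLCompound", "ChemblId", "ChemblMolecule",
    "chembl.target", "ChemblTarget", "Chembl16TargetComponent",
]

MAX_LEN = 4  # assumed threshold for a "short" code


def normalise(code):
    """Lower-case and strip punctuation so near-duplicates collide."""
    return re.sub(r"[^a-z0-9]", "", code.lower())


def lint(codes, max_len=MAX_LEN):
    problems = defaultdict(list)
    groups = defaultdict(list)
    for code in codes:
        groups[normalise(code)].append(code)
        if len(code) > max_len:
            problems[code].append("long")
        if any(ch.isdigit() for ch in code):
            problems[code].append("contains digits")
    # Report codes that differ only in case or punctuation.
    for members in groups.values():
        if len(members) > 1:
            for code in members:
                others = ", ".join(m for m in members if m != code)
                problems[code].append(f"near-duplicate of {others}")
    return dict(problems)


report = lint(CODES)
```

On this list the check pairs up `chembl.target` with `ChemblTarget` and singles out `Chembl16TargetComponent` for its version number.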

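At Identifiers.org we find the names

  • chembl.compound (http://identifiers.org/chembl.compound/)
  • chembl.target (http://identifiers.org/chembl.target/)

(but nothing for molecules, assays or target components)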

Cc is already used by CCDS.

After discussing this with @egonw I suggest modifying org/bridgedb/bio/datasources.txt to use system codes:

  • ChC (ChEMBL compound)
  • ChT (ChEMBL target)
  • ChTC (ChEMBL Target Component) -- or ChP for "protein"?

CamelCasing here mimics other entries like EnMm (Ensembl Mouse).

Views?

@Christian-B

The rule I always applied to datasource system codes was

  1. Use the existing BridgeDB code if one already exists (even if it is now deprecated).
  2. Use the identifiers.org code if BridgeDB does not already have the DataSource.
  3. Make up a new one only if neither of the above applies.
    I intentionally used longer names here so as not to clash with possible future BridgeDB codes.
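The three rules above amount to a simple precedence lookup, which could be sketched in Python roughly as follows. The function name and the dictionaries are hypothetical illustrations; the sample entries are taken from codes mentioned in this thread.

```python
def choose_system_code(name, bridgedb_codes, identifiers_org_codes, proposed):
    """Pick a system code for `name` following the three-step rule.

    bridgedb_codes: name -> code already used by BridgeDb (including deprecated)
    identifiers_org_codes: name -> identifiers.org namespace
    proposed: fallback code, used only when neither registry knows the name
    """
    # Rule 1: an existing BridgeDb code always wins.
    if name in bridgedb_codes:
        return bridgedb_codes[name], "bridgedb"
    # Rule 2: otherwise reuse the identifiers.org code.
    if name in identifiers_org_codes:
        return identifiers_org_codes[name], "identifiers.org"
    # Rule 3: a new code must not clash with any code used before.
    if proposed in bridgedb_codes.values():
        raise ValueError(f"{proposed} is already used by another data source")
    return proposed, "new"


# Sample data only: Cc/CCDS and the two identifiers.org names come from this thread.
bridgedb_codes = {"CCDS": "Cc"}
identifiers_org_codes = {
    "ChEMBL compound": "chembl.compound",
    "ChEMBL target": "chembl.target",
}

compound = choose_system_code("ChEMBL compound", bridgedb_codes, identifiers_org_codes, "ChC")
component = choose_system_code("ChEMBL target component", bridgedb_codes, identifiers_org_codes, "ChTC")
```

With this sample data the compound resolves to the identifiers.org name, while the target component, which has no identifiers.org entry, falls through to the proposed code.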

So while we may not like the identifiers.org codes, I would still recommend using them until BridgeDB as a project selects a project-wide code.

As I am no longer part of the BridgeDB project, I have no input on which new codes should be approved project-wide, except of course that they should not clash with previously used (even deprecated) ones.

Christian



@Christian-B

While there are definitely codes I would have created differently, I second Christian in support of using identifiers.org codes in order to avoid re-inventing the wheel. If anyone needs to create a new identifiers.org code, you can easily submit a ticket here:

http://sourceforge.net/p/identifiers-org/new-collection/new/

Anders


@AlasdairGray

I also support the use of identifiers.org codes here.

Alasdair



stain referenced this issue in bridgedb/BridgeDb Dec 9, 2015
and ensure main URL is correct

This is a workaround for #16
stain referenced this issue in bridgedb/BridgeDb Dec 9, 2015
also adds chembl.target

This fixes issue #16 and brings (at least these) system codes
in line with identifiers.org

It is a bit longer than "Cl" and is arguably not a "short code", but at least
now it also is a bit more recognizable.

Is this controversial/breaking change (e.g. new major version)?
stain referenced this issue in bridgedb/BridgeDb Dec 9, 2015
This fixes #16
@stain stain self-assigned this Dec 9, 2015

stain commented Dec 9, 2015

I have raised pull request bridgedb/BridgeDb#20 as a discussion point. It settles this along the lines of what you said, using:

  • chembl.compound
  • chembl.target
  • chembl.targetcomponent

and adding the first two of these to datasources.txt in org.bridgedb.bio.

For the http://linkedchemistry.info/ identifiers I can't find any direct equivalent in ChEMBL, so I've renamed the confusing "chemblTarget", "chemblMolecule" etc. to linkedchemistry.chembl.id, linkedchemistry.chembl.target and linkedchemistry.chembl.molecule.

stain referenced this issue in bridgedb/BridgeDb Dec 9, 2015
As a side-aspect of #16 - the legacy
http://linkedchemistry.info/
identifiers don't have an equivalent on
https://www.ebi.ac.uk/chembl/
so I've renamed their system codes
to linkedchemistry.chembl.*

stain commented Dec 9, 2015

See also bridgedb/BridgeDb#21 - I really struggle to do any kind of change on this.


egonw commented Dec 12, 2015

@stain let's Skype chat in the coming week?

egonw changed the title from "Which system codes for Chembl?" to "Which system codes for ChEMBL?" Dec 12, 2015
egonw transferred this issue from bridgedb/BridgeDb Apr 24, 2021
DeniseSl22 mentioned this issue Apr 24, 2021