SDRF Proteomics values, ontologies and rules
### SDRF-Proteomics values
The value for each property (e.g. characteristics, comment) corresponding to each sample can be represented in multiple ways.
- Free Text (Human readable): In the free text representation, the value is provided as plain text without ontology support (e.g. without a colon-separated ontology prefix or an accession number). This is only RECOMMENDED when the text entered in the table is the exact name of an ontology/CV term in EFO. If the term is not in EFO, other ontologies can be used.
source name | characteristics[organism] |
---|---|
sample 1 | homo sapiens |
sample 2 | homo sapiens |
- Key=value representation (Human and Computer readable): This representation provides a mechanism to capture the complete information of an ontology/CV term, including its accession, name and other additional properties. The value of the property is represented as an object with multiple properties: each key names one property of the object, and the value is the corresponding value for that key. An example of key=value pairs is a post-translational modification (PTM):

  `NT=Glu->pyro-Glu; MT=fixed; PP=Anywhere; AC=Unimod:27; TA=E`
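To make the key=value representation concrete, the sketch below splits a cell like the PTM example into a dictionary. It is a minimal illustration, not a reference parser: the `;` pair separator and `=` key separator are inferred from the example above, and the function name `parse_key_value` is hypothetical.

```python
def parse_key_value(cell: str) -> dict[str, str]:
    """Split an SDRF key=value cell (e.g. a PTM annotation) into a dict.

    Hypothetical helper. Assumes pairs are separated by ';' and that each
    key is separated from its value by the first '=' in the pair.
    """
    result: dict[str, str] = {}
    for pair in cell.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing ';'
        # partition() splits on the first '=' only, so values such as
        # 'Glu->pyro-Glu' are kept intact.
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in pair: {pair!r}")
        result[key.strip()] = value.strip()
    return result


ptm = parse_key_value("NT=Glu->pyro-Glu; MT=fixed; PP=Anywhere; AC=Unimod:27; TA=E")
print(ptm["NT"], ptm["AC"])  # Glu->pyro-Glu Unimod:27
```

Returning a plain dictionary keeps the keys exactly as written in the cell, so a downstream tool can look up the accession (`AC`) without also having to interpret the human-readable name.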
The supported ontologies/controlled vocabularies (CVs) are:
- PSI Mass Spectrometry CV (PSI-MS)
- Experimental Factor Ontology (EFO)
- Unimod protein modification database for mass spectrometry
- PSI-MOD CV (PSI-MOD)
- Cell line ontology
- Drosophila anatomy ontology
- Cell ontology
- Plant ontology
- Uber-anatomy ontology
- Zebrafish anatomy and development ontology
- Zebrafish developmental stages ontology
- Plant Environment Ontology
- FlyBase Developmental Ontology
- Rat Strain Ontology
- Chemical Entities of Biological Interest Ontology
- NCBI organismal classification
- PATO - the Phenotype and Trait Ontology
- PRIDE Controlled Vocabulary (CV)
The following rules address some general scenarios/use cases:
- Unknown values: In some cases, the column is mandatory in the format but for some samples the corresponding value is unknown. In those cases, users SHOULD use ‘not available’.
- Not Applicable values: In some cases, the column is mandatory but for some samples the corresponding value is not applicable. In those cases, users SHOULD use ‘not applicable’.
- Case sensitivity: By specification the SDRF is case insensitive, but we RECOMMEND using lowercase characters throughout all the text (column names and values).
- Spaces: By specification the SDRF is sensitive to spaces (sourcename != source name).
- Column order: The SDRF MUST start with the source name column (accession/name of the sample of origin), followed by all the sample characteristics, and then the assay name corresponding to the MS run. Finally, all the comments (properties of the generated data file) come after the assay name.
- Extension: The extension of the SDRF should be .tsv or .txt.
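The rules above can be checked mechanically. The following is a minimal sketch, assuming the table has already been parsed into a header list and row lists; `check_sdrf` is a hypothetical helper, not part of any official validator. It reports violations of MUST rules as `error:` and of SHOULD/RECOMMENDED rules as `warning:`, and flags every empty cell (the rules only require placeholders for mandatory columns, so this is deliberately strict).

```python
from pathlib import PurePath


def check_sdrf(filename: str, header: list[str], rows: list[list[str]]) -> list[str]:
    """Check a parsed SDRF table against the general rules (hypothetical helper)."""
    issues: list[str] = []

    # Extension: .tsv or .txt (SHOULD).
    if PurePath(filename).suffix.lower() not in (".tsv", ".txt"):
        issues.append(f"warning: extension of {filename!r} is not .tsv or .txt")

    # Names are compared case-insensitively (the format is case insensitive)
    # but spaces are kept, so 'sourcename' does not match 'source name'.
    cols = [h.lower() for h in header]

    # Column order (MUST): 'source name' comes first ...
    if not cols or cols[0] != "source name":
        issues.append("error: first column must be 'source name'")

    # ... characteristics come before 'assay name', comments after it.
    if "assay name" not in cols:
        issues.append("error: missing 'assay name' column")
    else:
        assay = cols.index("assay name")
        for i, col in enumerate(cols):
            if col.startswith("characteristics[") and i > assay:
                issues.append(f"error: {header[i]!r} comes after 'assay name'")
            elif col.startswith("comment[") and i < assay:
                issues.append(f"error: {header[i]!r} comes before 'assay name'")

    # Lowercase column names (RECOMMENDED).
    for name in header:
        if name != name.lower():
            issues.append(f"warning: column name {name!r} is not lowercase")

    # Values: empty cells should use 'not available' or 'not applicable';
    # lowercase values are RECOMMENDED.
    for r, row in enumerate(rows, start=1):
        for name, value in zip(header, row):
            if not value.strip():
                issues.append(
                    f"warning: row {r}, {name!r} is empty; "
                    "use 'not available' or 'not applicable'"
                )
            elif value != value.lower():
                issues.append(f"warning: row {r}, {name!r} value {value!r} is not lowercase")

    return issues


header = ["source name", "characteristics[organism]", "assay name", "comment[data file]"]
rows = [["sample 1", "homo sapiens", "run 1", "sample1.raw"]]
print(check_sdrf("example.sdrf.tsv", header, rows))  # []
```

Keeping errors and warnings in one list, distinguished by prefix, mirrors the difference between MUST and SHOULD/RECOMMENDED in the rules while keeping the sketch short.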
Samples from healthy patients or individuals usually appear in manuscripts and annotations as "healthy" or "normal". We RECOMMEND using the word “normal”, mapped to the PATO term PATO_0000461 (normal), which is included in EFO. Example:
source name | characteristics[organism] | characteristics[organism part] | characteristics[phenotype] | characteristics[compound] | factor value[phenotype] |
---|---|---|---|---|---|
sample_treat | homo sapiens | whole organism | necrotic tissue | drug A | necrotic tissue |
sample_control | homo sapiens | whole organism | normal | none | normal |
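When converting existing annotations, the "normal" recommendation amounts to a small normalization step applied to phenotype values (for example in both `characteristics[phenotype]` and `factor value[phenotype]`). The sketch below is a hypothetical helper, `normalize_phenotype`, that maps the two labels mentioned above to "normal" with the PATO accession and lowercases everything else, following the lowercase recommendation; any other synonyms would have to be added deliberately.

```python
from typing import Optional

# Labels for healthy samples mentioned in the text; both map to "normal".
HEALTHY_LABELS = {"healthy", "normal"}


def normalize_phenotype(value: str) -> tuple[str, Optional[str]]:
    """Return (label, accession) for a phenotype value (hypothetical helper).

    Healthy/normal labels become ("normal", "PATO_0000461"); other values are
    stripped and lowercased and returned without an accession.
    """
    cleaned = value.strip().lower()
    if cleaned in HEALTHY_LABELS:
        return "normal", "PATO_0000461"
    return cleaned, None


print(normalize_phenotype("Healthy"))  # ('normal', 'PATO_0000461')
```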