SDRF Proteomics values, ontologies and rules

### SDRF-Proteomics values

The value of each property (e.g. characteristics, comment) for each sample can be represented in two ways: as free text or as key=value pairs.

Free Text (Human readable): In the free-text representation, the value is provided as plain text without ontology annotation (e.g. no colon-separated prefix and no accession number). This is only RECOMMENDED when the text in the table is the exact name of an ontology/CV term in EFO. If the term is not in EFO, other ontologies can be used.

source name	characteristics[organism]
sample 1	homo sapiens
sample 2	homo sapiens
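Because free text carries no accession, a tool can only check such a value by comparing it with known term names. The sketch below reads the tab-separated example above with Python's `csv` module and flags values that are not exact term names. The set `KNOWN_EFO_TERMS` and the helper names are hypothetical stand-ins for illustration; a real check would look the names up in EFO itself.

```python
import csv
import io

# Hypothetical local list of exact EFO term names (assumption for illustration;
# a real validator would query the ontology instead).
KNOWN_EFO_TERMS = {"homo sapiens", "mus musculus"}

# The free-text example table, as tab-separated text.
SDRF_TEXT = (
    "source name\tcharacteristics[organism]\n"
    "sample 1\thomo sapiens\n"
    "sample 2\thomo sapiens\n"
)

def free_text_values(sdrf_text: str, column: str) -> list[str]:
    """Return the raw values of one column from tab-separated SDRF text."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    return [row[column] for row in reader]

def unmatched_terms(values: list[str], known_terms: set[str]) -> list[str]:
    """Values that are not the exact (lowercased) name of a known term."""
    return [v for v in values if v.lower() not in known_terms]

organisms = free_text_values(SDRF_TEXT, "characteristics[organism]")
print(unmatched_terms(organisms, KNOWN_EFO_TERMS))  # every value matched
```

A value such as "human" would be reported, which is exactly the case where the free-text form is not recommended.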

Key=value representation (Human and Computer readable): This representation captures the complete information of an ontology/CV term, including its accession, name and other properties. The value of the property is represented as an object with multiple properties: each key names one property of the object, and the value is the corresponding value for that key. Post-translational modifications (PTMs) are one example of key=value pairs:

NT=Glu->pyro-Glu; MT=fixed; PP=Anywhere; AC=Unimod:27; TA=E
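A key=value value like the one above is easy to read programmatically: pairs are separated by `;` and each pair splits on its first `=`. The sketch below assumes exactly that layout. The function name `parse_key_value` is hypothetical, and keys are returned as written without interpreting their meaning.

```python
def parse_key_value(value: str) -> dict[str, str]:
    """Split an SDRF key=value string (e.g. a PTM annotation) into a dict."""
    fields: dict[str, str] = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        key, sep, val = part.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"missing '=' in {part!r}")
        fields[key.strip()] = val.strip()
    return fields

ptm = parse_key_value("NT=Glu->pyro-Glu; MT=fixed; PP=Anywhere; AC=Unimod:27; TA=E")
print(ptm["AC"])  # Unimod:27
```

Splitting on the first `=` only keeps values intact even if they happen to contain `=` themselves.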

Ontologies/Controlled Vocabularies Supported

The supported ontologies/controlled vocabularies (CVs) are:

PSI Mass Spectrometry CV (PSI-MS)
Experimental Factor Ontology (EFO)
Unimod protein modification database for mass spectrometry
PSI-MOD CV (PSI-MOD)
Cell line ontology
Drosophila anatomy ontology
Cell ontology
Plant ontology
Uber-anatomy ontology
Zebrafish anatomy and development ontology
Zebrafish developmental stages ontology
Plant Environment Ontology
FlyBase Developmental Ontology
Rat Strain Ontology
Chemical Entities of Biological Interest Ontology
NCBI organismal classification
PATO - the Phenotype and Trait Ontology
PRIDE Controlled Vocabulary (CV)

SDRF-Proteomics format rules

The following rules address general scenarios and use cases:

Unknown values: In some cases, the column is mandatory in the format but for some samples the corresponding value is unknown. In those cases, users SHOULD use ‘not available’.
Not Applicable values: In some cases, the column is mandatory but for some samples the corresponding value is not applicable. In those cases, users SHOULD use ‘not applicable’.
Case sensitivity: By specification the SDRF is case insensitive, but we RECOMMEND using lowercase characters throughout all the text (column names and values).
Spaces: By specification the SDRF is sensitive to spaces (sourcename != source name).
Column order: The SDRF MUST start with the source name column (accession/name of the sample of origin), followed by all the sample characteristics and then the assay name corresponding to the MS run. All comment columns (properties of the generated data file) come after the assay name.
Extension: The extension of the SDRF should be .tsv or .txt.
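Several of these rules can be checked mechanically. The sketch below is a minimal, hypothetical validator (all function names are illustrative, not part of any official tool). It assumes headers are given as a list of strings and rows as dicts. It matches column names case-insensitively but with exact spaces, reports characteristics placed after `assay name` and comments placed before it, and treats non-lowercase text only as a warning because lowercase is RECOMMENDED rather than required.

```python
import os

def column_kind(header: str) -> str:
    """Classify an SDRF column header (hypothetical helper)."""
    name = header.lower()  # the SDRF is case insensitive...
    if name == "source name":  # ...but spaces matter: "sourcename" does not match
        return "source"
    if name.startswith("characteristics["):
        return "characteristics"
    if name == "assay name":
        return "assay"
    if name.startswith("comment["):
        return "comment"
    return "other"

def check_column_order(headers: list[str]) -> list[str]:
    """Return problems with the column layout; an empty list means no issues."""
    problems = []
    kinds = [column_kind(h) for h in headers]
    if not kinds or kinds[0] != "source":
        problems.append("first column must be 'source name'")
    if "assay" not in kinds:
        problems.append("missing 'assay name' column")
    else:
        assay_pos = kinds.index("assay")
        for pos, (header, kind) in enumerate(zip(headers, kinds)):
            if kind == "characteristics" and pos > assay_pos:
                problems.append(f"'{header}' appears after 'assay name'")
            elif kind == "comment" and pos < assay_pos:
                problems.append(f"'{header}' appears before 'assay name'")
    # Lowercase is RECOMMENDED, not required, so report it only as a warning.
    for header in headers:
        if header != header.lower():
            problems.append(f"warning: '{header}' is not lowercase")
    return problems

def empty_cells(row: dict[str, str]) -> list[str]:
    """Columns left blank; these should hold 'not available' or 'not applicable'."""
    return [col for col, val in row.items() if not val.strip()]

def has_valid_extension(path: str) -> bool:
    """True for the allowed .tsv and .txt extensions."""
    return os.path.splitext(path)[1].lower() in {".tsv", ".txt"}

print(check_column_order(["sourcename", "assay name", "characteristics[organism]"]))
```

The example call reports two problems: `sourcename` is not recognised as `source name`, and the characteristics column comes after `assay name`.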

Normal and healthy samples

Samples from healthy patients or individuals usually appear in manuscripts and annotations as "healthy" or "normal". We RECOMMEND using the word “normal”, mapped to the PATO term PATO_0000461 (normal), which is included in EFO. Example:

source name	characteristics[organism]	characteristics[organism part]	characteristics[phenotype]	characteristics[compound]	factor value[phenotype]
sample_treat	homo sapiens	Whole Organism	necrotic tissue	drug A	necrotic tissue
sample_control	homo sapiens	Whole Organism	normal	none	normal
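When curating existing annotations, "healthy" labels can be rewritten to the recommended term before the SDRF is written. The sketch below assumes that only "healthy" and "normal" should collapse to "normal" (the synonym set is an illustrative assumption) and attaches the PATO accession named above; other values are only stripped and lowercased, following the lowercase recommendation.

```python
from typing import Optional

# Labels treated as synonyms of "normal" (assumption for illustration).
NORMAL_SYNONYMS = {"healthy", "normal"}
NORMAL_ACCESSION = "PATO_0000461"

def annotate_phenotype(value: str) -> tuple[str, Optional[str]]:
    """Return (term name, accession); accession is None for non-normal values."""
    cleaned = value.strip().lower()
    if cleaned in NORMAL_SYNONYMS:
        return "normal", NORMAL_ACCESSION
    return cleaned, None

print(annotate_phenotype("Healthy"))  # ('normal', 'PATO_0000461')
```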
