Sample and Data Relationship Format for Proteomics

Yasset Perez-Riverol edited this page Jul 4, 2021 · 24 revisions

The SDRF-Proteomics file format describes sample characteristics and the relationships between samples and data files. The format is tab-delimited: each row corresponds to a relationship between a Sample and a Data file, each column corresponds to an attribute (property) of the Sample, and each cell holds the value of that property for the given Sample.
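As a minimal sketch of the row and column layout described above, the following Python snippet reads a tab-delimited SDRF fragment with the standard csv module. The fragment itself, including the run names and cell values, is a hypothetical example and is not taken from a real dataset.

```python
import csv
import io

# Hypothetical two-row SDRF fragment; all cell values are illustrative only.
SDRF_TEXT = (
    "source name\tcharacteristics[organism]\tassay name\t"
    "comment[label]\tcomment[fraction identifier]\n"
    "Sample-1\tHomo sapiens\trun 1\tlabel free sample\t1\n"
    "Sample-2\tHomo sapiens\trun 2\tlabel free sample\t1\n"
)


def read_sdrf(handle):
    """Return one dict per row, i.e. one per Sample-to-Data-file relationship."""
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)


rows = read_sdrf(io.StringIO(SDRF_TEXT))
for row in rows:
    # Each row links a sample accession to a data file accession.
    print(row["source name"], "->", row["assay name"])
```

Note that csv.DictReader keeps only the last value when a header name is repeated, so a file with duplicated column names would need csv.reader and positional access instead.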

[Figure: SDRF for proteomics]

An SDRF-Proteomics file is divided into three main blocks:

  • characteristics[...]: These are the sample properties.
  • comment[...]: These are the data properties.
  • factor value[...]: These are the variables under study.
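The three blocks can be told apart from the column header alone. The sketch below assumes headers are written in lowercase exactly as shown above (block name followed by the property in square brackets). The function name classify_column and the factor value[disease] header are hypothetical and used only for illustration.

```python
import re

# Matches the three bracketed block prefixes named in the list above.
COLUMN_PATTERN = re.compile(r"^(characteristics|comment|factor value)\[(.+)\]$")


def classify_column(header):
    """Return (block, property) for a bracketed column, else (None, header)."""
    cleaned = header.strip()
    match = COLUMN_PATTERN.match(cleaned)
    if match:
        return match.group(1), match.group(2)
    # Columns such as "source name" and "assay name" belong to no block.
    return None, cleaned


headers = [
    "source name",
    "characteristics[organism]",
    "assay name",
    "comment[label]",
    "factor value[disease]",  # hypothetical study variable
]
for header in headers:
    print(header, "->", classify_column(header))
```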

The SDRF columns MUST start with source name, which is the sample accession. As a best practice, we recommend using Sample-1, Sample-2, and so on. After the sample accession, all columns correspond to sample characteristics, for example characteristics[organism], until the assay name column, which starts the Data file section.

The Data properties section (comment) starts with assay name, which is the Data file accession. After assay name, the following properties (comment) are mandatory for SDRF-Proteomics:

  • comment[label]: The label is the channel used in multiplex experiments (e.g., TMT126; check the documentation for the labelled methods). If the sample is label free, that is, the experiment did not use any multiplexing method, the value MUST be label free sample.
  • comment[fraction identifier]: The fraction identifier is a unique identifier for each Data file. Fraction identifiers help to identify any type of fractionation method, including high-performance liquid chromatography, isoelectric focusing, or off-gel electrophoresis.
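The ordering rules and mandatory columns described above can be checked on the header row. The following is a minimal sketch, not an official validator. It assumes exact lowercase header names and checks only the two comment properties listed here, so the full specification may require more. The function name check_header is hypothetical.

```python
# Only the two mandatory properties listed above; this is an assumption
# of the sketch, not a complete list from the specification.
MANDATORY_COMMENTS = ("comment[label]", "comment[fraction identifier]")


def check_header(headers):
    """Return a list of human-readable problems found in an SDRF header row."""
    problems = []
    if not headers or headers[0] != "source name":
        problems.append("first column must be 'source name'")
    if "assay name" not in headers:
        problems.append("missing 'assay name' column")
        return problems
    assay_index = headers.index("assay name")
    # Sample characteristics belong before the Data file section.
    for position, header in enumerate(headers):
        if header.startswith("characteristics[") and position > assay_index:
            problems.append(f"{header} appears after 'assay name'")
    # Mandatory data properties must follow 'assay name'.
    data_section = headers[assay_index + 1:]
    for required in MANDATORY_COMMENTS:
        if required not in data_section:
            problems.append(f"missing mandatory column {required}")
    return problems


good = [
    "source name",
    "characteristics[organism]",
    "assay name",
    "comment[label]",
    "comment[fraction identifier]",
]
print(check_header(good))
print(check_header(["source name", "assay name", "comment[label]"]))
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in a header at once.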