diff --git a/docs/pages/cv/classification_reference.md b/docs/pages/cv/classification_reference.md new file mode 100644 index 0000000..c9d8c0e --- /dev/null +++ b/docs/pages/cv/classification_reference.md @@ -0,0 +1,297 @@ +--- +layout: page +title: "Metrics – Classification Reference" +permalink: /metrics/classification/ +--- + +*Standardized semantic categories for PSI-MS quality control metrics.* + +## Overview + +Each QC metric in the [PSI-MS Controlled Vocabulary (CV)](https://github.com/HUPO-PSI/psi-ms-CV) is annotated using **seven independent classification dimensions**. +First, the **analytical dimension** defines *what kind of metric it is* and is encoded as **inheritance** using `is_a`. +The other six are **typed relationships** that describe *where the metric applies, what it depends on, how to interpret it,* and *how to serialize it in mzQC*. + +**At a glance** + +| Dimension | Encoded as | Purpose | +| ------------------------------- | -------------- | ------------------------------------------------------------------- | +| **Analytical dimension** | `is_a` | Defines the metric subtype (what kind of QC metric it *is*) | +| **Workflow stage** | relationship | Where in the experimental/computational pipeline the metric applies | +| **Information dependency type** | relationship | What type of input data the metric needs | +| **Measurement scope** | relationship | At what aggregation level the metric summarizes data | +| **Acquisition strategy** | relationship | Which acquisition/mode or platform it applies to | +| **Quality interpretation type** | relationship | How to interpret higher/lower/targeted values | +| **Metric value type** | relationship | How the values are structurally represented (single, tuple, etc.) | + +**Rule of thumb:** +Every QC metric has exactly one `is_a` (analytical dimension) and one value from each of the other six relationship dimensions. 
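The rule of thumb above can be checked mechanically. The following is a minimal sketch, not part of any PSI-MS tooling: the helper name `check_classification` is hypothetical, the predicate names are the ones listed in Part 2 of this reference, and the parsing assumes plain OBO tag lines of the form `is_a: ...` and `relationship: <predicate> <accession> ! <label>`.

```python
from collections import Counter

# Relationship predicates for the six non-inheritance dimensions,
# as named in Part 2 of this reference.
REQUIRED_PREDICATES = (
    "part_of_workflow_stage",
    "depends_on_data_type",
    "has_measurement_scope",
    "applies_to_acquisition_mode",
    "has_quality_directionality",
    "has_value_type",
)


def check_classification(stanza: str) -> list[str]:
    """Return the problems found in one OBO term stanza (empty list if none).

    Hypothetical helper: it only counts tags and does not resolve accessions.
    """
    counts = Counter()
    for raw_line in stanza.splitlines():
        line = raw_line.strip()
        if line.startswith("is_a:"):
            counts["is_a"] += 1
        elif line.startswith("relationship:"):
            parts = line.split()
            if len(parts) >= 2:
                counts[parts[1]] += 1  # parts[1] is the predicate name

    problems = []
    for tag in ("is_a", *REQUIRED_PREDICATES):
        if counts[tag] != 1:
            problems.append(f"expected exactly one {tag}, found {counts[tag]}")
    return problems


# Placeholder accessions (MS:4000XXX) are copied from the worked examples.
EXAMPLE = """\
is_a: MS:4000XXX ! chromatographic performance metric
relationship: part_of_workflow_stage MS:4000XXX ! chromatography stage
relationship: depends_on_data_type MS:4000XXX ! raw acquisition data
relationship: has_measurement_scope MS:4000XXX ! run level
relationship: applies_to_acquisition_mode MS:4000XXX ! acquisition mode independent
relationship: has_quality_directionality MS:4000XXX ! lower is better
relationship: has_value_type MS:4000XXX ! tuple
"""

print(check_classification(EXAMPLE))  # prints []
```

This sketch only counts tags. A stricter check would also confirm that each accession resolves to a term under the expected parent, which is outside its scope.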
+ +## Part 1 — Inheritance: Analytical dimension + +**What it is:** +The analytical dimension defines the *type of QC metric*. +This is the only dimension expressed via **inheritance** (`is_a`) because it establishes the metric's place in the taxonomy. + +**How to use it:** +Choose exactly one of the following as the metric's `is_a` parent: + +#### Subclasses + +- **Acquisition coverage metric:** how comprehensively data were collected (e.g., scan counts, sampling density). +- **Mass accuracy metric:** deviation between observed and theoretical _m_/_z_. +- **Intensity stability metric:** variation of signal intensity over time. +- **Chromatographic performance metric:** separation performance (e.g., peak width, symmetry, RT reproducibility). +- **Ionization quality metric:** properties of the precursor ion population (e.g., charge-state distribution, adduct prevalence). +- **Ion mobility metric:** IMS resolution, drift-time/CCS accuracy and reproducibility. +- **Spectral quality metric:** quality of individual spectra (e.g., peak density, S/N, completeness). +- **Fragmentation efficiency metric:** effectiveness of precursor ion fragmentation to produce interpretable spectra. +- **Isolation purity metric:** precursor isolation selectivity or co-isolation of interfering species. +- **Identification confidence metric:** reliability of identifications (e.g., FDR, ID rate). +- **Quantification precision metric:** reproducibility or variability of quantitative results. +- **Contamination metric:** unwanted signal from contaminants, carryover, or background. +- **Instrument operational performance metric:** general indicators of instrument health (e.g., vacuum, detector voltage, temperature). +- **Missingness/completeness metric:** data absence or completeness across features, runs, or studies. + +#### CV example (inheritance only) + +```obo +is_a: MS:4000XXX ! 
chromatographic performance metric +``` + +## Part 2 — Typed relationships + +The remaining six dimensions are not types; they are **properties** of a metric. +They must be encoded using the specified **relationship** predicates (one value per dimension). + +### 1. Workflow stage — `part_of_workflow_stage` + +**Definition:** +The experimental or computational stage of the workflow to which a QC metric applies. + +This tells *where* in the process the metric is relevant — from sample preparation through acquisition and analysis. + +#### Subclasses + +**Experimental workflow stage** + +Metrics describing quality at the laboratory or instrument level. + +* **Sample preparation stage:** metrics describing sample handling, labeling, digestion, or storage quality. + *Example:* peptide recovery yield. +* **Chromatography stage:** metrics about LC separation performance. + *Example:* retention-time reproducibility, peak width. +* **Ionization stage:** metrics about ion generation and charge distribution. + *Example:* precursor charge-state fractions. +* **Ion mobility separation stage:** metrics describing the performance of gas-phase separation devices. + *Example:* ion-mobility resolution, CCS reproducibility. +* **Mass spectrometry acquisition stage:** metrics referring to scanning, detection, or data acquisition processes. + *Examples:* number of MS1 scans, duty-cycle stability. + + * **MS1 acquisition stage:** metrics that use or summarize MS1 data. + * **MS2 acquisition stage:** metrics that use or summarize MS2 data. + * **MSn acquisition stage:** metrics for higher-order fragmentation (MS³, etc.). +* **Instrument performance monitoring stage:** general metrics of instrument health and stability. + *Example:* mass-accuracy drift, spray stability. +* **Instrument calibration stage:** metrics derived from calibration routines or control samples. + +**Data analysis workflow stage** + +Metrics evaluating computational processing and interpretation steps. 
+ +* **Data preprocessing stage:** metrics about baseline correction, noise removal, or peak picking. +* **Identification stage:** metrics assessing identification quality. + *Example:* PSM-level FDR, peptide identification rate. +* **Quantification stage:** metrics describing quantitative accuracy or precision. + *Example:* CV of peptide intensities, ratio reproducibility. +* **Integration stage:** metrics related to alignment, normalization, or data integration across runs. + +**Environmental condition monitoring** + +Metrics about environmental conditions that can indirectly affect results. +*Example:* laboratory temperature, humidity, power fluctuations. + +#### CV example + +```obo +relationship: part_of_workflow_stage MS:4000XXX ! chromatography stage +``` + +### 2. Information dependency type — `depends_on_data_type` + +**Definition:** +Specifies which type of data input the metric requires to be computed. + +#### Subclasses + +* **Raw acquisition data:** metrics that can be calculated directly from the raw MS data, without identifications. + *Example:* total ion current stability, scan count. +* **Deconvoluted data:** metrics based on processed spectra or peak lists obtained after signal deconvolution, centroiding, or deisotoping, but prior to identification. + *Example:* peak density in deconvoluted spectra, precursor mass range coverage. +* **Identification results:** metrics that depend on identified peptides, compounds, or spectra. + *Example:* PSM-level FDR, peptide coverage. +* **Quantification results:** metrics derived from quantitative data matrices. + *Example:* CV of peptide intensities. +* **Hybrid:** metrics combining multiple data types (e.g., identification and quantification). +* **Reference data:** metrics requiring comparison to external standards or reference files. + *Example:* RT deviation vs. iRT peptides, calibration QC. + +#### CV example + +```obo +relationship: depends_on_data_type MS:4000XXX ! raw acquisition data +``` + +### 3. 
Measurement scope — `has_measurement_scope` + +**Definition:** +Indicates the level of data aggregation the metric summarizes. + +#### Subclasses + +* **Spectrum level:** per-spectrum metrics (e.g., number of peaks, S/N ratio). +* **Pixel/voxel level:** per-pixel metrics in imaging or spatial omics. +* **Feature level:** per feature (e.g., peptide, compound, or chromatographic peak). +* **Run level:** aggregated per LC–MS run. +* **Batch level:** aggregated across multiple related runs. +* **Study level:** aggregated across an entire experiment or project. + +#### CV example + +```obo +relationship: has_measurement_scope MS:4000XXX ! run level +``` + +### 4. Acquisition strategy — `applies_to_acquisition_mode` + +**Definition:** +Specifies which acquisition mode or instrument configuration the metric is relevant for. + +#### Subclasses + +**Acquisition mode** + +* **Acquisition mode independent:** metrics valid for any acquisition method. +* **Data-dependent acquisition (DDA):** metrics specific to stochastic precursor selection workflows. + *Example:* number of MS2 spectra per precursor. +* **Data-independent acquisition (DIA):** metrics for window-based fragmentation strategies. + *Example:* precursor window purity. +* **Targeted acquisition:** metrics for SRM, PRM, or other targeted workflows. + *Example:* transition reproducibility. +* **Ion-mobility-coupled metric:** metrics derived from acquisition methods that include gas-phase ion mobility separation. + *Example:* TIMS mobility resolution (Δ1/K₀) per run. +* **Imaging acquisition:** metrics for spatially resolved mass spectrometry experiments such as MALDI, DESI, or SIMS. + *Example:* pixel-to-pixel intensity variation across a tissue section. +* **Other specialized mode:** metrics for advanced or hybrid acquisition modes such as BoxCar, MSⁿ, or multiplexed scanning. + *Example:* BoxCar intensity uniformity across boxes. 
+ +**Instrument platform specificity** + +* **Orbitrap-specific:** metrics only applicable to Orbitrap instruments. + *Example:* Orbitrap transient length stability. +* **TOF-specific:** metrics relevant to time-of-flight instruments. + *Example:* TOF detector voltage stability. +* **Ion-trap-specific:** metrics specific to trap-based systems. + *Example:* Ion trap fill time distribution. +* **Other platform-specific:** for quadrupoles, FT-ICR, or hybrid systems. + +#### CV example + +```obo +relationship: applies_to_acquisition_mode MS:4000XXX ! acquisition mode independent +``` + +### 5. Quality interpretation type — `has_quality_directionality` + +**Definition:** +Describes how a metric's numeric value relates to overall quality. +This enables automatic reasoning about whether "higher," "lower," or "targeted" values represent better data. + +#### Subclasses + +* **Higher is better:** increasing values indicate improved quality. + *Example:* identification rate, mass accuracy score. +* **Lower is better:** decreasing values indicate improved quality. + *Example:* FDR, mass error. +* **Context dependent:** interpretation varies depending on method or range. + *Example:* precursor charge-state fractions, peak density. +* **Target range:** optimal quality corresponds to values within a defined interval. + *Example:* temperature, pressure, retention-time drift. +* **Categorical:** quality expressed as discrete categories (e.g., pass/fail, OK/warning/error). +* **Trend:** metrics intended for temporal monitoring rather than direct ranking (e.g., instrument drift over time). + +#### CV example + +```obo +relationship: has_quality_directionality MS:4000XXX ! lower is better +``` + +### 6. Metric value type — `has_value_type` + +**Definition:** +Specifies the structural format of the metric's reported value(s). +This defines how the metric must be represented in mzQC. 
+ +#### Subclasses + +| Type | Structure | Description | Example | +| ---------------- | ----------------- | ------------------------------------------------------------- | ----------------------------- | +| **Single value** | Scalar | A single numeric or categorical value. | Number of MS1 spectra | +| **Tuple** | Ordered list | Several ordered values of the same kind (e.g., quantiles). | XIC-FWHM quantiles | +| **Table** | Named columns | Parallel lists of equal length; each column has its own unit. | MS2 charge fractions | +| **Matrix** | Rectangular array | 2D array of homogeneous numeric values. | Ion-mobility intensity matrix | + +See the [CV Term Usage Guide](../use/) for details on how each type is encoded in mzQC. + +--- + +## Worked examples + +### XIC-FWHM quantiles (tuple) + +* **Analytical dimension (`is_a`)**: *chromatographic performance metric* +* **Workflow**: *chromatography stage* +* **Data type**: *raw acquisition data* +* **Scope**: *run level* +* **Acquisition**: *mode independent* +* **Directionality**: *lower is better* +* **Value type**: *tuple* + +```obo +is_a: MS:4000XXX ! chromatographic performance metric +relationship: part_of_workflow_stage MS:4000XXX ! chromatography stage +relationship: depends_on_data_type MS:4000XXX ! raw acquisition data +relationship: has_measurement_scope MS:4000XXX ! run level +relationship: applies_to_acquisition_mode MS:4000XXX ! acquisition mode independent +relationship: has_quality_directionality MS:4000XXX ! lower is better +relationship: has_value_type MS:4000XXX ! tuple +``` + +### MS2 known precursor charge fractions (table) + +* **Analytical dimension (`is_a`)**: *ionization quality metric* +* **Workflow**: *ionization stage* +* **Data type**: *raw acquisition data* +* **Scope**: *run level* +* **Acquisition**: *mode independent* +* **Directionality**: *context dependent* +* **Value type**: *table* + +```obo +is_a: MS:4000XXX ! 
ionization quality metric +relationship: part_of_workflow_stage MS:4000XXX ! ionization stage +relationship: depends_on_data_type MS:4000XXX ! raw acquisition data +relationship: has_measurement_scope MS:4000XXX ! run level +relationship: applies_to_acquisition_mode MS:4000XXX ! acquisition mode independent +relationship: has_quality_directionality MS:4000XXX ! context dependent +relationship: has_value_type MS:4000XXX ! table +``` + +## Summary + +* Use **`is_a`** for the **analytical dimension** (the metric's type). +* Use **typed relationships** (one each) for the six **orthogonal facets**: workflow stage, data dependency, scope, acquisition strategy, quality interpretation, and value type. + +These relationships together provide a complete, machine-readable semantic description of any QC metric. + +For how to serialize each **metric value type** in mzQC (single, tuple, table, matrix), see the **[CV Term Usage Guide](../use)**. diff --git a/docs/pages/cv/howto_create_cv_terms.md b/docs/pages/cv/howto_create_cv_terms.md index 1f72189..3f971f7 100644 --- a/docs/pages/cv/howto_create_cv_terms.md +++ b/docs/pages/cv/howto_create_cv_terms.md @@ -1,240 +1,211 @@ --- layout: page -title: "Metrics - create" -permalink: /metrics/create +title: "Metrics – Term Creation Guide" +permalink: /metrics/create/ --- -# CV Term Creation Guide -New CV terms have to be requested via the [mzQC GitHub issue tracker](https://github.com/HUPO-PSI/mzQC/issues). -Upon creating a new issue, you should select the "Request for new CV term" option. -This will produce a template that will guide you in providing the necessary information to request your new CV term, as detailed below. -If additional information or clarifications beyond the initial request are needed, the mzQC working group will work with you to finalize your CV term request. -When all the necessary information has been provided, a new CV term will be created based on the request and added to the QC CV. 
+*How to define and request new QC metrics for the PSI-MS Controlled Vocabulary.* -## Required Information +## What this guide is for -Each metric (and CV entry request) MUST include the following information +This document explains how to **request new QC metric terms** or **update existing ones** in the [PSI-MS Controlled Vocabulary](https://github.com/HUPO-PSI/psi-ms-CV). +It shows: -- Name: A (short) string describing your metric. -- Definition: A longer description. This MUST include information about how the metric should be represented in an mzQC file. -- Comment: OPTIONAL details on how the metric should be interpreted (e.g. is a higher value better, can it only be interpreted relative to...). -- Value type: Is the metric type a single value, an n-tuple, a table, or a matrix? -- Unit: OPTIONAL unit of the value, specified using an existing CV term. -- Categorization: A categorization can OPTIONALLY be supplied. Examples are whether the metric depends on spectrum, peptide, protein, or metabolite identifications; or to describe the metric context. +* what information to include, +* how to write clear definitions, and +* how to classify metrics in line with the QC metric ontology used in mzQC. -## Restrictions +These guidelines ensure that all QC metrics: -The text in `Name`, `Definition`, and `Comment` MUST NOT contain escaped characters, such as `\"`, or special characters, such as backticks (`` ` ``). -If you need to quote words or sentences, use single quotes, e.g. `def: "A QC metric describes the basis for the metric calculation like 'one MS run' or 'one spectrum'." [PSI:QC]`. -Further restrictions to some term elements may apply; please see details in the [Term Element Details](#term-element-details) section. +* are **semantically consistent** and **machine-readable**, +* fit naturally into mzQC and related PSI formats, and +* remain **traceable** to their scientific or software origin. 
-## Example CV term +This guide applies to QC metrics from **proteomics**, **metabolomics**, and related mass spectrometry workflows. -``` -[Term] -id: QC:4000059 -name: Number of MS1 spectra -def: "The number of MS1 events in the run." [PSI:QC] -is_a: QC:4000003 ! single value -is_a: QC:4000010 ! ID free -is_a: QC:4000023 ! MS1 metric -comment: A lower number of MS1 spectra acquired during one sample run compared to similar runs can indicate mismatched instrument settings or issues with the instrumentation or issues with sample amounts. -relationship: has_relation MS:1000579 ! MS1 spectrum -relationship: has_relation QC:4000013 ! QC metric relation: single run -property_value: has_units UO:0000189 -synonym: "MS1-Count" EXACT [] -``` +## How to request a new QC metric -## Term Element Details +All new terms are proposed through GitHub: -### ID +1. Go to the [PSI-MS-CV repository](https://github.com/HUPO-PSI/psi-ms-CV). +2. Create a new issue using the **"New QC Term"** template. +3. Fill in the required fields (see below). +4. Discuss the proposal with maintainers in the issue comments. +5. Once approved, curators assign an accession number (`MS:4000XXX`) and add the term to the next CV release. -``` -id: QC:4000059 -``` +If you're refining or updating an existing term, just open an issue referencing its ID. -Each term MUST have a unique ID, specified as `QC:XXXXXXX`. -Metric IDs are immutable and not reusable (e.g. for redefinition), and will be assigned upon inclusion or redefinition. +> [!NOTE] +> Expect some discussion — the maintainers help ensure consistency and alignment with existing terms. -### Name +## Before you start -``` -name: Number of MS1 spectra -``` +Check first: -Each CV term MUST have a human-readable name. -The name SHOULD be informative, SHOULD consist of maximum 100 characters, and SHOULD only consist of alphanumeric 7-bit ASCII characters, spaces, and punctuation marks ([\-_,\.]). 
+* Search the CV (e.g., via [OLS](https://www.ebi.ac.uk/ols/ontologies/ms)) to make sure that your metric doesn't already exist. +* Verify that your metric is not just a variant or combination of an existing one. +* Gather supporting documentation (publications, software references, mzQC examples). -### Definition +If you find something close but not identical, note that in your request — it helps curators decide whether to extend or merge existing terms. -``` -def: "The number of MS1 events in the run." [PSI:QC] -``` +## What information you'll need -The definition SHOULD consist of a short explanation of the term and how it should be stored in the mzQC file. -The description SHOULD also provide aid in interpreting the values. -The definition section SHOULD NOT contain calculation or interpretation details, but rather it should explain the purpose, requirements, and scope of the metric. +Each new QC metric request must contain: -### Comment +| **Element** | **What to provide** | +| --- | --- | +| **Name** | Short, descriptive title for the metric. Example: `XIC-FWHM quantiles`. | +| **Definition** | One or two sentences explaining what the metric measures, how it is summarized, and what its values mean. | +| **Comment** | *(Optional)* Additional details about computation, conventions, or interpretation. | +| **Units** | Physical or statistical unit (e.g., `UO:0000010 ! second`, `UO:0000187 ! percent`). | +| **Value type** | Structural type of the metric value: single value, tuple, table, or matrix. | +| **Semantic classification** | Seven relationships that describe what kind of metric this is (see below). | +| **Provenance** | The origin of the metric (software or publication), e.g. `xref: QuaMeter:XIC-FWHM-Q1 [PMID:24494671]`. | -``` -comment: A lower number of MS1 spectra acquired during one sample run compared -to similar runs can indicate mismatched instrument settings or issues with the -instrumentation or issues with sample amounts. 
-``` +> [!TIP] +> Keep names short and specific. +> Avoid tool names in the title — use `xref` for that. -The comment section SHOULD contain calculation and interpretation details, like whether smaller or larger values are desirable. -It is also RECOMMENDED to give a short explanation about how the metric works. -If the metric calculation is not obvious, the calculation is RECOMMENDED to be briefly described in common terms. -For published metrics, it is also RECOMMENDED to refer to the corresponding code. +## Example of a complete metric definition -### Value Type and Unit +Here's what a complete metric definition looks like: -``` -is_a: QC:4000003 ! single value -property_value: has_units UO:0000189 -``` +```obo +[Term] +id: MS:4000051 +name: XIC-FWHM quantiles +def: "Summarizes the distribution of chromatographic peak widths, expressed as the full width at half maximum (FWHM) of extracted ion chromatograms (XICs). Reports an ordered tuple of the first through (n-1)-th quantiles (Q1, ..., Qn-1) of the FWHM distribution within a single run. Lower values indicate narrower peaks and therefore better chromatographic performance." +comment: "Values are reported as an (n-1)-element tuple of floating-point numbers in seconds, representing the first to (n-1)-th quantiles of the FWHM distribution. The final quantile (100th percentile) is omitted because it corresponds to the maximum observed peak width, which is a boundary value that does not convey additional information about distribution shape or variability and is sensitive to outliers. The tuple length implicitly specifies how many quantiles are reported and thus the resolution of the summary." -A single value metric with a count as unit (`UO:0000189`). +! --- Ontology classification --- +is_a: MS:4000XXX ! chromatographic performance metric +relationship: part_of_workflow_stage MS:XXXXXXX ! chromatography stage +relationship: depends_on_data_type MS:XXXXXXX ! 
raw acquisition data +relationship: has_measurement_scope MS:XXXXXXX ! run level +relationship: applies_to_acquisition_mode MS:XXXXXXX ! acquisition mode independent +relationship: has_quality_directionality MS:XXXXXXX ! lower is better +relationship: has_value_type MS:XXXXXXX ! tuple -``` -is_a: QC:4000003 ! single value -property_value: has_units UO:0000221 -property_value: has_type STATO:0000237 +! --- Quantitative semantics --- +relationship: has_value_concept MS:1000086 ! full width at half-maximum +relationship: has_value_concept STATO:0000291 ! quantile +relationship: has_units UO:0000010 ! second +relationship: has_value_type xsd:float + +! --- Provenance --- +xref: QuaMeter:XIC-FWHM-Q1 [PMID:24494671] +xref: QuaMeter:XIC-FWHM-Q2 [PMID:24494671] +xref: QuaMeter:XIC-FWHM-Q3 [PMID:24494671] ``` -A single value metric with as unit the standard deviation (`STATO:0000237`) in Dalton (`UO:0000221`), for example, the standard deviation of the distribution of precursor mass errors of identified spectra. +## How QC metrics are classified -Each term that reports a value MUST indicate the corresponding value type using an `is_a` relation. -Different value types are possible: single value, n-tuple, table, or matrix. -A value must be associated with a unit, see below. -Depending on the value type, different additional categorization is REQUIRED. +QC metrics can be categorized according to several classification dimensions. +These specify *what kind of metric it is*, *where it belongs in the workflow*, *what data it uses*, and *how its values behave*. -- **single value:** Unit specification using `has_units` is REQUIRED, type specification using `has_type` is RECOMMENDED. -- **n-tuple:** An ordered list/array of length 'n'. Unit specification using `has_units` is REQUIRED, type specification using `has_type` is RECOMMENDED. -Units and types (optional) MUST be uniform for all values. -An n-tuple is represented by a JSON array, which implicitly defines its length 'n'. 
-- **table:** A table MUST have one or more columns defined using `has_column` and MAY have optional columns defined using `has_optional_column`. -A table is represented using a JSON key–value object where key(s) represent the column term names/accessions and the value(s) are JSON arrays of uniform value type and -length. -- **table column type definitions:** Unit specification using `has_units` is REQUIRED, type specification using `has_type` is RECOMMENDED. -The term name will be used as the column's header. -- **matrix:** Unit specification using `has_units` is REQUIRED, type specification using `has_type` is RECOMMENDED. -Units and types (optional) MUST be uniform for all values. -A matrix is represented by a JSON array of JSON arrays where the inner arrays MUST be of uniform length, which implicitly defines the matrix dimensions. +### Analytical dimension: inheritance via `is_a` -Units SHOULD be sourced from the [Units of Measurement Ontology (UO)](https://www.ebi.ac.uk/ols/ontologies/uo), if available, otherwise from the -[Statistical Methods Ontology (STATO)](http://stato-ontology.org/) or others as necessary. -Protein modifications SHOULD be sourced from [Unimod](http://www.unimod.org/) or [PSI-MOD](https://github.com/HUPO-PSI/psi-mod-CV) where possible. +This dimension defines **what kind of metric** you are creating — it represents the metric's *type*. -### Metric Categorization +**Syntax:** -``` -is_a: QC:4000010 ! ID free -is_a: QC:4000023 ! MS1 metric -relationship: has_relation MS:1000579 ! MS1 spectrum -relationship: has_relation QC:4000013 ! QC metric relation: single run +```obo +is_a: MS:XXXXXXX ! chromatographic performance metric ``` -Different types of categorization can be assigned to CV terms. -First, it is RECOMMENDED to specify whether a metric requires identification information to be computed (ID based) or not (ID free). 
-Second, additional categories to describe the metric context (from which data the metric is derived, to which element of the instrumental setup the metric pertains, etc.) can be specified as well. -It is RECOMMEND to align the categorization of novel metrics to existing terms to facilitate consumption of related metrics. +**Examples:** +`chromatographic performance metric`, `mass accuracy metric`, `spectral quality metric`, `ionization quality metric`, `quantification precision metric`, etc. -``` -property_value: has_units UO:0000010 -property_value: has_column QC:4000117 -``` +This is the **only dimension** that uses `is_a`. +It places your metric correctly within the ontology's QC metric hierarchy. -If the metric term has an associated value, its unit MUST be defined using the `property_value` tag. -"Single", "n-tuple", and "matrix" type values MUST be assigned a single, uniform unit type with `has_units`. -For "table" type values, one or more `has_column_type`/`has_optional_column_type` specifications MUST be associated with the table. -These implicitly define the column units through the `has_units` attributes of the corresponding column definitions. +### Typed relationships: contextual and structural properties -``` -property_value: has_type STATO:0000237 -``` +All other dimensions use a `relationship:` field. -For full semantic integration, it is RECOMMENDED to specify the value type for automatic processing and interpretation of the value. -It is RECOMMENDED to source value types from [STATO](http://stato-ontology.org/). +| **Dimension** | **Relationship** | **Example value** | **Meaning** | +| ------------------------------- | ----------------------------- | ------------------------------ | -------------------------------------------------------------------- | +| **Workflow stage** | `part_of_workflow_stage` | `chromatography stage` | Where in the experimental/computational workflow the metric applies. 
| +| **Information dependency type** | `depends_on_data_type` | `raw acquisition data` | What kind of data the metric depends on (raw, ID, quant, hybrid). | +| **Measurement scope** | `has_measurement_scope` | `run level` | Level of aggregation (spectrum, run, batch, study). | +| **Acquisition strategy** | `applies_to_acquisition_mode` | `acquisition mode independent` | Which acquisition mode/instrument the metric applies to. | +| **Quality interpretation type** | `has_quality_directionality` | `lower is better` | How values relate to data quality. | +| **Metric value type** | `has_value_type` | `tuple` | Structure of the metric value (single value, tuple, table, matrix). | -### Additional Information +Each dimension has predefined subclasses described in the [QC Metric Classification Reference](../classification/). Use that reference when selecting the appropriate classification terms for your new metric. -``` -synonym: "MS1-Count" EXACT [] -``` +## Quantitative details: what the numbers mean -In case of reimplementing, renaming, or redefining a metric, it is RECOMMENDED to also add synonym attributes with either the name or ID of the initial metric. -It is not required for the initial metric to be included in any controlled vocabulary, but the name SHOULD be unambiguous and recognizable (e.g. from the source publication). -Synonyms can be "RELATED" (the defined metric is similar, but not the same as what is connected with the synonym name), "NARROW" (the metric's values can be identically interpreted as in the meaning of the synonym metric, however, definition and calculation may somewhat differ), "EXACT" (the defined metric is basically a result of renaming). 
+Add relationships describing what the metric's numeric values represent: -## More CV Term Examples +| **Relationship** | **Purpose** | **Example** | +| ------------------- | ----------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- | +| `has_value_concept` | Defines what the values represent | `MS:1000086 ! full width at half-maximum`, `STATO:0000291 ! quantile` | +| `has_units` | Defines the physical/statistical unit (preferably from [UO](https://www.ebi.ac.uk/ols/ontologies/uo)) | `UO:0000010 ! second`, `UO:0000187 ! percent` | +| `has_value_type` | Defines the data type (XML schema literal) | `xsd:float`, `xsd:int` | -**Single value:** +These fields help mzQC readers and validation tools understand how to process the data. -``` -[Term] -id: QC:4000050 -name: XIC-WideFrac -def: "The fraction of precursor ions accounting for the top half of all peak widths" [PSI:QC] -is_a: QC:4000003 ! single value -is_a: QC:4000010 ! ID free -is_a: QC:4000020 ! XIC metric -relationship: has_relation QC:4000013 ! QC metric relation: single run -property_value: has_units UO:0000191 ! fraction -``` +## Writing clear definitions and comments -**n-tuple:** +**Good definitions:** -``` -[Term] -id: QC:4000051 -name: XIC-FWHM quantiles -def: "The first to n-th quantile of peak widths for the wide XICs." [PSI:QC] -is_a: QC:4000004 ! n-tuple -is_a: QC:4000010 ! ID free -is_a: QC:4000020 ! XIC metric -relationship: has_relation MS:1000086 ! full width at half-maximum -relationship: has_relation QC:4000013 ! QC metric relation: single run -property_value: has_units UO:0000010 ! second -synonym: "XIC-FWHM-Q1" RELATED [] -synonym: "XIC-FWHM-Q2" RELATED [] -synonym: "XIC-FWHM-Q3" RELATED [] -``` +* Start with what the metric summarizes — **don't** begin with "A QC metric that..." +* Mention the data type or entity, and the summary statistic. 
+* End with an interpretation if relevant ("Lower values indicate better performance"). + +**Comments:** -**Table:** +Use `comment:` only to clarify: +* Implementation details (e.g., number of values, normalization). +* Context or rationale (e.g., why a value is omitted). + +Avoid repeating the definition. + +## Provenance and references + +Always record where the metric originates: + +```obo +xref: QuaMeter:XIC-FWHM-Q1 [PMID:24494671] ``` -[Term] -id: QC:4000063 -name: MS2 known precursor charges fractions -def: "The fraction of MS/MS precursors of the corresponding charge. The fractions [0,1] are given in the 'Fraction' column, corresponding charges in the 'Charge state' column. The highest charge state is to be interpreted as that charge state or higher. " [PSI:QC] -is_a: QC:4000006 ! table -is_a: QC:4000010 ! ID free -is_a: QC:4000024 ! MS2 metric -is_a: QC:4000025 ! ion source metric -relationship: has_relation MS:1000041 ! charge state -relationship: has_relation QC:4000013 ! QC metric relation: single run -property_value: has_column: QC:4000238 ! Charge state -property_value: has_column: QC:4000239 ! Fraction -synonym: "MS2-PrecZ-1" RELATED [] -synonym: "MS2-PrecZ-2" RELATED [] -synonym: "MS2-PrecZ-3" RELATED [] -synonym: "MS2-PrecZ-4" RELATED [] -synonym: "MS2-PrecZ-5" RELATED [] -synonym: "MS2-PrecZ-more" RELATED [] -[Term] -id: QC:4000238 -name: Charge state -def: "The column contains charge states." [PSI:QC] -is_a: QC:4000107 ! Column type -property_value: has_units MS:1000041 ! charge state +* Use PMIDs or DOIs when available. +* If multiple related metrics exist (e.g. Q1, Q2, Q3), include multiple `xref:` lines. -[Term] -id: QC:4000239 -name: Fraction -def: "The column contains fraction values as decimals." [PSI:QC] -is_a: QC:4000107 ! Column type -property_value: has_units UO:0000191 ! 
fraction +## Updating or extending metrics + +If you want to improve an existing term (e.g., clearer definition, missing relationships): + +* Open an issue referencing the metric ID. +* Explain what should change and why. +* Curators will review and update or merge as appropriate. + +Deprecated metrics are marked with: + +```obo +is_obsolete: true +replaced_by: MS:XXXXXXX ``` + +## Quick reference + +**When defining new metrics:** + +* ✅ Use `is_a` for the analytical dimension (metric type). +* ✅ Add one `relationship:` for each of the six other classification dimensions. +* ✅ Include quantitative metadata (`has_value_concept`, `has_units`, `has_value_type`). +* ✅ Add provenance (`xref:`). +* ✅ Ensure that the metric name and definition are unique. + +**Avoid:** + +* ❌ Tool names in the metric name. +* ❌ Definitions that describe algorithms instead of meaning. +* ❌ Redundant comments or duplicated phrasing. + +### See also + +* [**QC Metric Classification Reference:**](../classification/) full list of subclasses, definitions, and examples. +* [**QC Metric Usage Guide:**](../use/) how each value type (single, tuple, table, matrix) is encoded in mzQC. diff --git a/docs/pages/cv/howto_use_cv_terms.md b/docs/pages/cv/howto_use_cv_terms.md index 73b847a..25b8e4b 100644 --- a/docs/pages/cv/howto_use_cv_terms.md +++ b/docs/pages/cv/howto_use_cv_terms.md @@ -1,34 +1,94 @@ --- layout: page -title: "Metrics - use" -permalink: /metrics/use +title: "Metrics – Usage Guide" +permalink: /metrics/use/ --- -# CV Term Usage Guide -The translation from CV terms to elements in an mzQC file depends on the term's value type and is pretty straightforward. -Following, the different value types uses are exemplified . 
+*How to use QC CV terms correctly in mzQC files.* -## Single Value +## Introduction -To report the number of MS1 scans in a peak file: +This guide explains **how to use QC metric CV terms** from the [PSI-MS Controlled Vocabulary](https://github.com/HUPO-PSI/psi-ms-CV) when creating or reading **mzQC files**. +You don't need to be an ontology expert — just follow these examples to ensure your QC data is: +* **Standardized** (compatible across tools), +* **Machine-readable** (interpretable by validators), and +* **Traceable** (linked to known metrics in the PSI-MS CV). + +## How CV metrics map to mzQC + +Each QC metric in mzQC corresponds to one entry in the PSI-MS CV (`MS:4000XXX`). +That entry defines: + +* The **metric name** and **definition**, +* Its **units** and **value type**, +* Its **semantic classification**, describing where it applies and what it measures. + +When you reference a CV term in your mzQC file, you're telling mzQC-compatible software **exactly what kind of data this metric represents**. + +Example (simplified): + +```json +{ + "accession": "MS:4000059", + "name": "number of MS1 spectra", + "value": 8259, + "unit": { + "accession": "UO:0000189", + "name": "count unit" + } +} ``` + +## Metric value types + +Each QC metric defines **how its values are structured**, using the `has_value_type` relationship. +mzQC supports four value structures: + +| Value type | Structure | Example use | +| ---------------- | -------------------------------- | ------------------------------------- | +| **Single value** | One numeric or categorical value | number of MS1 spectra | +| **Tuple** | Ordered list of values | quantiles, summary statistics | +| **Table** | Named columns with multiple rows | precursor charge fractions | +| **Matrix** | Rectangular numerical array | image-like data, correlation matrices | + +Your mzQC file must follow the value structure declared in the CV. 
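
The four value structures in the table above can be read as plain JSON shapes. The following is a minimal sketch, not part of any mzQC library: the function names `is_scalar` and `matches_value_type` are hypothetical, and the equal-length rule for table columns and matrix rows is an assumption about how the structures are meant to be checked.

```python
from numbers import Number


def is_scalar(value):
    """True for a single numeric or categorical (string) value."""
    return isinstance(value, (Number, str))


def matches_value_type(value, value_type):
    """Check a decoded JSON value against one of the four value structures.

    `value_type` is the human-readable label used in the table above;
    real CV terms would be identified by accession instead (assumption).
    """
    if value_type == "single value":
        return is_scalar(value)
    if value_type == "tuple":
        # Ordered list of scalar values.
        return isinstance(value, list) and all(is_scalar(v) for v in value)
    if value_type == "table":
        # Object of named columns, all of the same length.
        if not isinstance(value, dict) or not value:
            return False
        columns = list(value.values())
        if not all(isinstance(column, list) for column in columns):
            return False
        return len({len(column) for column in columns}) == 1
    if value_type == "matrix":
        # Rectangular list of lists containing only numbers.
        if not isinstance(value, list) or not value:
            return False
        if not all(isinstance(row, list) for row in value):
            return False
        if len({len(row) for row in value}) != 1:
            return False
        return all(isinstance(x, Number) for row in value for x in row)
    raise ValueError(f"unknown value type: {value_type!r}")
```

A check like this only covers the shape of `"value"`; units and `xsd:` data types would need separate checks against the CV term.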
+ +## Single-value metrics + +**Definition:** +A metric represented by one scalar value (numeric or categorical). + +**mzQC encoding:** + +* Directly report the single value in the `"value"` field. +* Include the `"unit"` object if defined in the CV. +* The data type (integer, float, string) must match the CV's declared `xsd:` type. + +**Example:** + +CV definition: + +```obo [Term] id: MS:4000059 name: number of MS1 spectra -def: "The number of MS1 events in the run." [PSI:MS] -synonym: "MS1-Count" EXACT [PMID:24494671] -is_a: MS:4000003 ! single value -relationship: has_metric_category MS:4000009 ! ID free metric -relationship: has_metric_category MS:4000012 ! single run based metric -relationship: has_metric_category MS:4000021 ! MS1 metric -relationship: has_value_type xsd:int ! The allowed value-type for this CV term +def: "Counts the number of MS1 scans within a single run." +is_a: MS:4000XXX ! acquisition coverage metric +relationship: part_of_workflow_stage MS:4000XXX ! mass spectrometry acquisition stage +relationship: depends_on_data_type MS:4000XXX ! raw acquisition data +relationship: has_measurement_scope MS:4000XXX ! run level +relationship: applies_to_acquisition_mode MS:4000XXX ! acquisition mode independent +relationship: has_quality_directionality MS:4000XXX ! higher is better +relationship: has_value_type MS:4000XXX ! single value relationship: has_units UO:0000189 ! count unit +relationship: has_value_type xsd:int +xref: QuaMeter:MS1-Count [PMID:24494671] ``` -A corresponding `qualityMetric` object in an mzQC file: +mzQC representation: -``` +```json { "accession": "MS:4000059", "name": "number of MS1 spectra", @@ -40,30 +100,46 @@ A corresponding `qualityMetric` object in an mzQC file: } ``` -## n-tuple +## Tuple metrics -To report the number of MS2 scans per quantile: +**Definition:** +A metric consisting of an ordered list of scalar values (e.g. quantiles, min/median/max triplets). +All values share the same semantic meaning and unit. 
-``` +**mzQC encoding:** + +* The `"value"` field is a JSON array of numbers. +* Include a single `"unit"` object applying to all elements. +* The CV term defines the interpretation (e.g., "first to (n−1)-th quantiles"). + +**Example:** + +CV definition: + +```obo [Term] id: MS:4000062 name: MS2 density quantiles -def: "The first to n-th quantile of MS2 peak density (scan peak counts). A value triplet represents the original QuaMeter metrics, the quartiles of MS2 density. The number of values in the tuple implies the quantile mode." [PSI:MS] -synonym: "MS2-Density-Q1" RELATED [PMID:24494671] -synonym: "MS2-Density-Q2" RELATED [PMID:24494671] -synonym: "MS2-Density-Q3" RELATED [PMID:24494671] -is_a: MS:4000004 ! n-tuple -relationship: has_metric_category MS:4000009 ! ID free metric -relationship: has_metric_category MS:4000012 ! single run based metric -relationship: has_metric_category MS:4000022 ! MS2 metric -relationship: has_value_type xsd:int ! The allowed value-type for this CV term -relationship: has_value_concept NCIT:C45781 ! Density +def: "Summarizes the distribution of spectral peak density in MS2 scans as quantiles of the number of fragment peaks per spectrum within a single run. The metric reports an ordered tuple of the first through (n−1)-th quantiles (Q1, ..., Qn−1), characterizing the overall fragmentation complexity and consistency across spectra." +comment: "Values are reported as an (n−1)-element tuple of counts, representing the first to (n−1)-th quantiles of the distribution of fragment peak counts per MS2 spectrum. The final quantile (100th percentile) is omitted because it corresponds to the maximum observed peak count, which is a boundary value that does not convey additional information about distribution shape or variability and is sensitive to outliers. The tuple length implicitly specifies how many quantiles are reported and thus the resolution of the summary. 
Lower quantiles correspond to sparsely fragmented spectra; higher quantiles indicate spectra with more peaks. Interpretation depends on the acquisition and fragmentation settings and should be treated as context dependent rather than strictly higher- or lower-is-better." +is_a: MS:4000XXX ! spectral quality metric +relationship: part_of_workflow_stage MS:4000XXX ! mass spectrometry acquisition stage +relationship: depends_on_data_type MS:4000XXX ! raw acquisition data +relationship: has_measurement_scope MS:4000XXX ! run level +relationship: applies_to_acquisition_mode MS:4000XXX ! acquisition mode independent +relationship: has_quality_directionality MS:4000XXX ! context dependent +relationship: has_value_type MS:4000XXX ! tuple +relationship: has_value_concept STATO:0000291 ! quantile relationship: has_units UO:0000189 ! count unit +relationship: has_value_type xsd:int +xref: QuaMeter:MS2-Density-Q1 [PMID:24494671] +xref: QuaMeter:MS2-Density-Q2 [PMID:24494671] +xref: QuaMeter:MS2-Density-Q3 [PMID:24494671] ``` -A corresponding `qualityMetric` object in an mzQC file: +mzQC representation: -``` +```json { "accession": "MS:4000062", "name": "MS2 density quantiles", @@ -72,60 +148,51 @@ A corresponding `qualityMetric` object in an mzQC file: "accession": "UO:0000189", "name": "count unit" } -}, +} ``` -## Table +## Table metrics -To report the MS/MS precursor charge states: +**Definition:** +A metric represented as columns of equal-length lists, each describing one variable. +Essentially a named-column table with one row per observation. -``` +**mzQC encoding:** + +* `"value"` is an object where each key is a column identifier and its value is a list. +* Each column has an optional unit. +* All columns must have identical list lengths; each index corresponds to one row. +* Units are provided as an array under `"unit"` and as part of the column definition. 
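
The column-oriented layout described in the bullets above can be sketched as follows. This is an illustrative example only: `rows_to_table_value` and `table_row` are hypothetical helper names, the column keys are placeholders, and the numbers are made up for the demonstration.

```python
def rows_to_table_value(rows, column_ids):
    """Turn row records into the column-oriented object used for "value".

    Every row must provide every column, so all lists end up equally long.
    """
    table = {column_id: [] for column_id in column_ids}
    for row in rows:
        for column_id in column_ids:
            # A missing column raises KeyError instead of producing ragged lists.
            table[column_id].append(row[column_id])
    return table


def table_row(table, index):
    """Read one row back: the same index across all columns."""
    return {column_id: values[index] for column_id, values in table.items()}


# Placeholder column keys and illustrative values (assumptions).
rows = [
    {"MS:1000041": 1, "UO:0000191": 0.25},
    {"MS:1000041": 2, "UO:0000191": 0.75},
]
table = rows_to_table_value(rows, ["MS:1000041", "UO:0000191"])
```

Building the table from complete rows is one way to guarantee identical list lengths by construction rather than checking them afterwards.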
+ +**Example:** + +CV definition: + +```obo [Term] id: MS:4000063 -name: MS2 known precursor charges fractions -def: "The fraction of MS/MS precursors of the corresponding charge. The fractions [0,1] are given in the 'Fraction' column, corresponding charges in the 'Charge state' column. The highest charge state is to be interpreted as that charge state or higher." [PSI:MS] -synonym: "MS2-PrecZ-1" NARROW [PMID:24494671] -synonym: "MS2-PrecZ-2" NARROW [PMID:24494671] -synonym: "MS2-PrecZ-3" NARROW [PMID:24494671] -synonym: "MS2-PrecZ-4" NARROW [PMID:24494671] -synonym: "MS2-PrecZ-5" NARROW [PMID:24494671] -synonym: "MS2-PrecZ-more" NARROW [PMID:24494671] -synonym: "IS-3A" RELATED [PMID:19837981] -synonym: "IS-3B" RELATED [PMID:19837981] -synonym: "IS-3C" RELATED [PMID:19837981] -comment: the MS2-PrecZ metrics can be directly read from the table respective table rows, the ratios of IS-3 metrics must be derived from the respective table rows, IS-3A as ratio of +1 over +2, IS-3B as ratio of +3 over +2, IS-3C as +4 over +2. -is_a: MS:4000005 ! table -relationship: has_metric_category MS:4000009 ! ID free metric -relationship: has_metric_category MS:4000012 ! single run based metric -relationship: has_metric_category MS:4000020 ! ion source metric -relationship: has_metric_category MS:4000022 ! MS2 metric +name: MS2 known precursor charge fractions +def: "Fraction of MS/MS precursors for each charge state observed within a run. Each entry lists a precursor charge (z) and its corresponding fraction of all observed MS2 precursors." +comment: "Values are reported as a table with two columns: 'Charge state' and 'Fraction'. The final charge state bin should be interpreted as 'that charge state or higher' to include all unlisted higher charges." +is_a: MS:4000XXX ! ionization quality metric +relationship: part_of_workflow_stage MS:4000XXX ! ionization stage +relationship: depends_on_data_type MS:4000XXX ! raw acquisition data +relationship: has_measurement_scope MS:4000XXX ! 
run level +relationship: has_value_type MS:4000XXX ! table relationship: has_column MS:1000041 ! charge state relationship: has_column UO:0000191 ! fraction - -[Term] -id: MS:1000041 -name: charge state -def: "Number of net charges, positive or negative, on an ion." [PSI:MS] -synonym: "z" EXACT [] -is_a: MS:1000455 ! ion selection attribute -is_a: MS:1000507 ! ion property -relationship: has_value_type xsd:int ! The allowed value-type for this CV term - -[RDF extract] -id: UO:0000191 -name: fraction -def: "A dimensionless ratio unit which relates the part (the numerator) to the whole (the denominator). [Wikipedia:Wikipedia]" +relationship: has_value_type xsd:float ``` -A corresponding `qualityMetric` object in an mzQC file: +mzQC representation: -``` +```json { "accession": "MS:4000063", - "name": "MS2 known precursor charges fractions", + "name": "MS2 known precursor charge fractions", "value": { - "Charge state": ["1","2","3","4","5","6"], - "Fraction": [0.000,0.683,0.305,0.008,0.002,0.002] + "MS:1000041": [1, 2, 3, 4, 5, 6], + "UO:0000191": [0.000, 0.683, 0.305, 0.008, 0.002, 0.002] }, "unit": [ { @@ -139,4 +206,35 @@ A corresponding `qualityMetric` object in an mzQC file: ] } ``` -The units of a table instance are implicitly assumed through their respective columns' definition and if available as unit terms, documented in the unit array of the instance for clarity. \ No newline at end of file + +## Matrix metrics + +**Definition:** +A metric that stores a rectangular grid of numeric values of the same type and unit. + +**mzQC encoding:** + +* `"value"` is a rectangular list of lists of numbers (each inner list = a matrix row). +* A single `"unit"` applies to all entries. +* Only homogeneous numeric types are allowed (no mixed datatypes). + +## Understanding hierarchy and relationships + +Each QC metric term in the CV encodes its semantics in two ways: + +* The `is_a` hierarchy specifies *what kind of metric* it is (the analytical dimension). 
+* The typed `relationship`s describe *where and how* it applies. + +| Ontology element | Describes | Example | +| ----------------------------- | ------------------------------------- | ------------------------------------ | +| `is_a` | Type of metric (analytical dimension) | `chromatographic performance metric` | +| `part_of_workflow_stage` | Experimental or computational stage | `chromatography stage` | +| `depends_on_data_type` | Type of data used | `raw acquisition data` | +| `has_measurement_scope` | Level of aggregation | `run level` | +| `applies_to_acquisition_mode` | Acquisition mode | `DIA-specific metric` | +| `has_quality_directionality` | Interpretation of values | `lower is better` | +| `has_value_type` | Structure of the value | `tuple` | + +These relationships make each metric comparable, searchable, and logically complete while maintaining a clean metric taxonomy. + +For full details and all available subclasses (e.g., analytical metric types, workflow stages, or acquisition modes), see the [QC Metric Classification Reference](../classification). diff --git a/docs/pages/metrics.md b/docs/pages/metrics.md index d43b60b..17a298d 100644 --- a/docs/pages/metrics.md +++ b/docs/pages/metrics.md @@ -1,12 +1,33 @@ --- layout: page -title: Metrics +title: QC Metrics permalink: /metrics/ --- -The mzQC format owes much of it's _simplicity_ **and** _flexibility_ to the use of controlled vocabulary (CV) terms to define and instantiate quality metric records. -You can find out more on how to use and define your own CV terms below. +The mzQC format achieves both _simplicity_ and _flexibility_ by using **Controlled Vocabulary (CV) terms** to describe quality metrics in a precise and machine-readable way. 
+These terms are defined within the [**PSI-MS Controlled Vocabulary (CV)**](https://github.com/HUPO-PSI/psi-ms-CV) and specify: -{% include_relative cv/howto_use_cv_terms.md %} +* what each metric measures, +* how it is computed and represented, and +* how it relates to specific workflow stages or data types. -{% include_relative cv/howto_create_cv_terms.md %} +This ensures that QC results are interoperable across software tools, consistent across datasets, and unambiguously interpretable by both humans and machines. + +## Learn more about QC metric CV terms + +Whether you're using, creating, or browsing metrics, the following pages explain everything you need to know: + +### [Metric Classification Reference](classification/) + +A taxonomy of QC metric categories and relationships. +Defines the seven classification dimensions used in mzQC and how they describe each metric's meaning, context, and structure. + +### [Using QC Metrics](use/) + +A hands-on guide for developers and tool integrators. +Learn how to reference, interpret, and serialize CV terms in mzQC files — including examples for single-value, tuple, table, and matrix metrics. + +### [Creating New QC Metrics](create/) + +Step-by-step instructions for proposing or updating QC metric terms in the PSI-MS CV. +Explains how to write clear definitions, select correct classifications, and provide provenance and quantitative details.