diff --git a/data-message/TODO.md b/data-message/TODO.md deleted file mode 100644 index c05c55f..0000000 --- a/data-message/TODO.md +++ /dev/null @@ -1,6 +0,0 @@ -Open Issues - -- What to do with annotations -- how to deal with language requests that cannot be fulifilled for individual items -- decision about the link to the dsd -- How to deal with multiple measures diff --git a/data-message/docs/sdmx-csv-field-guide.md b/data-message/docs/sdmx-csv-field-guide.md index ca080ef..007e8f9 100644 --- a/data-message/docs/sdmx-csv-field-guide.md +++ b/data-message/docs/sdmx-csv-field-guide.md @@ -4,99 +4,273 @@ SDMX-CSV Data Message is an SDMX data exchange format based on the [RFC 4180](ht SDMX-CSV integrates with other specifications, i.e.: - The SDMX API RESTful specification (e.g. content negotiation with mime-type to get SDMX-CSV representations, specific formats for responses, language selection through HTTP content negotiation) -- The [RFC 4180](https://tools.ietf.org/html/rfc4180) specification (determined column number, "comma" separated) +- The [RFC 4180](https://tools.ietf.org/html/rfc4180) specification -SDMX-CSV is flexible enough in its representation to support the needs of different target audiences: -- A representation optimised for public data dissemination and similar, and for usage in common statistical software -- A representation optimised for creating pivot tables in spreadsheets applications - -## RFC 4180: A common format for CSV files +## RFC 4180: A common format for CSV files In order to benefit from best practices, SDMX-CSV is based on the rules defined in the [RFC 4180](https://tools.ietf.org/html/rfc4180), which defines a common format and MIME Type for CSV files. 
It is advised to read the (very short) RFC for a full list of requirements but, in a nutshell, the RFC defines rules such as: -- How the CSV file should be structured (the RFC specifies that all records must have an identical structure, like when using an SDMX "flat" representation for data); +- How the CSV file should be structured (the RFC specifies that all records must have an identical structure (determined column number), like when using an SDMX "flat" representation for data); - When double-quotes should be used and how to escape them when needed; -- How spaces should be handled; -- Which separator should be used (comma or locale specific); +- How spaces should be handled: Spaces are considered part of a field and should not be ignored; - Which mime type should be used; - What is the default character set, etc. -However, in order to assure the possibility to always clearly identify the data contained in the message, the SDMX specification excludes switching off column headers. - -# Design principles - -- There is no additional SDMX-specific header. The SDMX-CSV format is designed for the purpose of general public dissemination of statistical data. -- After the mandatory header row, each row contains the information related to one specific observation. -- Columns: There must be one column for the dataflow, one column per dimension, one column for the measure and one column per attribute. All dimensions defined in the related Data Structure Definition (DSD) are to be included. In case the SDMX RESTful 2.1 web service implementation supports a streaming mechanism, columns for all attributes defined in the DSD are present in the output, regardless of whether these attributes are used. Implementers have the possibility to add any other custom columns as required, e.g. serieskey, sender, prepared, etc. -- Column headers (first row): - - For the first column, the dataflow column, always is the term *DATAFLOW*. 
- - For a dimension column, is the dimension's ID or both ID and localised name (see option below). - - For the measure column, always is the term *OBS_VALUE*. - - For an attribute column, is the attribute's ID or both ID and localised name (see option below). - - For any custom column, is any custom but unique term. -- Column content (all rows after header): - - For the dataflow column, is the reference to the *dataflow* in the following form: AGENCY:DATAFLOW_ID(VERSION), or both reference and localised name (see option below). - - For a dimension column, is the ID or both ID and localised name (see option below) of the observation's code in the corresponding dimension. - - For the measure column, is the value of the observation. - - For a coded attribute column, is the ID or both ID and localised name (see option below) of the code in the corresponding attribute. For attributes defined at series, group or dataset level, the codes are replicated for all observations concerned. - - For an uncoded attribute column, is the value of the corresponding attribute. For attributes defined at series, group or dataset level, the values are replicated for all observations concerned. - - For any custom column, contains any potentially localised custom content. -- Comma (,) separator for columns is used by default, but it is recommended for implementers to provide the response according to the locale of the client as indicated in the http Accept-Language header (which means that in some cases the semi-colon ‘;’ is acceptable as separator). See the second example below. The separator used in a message can be determined by the ninth character of the message, which is just after the fixed first column header term *DATAFLOW*. 
+The SDMX-CSV format is flexible enough in its representation to support the needs of different target audiences: +- It is designed and optimised for the general public dissemination of statistical data, and for usage in common statistical software. +- It allows using the messages to create pivot tables in spreadsheet applications. + +# Design principles for SDMX-CSV 2.0 Data Messages (aligned with SDMX 3.0.0) + +- In order to ensure the identifiability of the data contained in the message, the header row containing the column headers is mandatory and its content is well-defined. +- After the mandatory header row, each row contains the information related to one specific observation or to one or more attributes attached to partial keys. For `Delete` actions a row can also concern several observations if dimensions are wildcarded. +- In [RFC 4180](https://tools.ietf.org/html/rfc4180), csv stands for "comma-separated values". However, while SDMX-CSV does use the "comma" (%x2C) as the default field separator, it adopts the wider interpretation of csv as "character-separated values". It is recommended for implementers to provide SDMX-CSV messages according to the locale of the user (e.g. as indicated in the HTTP Accept-Language header). This means that, for example, the semicolon ‘;’ (as typically used in specific regions or countries) is acceptable as separator. See also the related example below. Note that the separator used in a message can be determined by retrieving the character that follows the fixed first column header term *STRUCTURE* (which may be extended by a squared bracket term). + +## Columns + +- The first column is always used for the structure type: dataflow, data structure definition or data provision agreement. +- The next one or two columns are always used for the structure's identification. +- The next column is always used for the action to be performed. +- The next up to two columns are used for the series and/or observation key. 
+- Each Data Structure Definition (DSD) component (dimensions, measures, attributes including those defined through a referenced Metadata Structure Definition) included in the message is represented in one or two columns. SDMX web services should return the columns in the order of components as defined in (each of) the underlying Data Structure Definition(s), grouped by type of component, thus in case of data defined by different data structures: first the dimensions of the first data structure, then the remaining dimensions of the second data structure and so forth, then the measures of the first data structure, then the remaining measures of the second data structure and so forth, then the attributes of the first data structure, then the remaining attributes of the second data structure and so forth. However, any order of these columns is valid for data uploads to SDMX-consuming systems. +- Only all those dimension columns have to be present, that are required to uniquely identify the concerned attributes and/or measures. +- Attributes can but do not need to be included even if they have a mandatory status. +- Measures can but do not have to be included. +- When an SDMX RESTful web service implements streaming, then it might not know, while generating the csv header row, which measures and attributes actually have values. Therefore, it can happen that all values presented in an attribute or measure column are left empty. +- Implementers have the possibility to add any other custom columns as required, e.g. updated, prepared, etc. + +## Column headers (first row) + +- The header field of the first column always contains the term `STRUCTURE`. + - This field must be extended with a sub-field delimiter encapsulated in squared brackets "[]", e.g. `STRUCTURE[;]`, in case the message contains multi-valued or multi-language measure or attribute values. +- The header field of the second column always contains the term `STRUCTURE_ID`. 
+- If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the artefact identification column containing the term `STRUCTURE_NAME`. +- The header field of the next column should contain the term `ACTION`. For convenience, if this column is not present, a default action ("Information") is assumed for the whole message. +- The next up to two columns contain, if option `key=series|obs|both` (see *[here](#optional-parameters)*), in this order the terms `SERIES_KEY` and/or `OBS_KEY`. +- The other columns for components contain: + - Default: The ID of the component reported in that column, e.g. `DIM1`. + - If option `labels=both` (see *[here](#optional-parameters)*): The ID and the localised name of the component reported in that column separated by the term ": ", e.g. `DIM1: Dimension 1`. + - If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the component identification column containing the localised name of the component reported in the previous column. +- Any other custom column contains a custom but unique term, e.g. `UPDATED`. + +## Column content (all rows after header) + +- The first column contains: `dataflow`, `datastructure` or `dataprovision`, depending on type of artefact for which the data contained in the row are defined: dataflow, data structure definition or data provision agreement. +- The second column contains: + - Default: The artefact identification information for the data in the row in the form *AGENCY:ARTEFACT_ID(VERSION)*(1), e.g. `ESTAT:NA_MAIN(1.6.0)`. + - If option `labels=both` (see *[here](#optional-parameters)*): The artefact identification information and its localised name separated by the term ": ", e.g. `ESTAT:NA_MAIN(1.6.0): National Accounts Main Aggregates`. 
+- If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the artefact identification column with the artefact's localised name, e.g. `National Accounts Main Aggregates`. +- The next column contains one character representing one of the four currently defined action types: + - "I": Information - Data is for information purposes. If such data messages are loaded into an SDMX database, the action "A" (Append) is assumed. + - "A": Append - Data is for an incremental update of existing observations or partial-key attributes or for the provision of new data formerly absent. This means that only the information provided explicitly in the message should be altered. Any measure or attribute value that is to be added or changed must be provided. However, the absence of an observation value or a data attribute at any level does not imply deletion; instead it is simply implied that the value is to remain unchanged. Therefore, it is valid and acceptable to send a data message with an action of Append which (in addition to identifying structure columns) contains only identifying dimensions with some attribute values. In this case, whatever the attachment level of the attributes is, the values for the attributes will be updated. Note that it is not permissible to update measure or attribute values using incomplete identification information, e.g. without the structure ID or without the necessary dimensions (full key for measures, full key/partial key/none for attributes). + - "R": Replace - Data is for replacement. Existing observations are to be fully replaced. Existing attribute values are to be replaced. Observations or attribute values formerly absent will be appended. + - "D": Delete - Data is to be deleted. 'Delete' is assumed to be an incremental deletion. The deletion is to take place at the lowest level of detail provided in the row. 
Concretely, if a 'Delete' row only contains the identification information of the structural artefact (dataflow, data structure definition or data provision agreement) without any dimension, measure and attribute values then all data for the given artefact will be deleted. If the row contains only the structure identification and partial dimension values then all observations and all attribute values relating to those dimension values will be deleted. If the row contains only the structure identification, partial dimension values as well as values for some of the related attributes then only these attribute values will be deleted. If the row contains only the structure identification and full dimension values then the related observation and all its observation-level attribute values will be deleted. Finally, if the row contains only the structure identification, full dimension values as well as values for some of the related measures and attributes then only these measure and observation-level attribute values will be deleted. Measure and attribute values that are to be deleted must be non-empty, e.g. marked with the dash character "-". + - For convenience, if this column is absent then the action "Information" is assumed. +- The next up to two columns contain, if option `key=series|obs|both`, in this order the series keys and/or the observation keys (see *[here](#optional-parameters)*). +- The other columns for components contain: + - Default: The ID(s) (if coded) or value(s) (if non-coded) for the component values reported in that column for the corresponding observation, e.g. `A`. + - If option `labels=both` (see *[here](#optional-parameters)*): The ID(s) and the localised name separated by the term ": " (if coded) or the value(s) (if non-coded) for the component values reported in that column for the corresponding observation, e.g. `A: A value name`. 
+ - If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the component identification column containing the localised name, e.g. `A value name`, of the component value reported in the previous column. It is empty if the value has no localised name. + - For rows containing the information related to one specific observation, the related values for attributes attached to partial keys may have to be replicated. + - For rows containing the information related to one or more attributes attached to partial keys, in addition to these attributes only the components that are part of the partial key need to be filled, all other components can be left empty. Also the columns not related to the attribute's data structure (when data from different data structures are present) are to be left empty. + - For rows containing information to be deleted, the deletion is assumed to take place at the lowest level of detail provided in the message. For that purpose, to be deleted measure or attribute values are non-empty, e.g. marked with the dash character "-". Delete operations allow wildcarding dimensions by leaving the corresponding dimension field empty. +- The other custom columns contain any potentially localised custom content. + +## Localisation + - HTTP content negotiation, see [RFC 2616 - HTTP 1.1 Header Field Definitions](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html) - - Always use this mime-type in the Accept header: + - Always use this mime-type in the Accept header: `application/vnd.sdmx.data+csv; version=2.0.0`. + - The client can indicate preferred languages through the Accept-Language header, e.g. `fr, en-gb;q=0.8, en;q=0.7`. +- Always localise all artefact names according to the preferred language. 
The best language match according to the user’s preferred language choices in the HTTP Accept-Language header (or, if that is not available, then according to the system's default language order) is to be used for each localisable name element. The message does not, however, indicate the returned language per localisable name element. If there is no such language match for a particular localisable name element, it is optional to return the element in a system-default language or alternatively to not return the element. +**It is recommended to indicate all languages used anywhere in the message for localised name elements through the HTTP Content-Language response header (languages of the intended audience).** +Note: For multi-language values, all language versions are provided independently of the preferred language (see below). + +## Multi-valued components and nested metadata attributes + +- Some components (measures or attributes) allow for multiple values. Those multiple values are separated by a special sub-field separation character, e.g. `;`. +- This sub-field separation character has to be defined as first character in the squared bracket term of the header field of the first column, e.g. `STRUCTURE[;]`. +- Such components are indicated by having their IDs followed by empty squared brackets "[]", e.g. `ATTR4[]`. +- For coded multi-valued components, if option `labels=both` (see *[here](#optional-parameters)*) then each individual value is to be prefixed with its ID and the term ": ", e.g. `A: Value A;B: Value B`. +- Each metadata attribute is also to be presented in its own column(s), even if the metadata attributes are nested. In that case, the attribute IDs in the column headers are prefixed with the IDs of the related parent attribute(s) separated by a dot `.`, e.g. `CONTACT[].NAME[]`. All the values corresponding to one attribute are presented like a multi-valued component by respecting their position in the nested attribute tree, e.g. 
`name for contact 1;name for contact 2`. Parent branches in that attribute tree without a value for a specific attribute need to be indicated by leaving the corresponding multi-value sub-field empty, e.g. `name for contact 1;;name for contact 3`. Nested parent branches are indicated by using nested double quotes. Note that fields containing double quotes must themselves be encapsulated in double quotes and that nested inner double quotes need to be doubled recursively, e.g. `"""name 1 for contact 1;name 2 for contact 1"";""name 1 for contact 2;name 2 for contact 2"""`. - `application/vnd.sdmx.data+csv; version=1.0.0` - - - The client can indicate preferred languages through the Accept-Language header, e.g.: +## Non-coded multi-lingual components - `fr, en-gb;q=0.8, en;q=0.7` +- Some non-coded components (measures or attributes) allow for multi-lingual values. Those values are separated by a special sub-field separation character, e.g. `;`. +- This sub-field separation character has to be defined as first character in the squared bracket term of the header field of the first column, e.g. `STRUCTURE[;]`. +- Such components are indicated by having their IDs followed by the list of possible 2-letter ISO language codes separated by the sub-field separator and encapsulated in squared brackets "[]", e.g. `ATTR2[en;fr]`. +- Each individual language value is to be prefixed with its 2-letter ISO language code and a colon character ":", e.g. `en:Value;fr:Valeur`. Thus, in contrast to the ID prefix for coded values when using the HTTP Accept header parameter `labels=both` (see *[here](#optional-parameters)*), the language prefix `xx:` is not followed by a space character. +- Note that multi-lingual components are always non-coded and therefore do not interfere with value IDs. 
-# Localised names +## Non-coded multi-lingual multi-valued components -The first best language match according to the user’s preferred language choices in the http Accept-Language header (or if that is not available than according to the system's default language order) is to be used for each localisable name element. The message does however not indicate the returned language per localisable name element. In case that there is no such language match for a particular localisable name element, it is optional to return the element in a system-default language or alternatively to not return the element. -**It is recommended to indicate all languages used anywhere in the message for localised name elements through http Content-Language response header (languages of the intended audience).** +- Some non-coded components (measures or attributes) allow for multiple multi-lingual values. All individual values are separated by a special sub-field separation character, e.g. `;`. +- This sub-field separation character has to be defined as first character in the squared bracket term of the header field of the first column, e.g. `STRUCTURE[;]`. +- Such components are indicated by having their IDs followed by the list of possible language codes separated by the sub-field separator and encapsulated in squared brackets "[]", e.g. `ATTR2[en;fr;de]`. +- Each individual language value is to be prefixed with its 2-letter ISO language code and a colon character ":", e.g. `en:Value1`. +- Each multi-lingual value set is to be encapsulated in double quotes, e.g. `"en:Value1;fr:Valeur1";"en:Value2;de:Wert2"`. However, note that fields containing double quotes must themselves be encapsulated in double quotes and that the inner double quotes need to be doubled, thus the complete example is `"""en:Value1;fr:Valeur1"";""en:Value2;de:Wert2"""`. + +## Non-coded XHTML-valued components + +- Some non-coded components (measures or attributes) allow for XHTML values. 
+- Each XHTML value is to be encapsulated in double quotes, e.g. `"

This is some ""metadata html""

"`. Remember that the inner quotes need to be doubled. +- The CSV format allows fields to contain line breaks if those fields are enclosed in double quotes. Thus XHTML values can also contain line breaks. # Optional parameters Optional parameters can be added to the HTTP Accept header. They need to be separated by the character combination `"; "`. -- labels (id|both; default=id): This parameter applies to all Nameable SDMX Artefacts contained in the header and the body of the message: - - If the parameter value is `id` then only the id of the Artefacts is displayed. - - If the parameter value is `both` then the concatenated id and localised name of the Artefacts (see the section on [localised names](#localised-names) on how the message deals with languages) separated by `": "` are displayed. Note that the character combination `": "` could also be part of the Artefact name and could therefore occur several times within the concatenated string. +- labels (id|name|both; default=id): This parameter applies to all Nameable SDMX Artefacts contained in the header and the body of the message: + - If the parameter value is `id` then only the id/value of the artefacts is displayed. + - If the parameter value is `name` then the id/value and the name of the artefacts are displayed in separate columns (see *[here](#columns)*), the ID/value column always directly preceding its related localised name column. + - If the parameter value is `both` then the concatenated id/value and localised name of the artefacts (see the section on [localised names](#localised-names) on how the message deals with languages) separated by `": "` are displayed. Note that the character combination `": "` could also be part of the artefact name and could therefore occur several times within the concatenated string. 
 - timeFormat (original|normalized; default=original): - - If the parameter value is `original` then the *TIME-PERIOD* values are displayed in the SDMX *TIME_PERIOD* format as originally recorded. - - If the parameter value is `normalized` then the *TIME_PERIOD* values are converted to the most granular [ISO 8601](https://www.iso.org/iso-8601-date-and-time-format.html) representation taking into account the highest frequency of the data in the message and the moment in time when the lower-frequency values were collected (which, e.g. at the ECB, is typically either at the beginning, middle or end of the reporting period). This eases comparisons and business analysis of multi-frequency values, e.g. in pivot tables. As an example, if annual and daily data are available in the message and the annual data were collected at the end of the reporting period, the formatted value for the annual period 2014 becomes 2014-12-31. - -Support of above non-default parameters is not required by implementers. + - If the parameter value is `original` then the time dimension (*TIME_PERIOD*) values are displayed in the SDMX *TIME_PERIOD* format as originally recorded. + - If the parameter value is `normalized` then the time dimension (*TIME_PERIOD*) values are converted to the most granular [ISO 8601](https://www.iso.org/iso-8601-date-and-time-format.html) representation taking into account the highest frequency of the data in the message and the moment in time when the lower-frequency values were collected (which, e.g. at the ECB, is typically either at the beginning, middle or end of the reporting period). This eases comparisons and business analysis of multi-frequency values, e.g. in pivot tables. As an example, if annual and daily data are available in the message and the annual data were collected at the end of the reporting period, the formatted value for the annual period 2014 becomes 2014-12-31. 
+- keys (none|obs|series|both; default=none): Requests the addition of column(s) for keys. + - If the value is `none` (the default), no related column will be added. + - If the value is `obs`, a new column OBS_KEY will be added after the ACTION column. The column will contain the combination of IDs/values for all the dimensions, ordered by their order in the data structure definition and separated by a dot character (.), e.g. M.USD.EUR.SP00.2020-01 + - If the value is `series`, a new column SERIES_KEY will be added after the ACTION column. The column will contain the combination of IDs/values for all the dimensions except the one(s) attached to the observation, ordered by their order in the data structure definition and separated by a dot character (.), e.g. M.USD.EUR.SP00 + - If the value is `both`, both a SERIES_KEY and an OBS_KEY column will be added after the ACTION column, starting with the SERIES_KEY column. # Examples -#### 1) HTTP Accept header: application/vnd.sdmx.data+csv; version=1.0.0 +Note: All examples assume the minimal HTTP Accept header: `application/vnd.sdmx.data+csv; version=1.0.0` + +#### 1) Ordinary case + + STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_2,ATTR_3,ATTR_1,UPDATED + dataflow,ESTAT:NA_MAIN(1.6.0),I,A,B,2014-01,12.4,Y,"Normal, special and other values",N,2021-01-22T13:15:41Z + dataflow,ESTAT:NA_MAIN(1.6.0),I,A,B,2014-02,10.8,Y,"Normal, special and other values",Y,2021-01-22T13:15:41Z + +Notes: +- The following default parameter settings are automatically applied: + - labels=id + - timeFormat=original +- *UPDATED* is a custom column + +#### 2) Components in any order, missing component(s), component with multiple values + + STRUCTURE[;],STRUCTURE_ID,ACTION,OBS_VALUE1,OBS_VALUE2,ATTR_3,ATTR_1[],DIM_2,DIM_1,DIM_3 + dataflow,ESTAT:NA_MAIN(1.6.0),I,12.4,12.5,"Normal, special and other values",X;Y,B,A,2014-01 + dataflow,ESTAT:NA_MAIN(1.6.0),I,10.8,10.9,"Normal, special and other values",X;Z,B,A,2014-02 + +#### 3) Components 
in any order and missing component, HTTP Accept header: `application/vnd.sdmx.data+csv; version=1.0.0; key=series` + + STRUCTURE[;],STRUCTURE_ID,ACTION,SERIES_KEY,OBS_VALUE1,OBS_VALUE2,ATTR_3,ATTR_1,DIM_2,DIM_1,DIM_3 + dataflow,ESTAT:NA_MAIN(1.6.0),I,A.B,12.4,12.5,"Normal, special and other values",N,B,A,2014-01 + dataflow,ESTAT:NA_MAIN(1.6.0),I,A.B,10.8,10.9,"Normal, special and other values",Y,B,A,2014-02 + +#### 4) Localisation: HTTP Accept header: `application/vnd.sdmx.data+csv; version=1.0.0; labels=both; key=both`, HTTP Accept-Language header: `fr-FR, en;q=0.7` + + STRUCTURE[|];STRUCTURE_ID;ACTION;SERIES_KEY;OBS_KEY;DIM_1: Dimension 1;DIM_2: Dimension 2;DIM_3: Dimension 3;OBS_VALUE: Observation value;ATTR_2: Attribut 2;ATTR_3: Attribut 3;ATTR_1: Attribut 1 + dataflow;ESTAT:NA_MAIN(1.6.0): Principaux agrégats des comptes nationaux;I;A.B;A.B.2014-01;A: Value A;B: Value B;2014-01: 2014-01;12,4;Y: Oui;Normal, special and other values;N: Non + dataflow;ESTAT:NA_MAIN(1.6.0): Principaux agrégats des comptes nationaux;I;A.B;A.B.2014-02;A: Value A;B: Value B;2014-02: 2014-02;10,8;Y: Oui;Normal, special and other values;Y: Oui + +Note that in this example the client prefers French (fr) language with the France (FR) locale, but will also accept any type of English. Therefore, in the message the French language with the France locale is applied, transforming also the field separator from comma (,) to semicolon (;), and the decimal separator from dot (.) to comma (,). 
+ +#### 5) HTTP Accept header: `application/vnd.sdmx.data+csv; version=1.0.0; labels=both; timeFormat=normalized` + + STRUCTURE[;],STRUCTURE_ID,ACTION,DIM_1: Dimension 1,DIM_2: Dimension 2,DIM_3: Dimension 3,OBS_VALUE: Observation value,ATTR_2: Attribute 2,ATTR_3: Attribute 3,ATTR_1: Attribute 1 + dataflow,ESTAT:NA_MAIN(1.6.0): National Accounts Main Aggregates,I,A: Value A,B: Value B,2014-01-01,12.4,Y: Yes,"Normal, special and other values",N: No + dataflow,ESTAT:NA_MAIN(1.6.0): National Accounts Main Aggregates,I,A: Value A,B: Value B,2014-02-01,10.8,Y: Yes,"Normal, special and other values",Y: Yes + +#### 6) HTTP Accept header: `application/vnd.sdmx.data+csv; version=1.0.0; labels=name` + + STRUCTURE,STRUCTURE_ID,STRUCTURE_NAME,ACTION,DIM_1,Dimension 1,DIM_2,Dimension 2,DIM_3,Dimension 3,OBS_VALUE,Observation value,ATTR_1,Attribute 1,ATTR_2,Attribute 2,ATTR_3,Attribute 3 + dataflow,ESTAT:NA_MAIN(1.6.0),National Accounts Main Aggregates,I,A,Value A,B,Value B,2014-01,2014-01,12.4,,Y,Yes,"Normal, special and other values",,N,No + dataflow,ESTAT:NA_MAIN(1.6.0),National Accounts Main Aggregates,I,A,Value A,B,Value B,2014-02,2014-02,10.8,,Y,Yes,"Normal, special and other values",,Y,Yes + +#### 7) Multi-valued components + + STRUCTURE[;],STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_1[],ATTR_2[],ATTR_3[] + dataflow,ESTAT:NA_MAIN(1.6.0),I,A,B,2014-01,12.4,Value X;Value Y,"M, N & O;P & Q",A;B;C + dataflow,ESTAT:NA_MAIN(1.6.0),I,A,B,2014-02,10.8,Value X;Value Y,"M, N & O;P & Q",A;C + +#### 8) Non-coded multi-lingual components, varying dataflows based on the same underlying data structure + + STRUCTURE[;],STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_1[en;fr] + dataflow,ESTAT:NA_MAIN(1.6.0),I,A,B,2014-01,12.4,en:Any Value;fr:N'importe quelle Valeur + dataflow,ESTAT:NA_MAIN(1.7.0),I,A,B,2014-02,10.8,"en:Value ""X"";fr:Valeur ""X""" + +#### 9-A) Varying structural artefacts based on same underlying data structure + + 
STRUCTURE[;],STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_1[en;fr] + dataflow,ESTAT:DF_NA_MAIN(1.6.0),I,A,B,2014-01,12.4,en:Any Value;fr:N'importe quelle Valeur + datastructure,ESTAT:DSD_NA_MAIN(1.7.0),I,A,B,2014-02,10.8,"en:Value ""X"";fr:Valeur ""X""" + dataprovision,ESTAT:DPA_NA_MAIN(1.8.0),I,A,B,2014-03,11.2,"en:Value ""Y"";fr:Valeur ""Y""" + +#### 9-B) Varying structural artefacts based on different underlying data structures + + STRUCTURE[;],STRUCTURE_ID,ACTION,DIM_A1B1,DIM_A2,DIM_A3C2,DIM_B2,DIM_C1,DIM_C3,MEAS_A1B1C1,MEAS_C2,ATTR_A1,ATTR_B1 + dataflow,ESTAT:DF_A(1.6.0),I,DIMVAL_A1B1,DIMVAL_A2,DIMVAL_A3C2,,,,"MEASVAL_A1B1C1",,"ATTRVAL_A1", + datastructure,ESTAT:DSD_B(1.7.0),I,DIMVAL_A1B1,,,DIMVAL_B2,,,"MEASVAL_A1B1C1",,,"ATTRVAL_B1" + dataprovision,ESTAT:DPA_C(1.8.0),I,,,DIMVAL_A3C2,,DIMVAL_C1,DIMVAL_C3,"MEAS_A1B1C1","MEAS_C2",, + +#### 10) Varying actions + + STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_1 + dataflow,ESTAT:NA_MAIN(1.6.0),A,A,B,2014-01,12.4,X + dataflow,ESTAT:NA_MAIN(1.6.0),R,A,B,2014-02,10.8,Y + +#### 11) Data for a non-versioned(1) data structure definition + + STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_1 + datastructure,AGENCY:DF_ID,I,A,B,2014-01,12.4,N + datastructure,AGENCY:DF_ID,I,A,B,2014-02,10.8,Y + +#### 12) Attributes attached to partial keys for a data provision agreement + + STRUCTURE,STRUCTURE_ID,ACTION,DIM_2,DIM_3,ATTR_1 + dataprovision,AGENCY:DPA_ID(1.0.0),I,B,2014-01,N + dataprovision,AGENCY:DPA_ID(1.0.0),I,B,2014-02,Y + +#### 13) Mixing rows for attributes attached to partial keys with rows for observations + + STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,MEAS_1,ATTR_1,ATTR_2 + dataflow,AGENCY:DF_ID(1.0.0),I,A,B,2014-01,12.4,N, + dataflow,AGENCY:DF_ID(1.0.0),I,,B,,,,Y + +#### 14) Nested metadata attributes attached to partial keys + + STRUCTURE,STRUCTURE_ID,ACTION,DIM_2,COLLECTION.METHOD[en;fr],CONTACT[],CONTACT[].NAME[] + dataflow,AGENCY:DF_ID(1.0.0),I,A,en:AAA;fr:BBB,Contact 
1;Contact 2,"""Contact 1 Name 1;Contact 1 Name 2"";""Contact 1 Name 1;Contact 2 Name 2""" + dataflow,AGENCY:DF_ID(1.0.0),I,B,en:CCC;fr:DDD,Contact 1;Contact 2;Contact 3,"""Contact 1 Name 1;Contact 1 Name 2"";;""Contact 3 Name 1;Contact 3 Name 2""" + +#### 15) Non-coded XHTML-formatted values with line-breaks + + STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_1 + dataflow,ESTAT:NA_MAIN(1.6.0),I,A,B,2014-01,12.4,"

This is some ""xhtml"" with a line + break

" + dataflow,ESTAT:NA_MAIN(1.6.0),I,A,B,2014-02,10.8,"

This is some other ""xhtml""

" + +#### 16) Deleting specific measure and attribute values: all non-empty values (e.g. marked with "-") are deleted + + STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_2,ATTR_3,ATTR_1 + dataflow,ESTAT:NA_MAIN(1.6.0),D,A,B,2014-01,-,,, + dataflow,ESTAT:NA_MAIN(1.6.0),D,A,B,2014-02,,,-, - DATAFLOW,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_2,ATTR_3,ATTR_1,SERIESKEY - ESTAT:NA_MAIN(1.6),A,B,2014-01,12.4,Y,"Normal, special and other values",N,A.B - ESTAT:NA_MAIN(1.6),A,B,2014-02,10.8,Y,"Normal, special and other values",Y,A.B +#### 17) Deleting specific measure and attribute values with wildcarded dimensions: all non-empty values (e.g. marked with "-") are deleted for all dimension combinations where: + - row 2: DIM2=A + - row 3: DIM2=B -The following default parameter settings are automatically applied: -- labels=id -- timeFormat=original -- *SERIESKEY* is a custom column. + STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_2,ATTR_3,ATTR_1 + dataflow,ESTAT:NA_MAIN(1.6.0),D,,A,,-,,, + dataflow,ESTAT:NA_MAIN(1.6.0),D,,B,,,,-, -#### 2) HTTP Accept header: application/vnd.sdmx.data+csv; version=1.0.0; labels=both -#### HTTP Accept-Language header: fr-FR, en;q=0.7 +#### 18) Deleting whole observations with wildcarded dimensions: all observations are deleted for all dimension combinations where: + - row 2: DIM2=A + - row 3: DIM2=B and DIM3=C - DATAFLOW;DIM_1: Dimension 1;DIM_2: Dimension 2;DIM_3: Dimension 3;OBS_VALUE;ATTR_2: Attribut 2;ATTR_3: Attribut 3;ATTR_1: Attribut 1;SERIESKEY - ESTAT:NA_MAIN(1.6): Principaux agrégats des comptes nationaux;A: Value A;B: Value B;2014-01;12,4;Y: Oui;Normal, special and other values;N: Non;A.B - ESTAT:NA_MAIN(1.6): Principaux agrégats des comptes nationaux;A: Value A;B: Value B;2014-02;10,8;Y: Oui;Normal, special and other values;Y: Oui;A.B + STRUCTURE,STRUCTURE_ID,ACTION,DIM_2,DIM_3 + dataflow,ESTAT:NA_MAIN(1.6.0),D,A,, + dataflow,ESTAT:NA_MAIN(1.6.0),D,B,C, -The following default parameter settings are 
automatically applied: -- timeFormat=original -- *SERIESKEY* is a custom column. +#### 19) Deleting all data for a data structure definition: -Note that in this example the client prefers French (fr) language with the France (FR) locale, but will also accept any type of English. Therefore, in the message the French language with the France locale is realized, transforming also the field separator from comma (,) to semicolon (;), and the decimal separator from dot (.) to comma (,). + STRUCTURE,STRUCTURE_ID,ACTION + datastructure,ESTAT:DSD_NA_MAIN(1.6.0),D +or -#### 3) HTTP Accept header: application/vnd.sdmx.data+csv; version=1.0.0; labels=both; timeFormat=normalized + STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3 + datastructure,ESTAT:DSD_NA_MAIN(1.6.0),D,,, - DATAFLOW,DIM_1: Dimension 1,DIM_2: Dimension 2,DIM_3: Dimension 3,OBS_VALUE,ATTR_2: Attribute 2,ATTR_3: Attribute 3,ATTR_1: Attribute 1,SERIESKEY - ESTAT:NA_MAIN(1.6): National Accounts Main Aggregates,A: Value A,B: Value B,2014-01-01,12.4,Y: Yes,"Normal, special and other values",N: No,A.B - ESTAT:NA_MAIN(1.6): National Accounts Main Aggregates,A: Value A,B: Value B,2014-02-01,10.8,Y: Yes,"Normal, special and other values",Y: Yes,A.B +------------------------ -The following default parameter settings are automatically applied: -- *SERIESKEY* is a custom column. +**(1)** Note that since SDMX 3.0.0 the syntax *AGENCY:ARTEFACT_ID(VERSION)* allows omitting the version for non-versioned artefacts. In this case using *AGENCY:ARTEFACT_ID* is sufficient, e.g. `AGENCY:DF_ID` diff --git a/metadata-message/docs/sdmx-csv-field-guide.md b/metadata-message/docs/sdmx-csv-field-guide.md new file mode 100644 index 0000000..54eb45e --- /dev/null +++ b/metadata-message/docs/sdmx-csv-field-guide.md @@ -0,0 +1,192 @@ +# Introduction + +SDMX-CSV Data Message is an SDMX data exchange format based on the [RFC 4180](https://tools.ietf.org/html/rfc4180). 
CSV is a widely used standardised and simple format to exchange data supported by many tools. + +SDMX-CSV integrates with other specifications, i.e.: +- The SDMX API RESTful specification (e.g. content negotiation with mime-type to get SDMX-CSV representations, specific formats for responses, language selection through HTTP content negotiation) +- The [RFC 4180](https://tools.ietf.org/html/rfc4180) specification + +## RFC 4180: A common format for CSV files + +In order to benefit from best practices, SDMX-CSV is based on the rules defined in the [RFC 4180](https://tools.ietf.org/html/rfc4180), which defines a common format and MIME Type for CSV files. It is advised to read the (very short) RFC for a full list of requirements but, in a nutshell, the RFC defines rules such as: +- How the CSV file should be structured (the RFC specifies that all records must have an identical structure (determined column number), like when using an SDMX "flat" representation for data); +- When double-quotes should be used and how to escape them when needed; +- How spaces should be handled: Spaces are considered part of a field and should not be ignored; +- Which mime type should be used; +- What is the default character set, etc. + +# Design principles for SDMX-CSV 2.0 Metadata Messages (aligned with SDMX 3.0.0) + +- In order to ensure the identifiability of the metadata contained in the message, the header row containing the column headers is mandatory and its content is well-defined. +- An SDMX-CSV referential metadata message contains metadata attribute values for one or more metadatasets reported for one or more metadataflows or metadata provision agreements. +- After the mandatory header row, each row contains the information related to one specific metadataset attached to one or more identifiable artefacts (targets). +- In [RFC 4180](https://tools.ietf.org/html/rfc4180), csv stands for "comma-separated values". 
However, while SDMX-CSV does use the "comma" (%x2C) as the default field separator, it adopts the wider interpretation of csv as "character-separated values". It is recommended for implementers to provide SDMX-CSV messages according to the locale of the user (e.g. as indicated in the HTTP Accept-Language header). This means that, for example, the semi-colon ‘;’ (as typically used in specific regions or countries) is acceptable as a separator. See also the examples below. Note that the separator used in a message can be determined by retrieving the character that follows the header field of the first column, which may be extended by a squared bracket term (see below). + +## Columns + +- The first column is always used for the underlying type of structure by which the metadataset is defined: metadataflow or metadata provision agreement. +- The next one or two columns are always used for the related structure identification. +- The next one or two columns are used for the metadataset identification. +- The next column is used for the action to be performed for the metadataset. +- The next column is used for the structure types of all targets of the metadataset. +- The next one or two columns are used for the identification of all targets of the metadataset. +- Each metadata attribute of the included metadataset(s) is represented in one or two columns. SDMX web services should return the columns in the metadata attribute order as defined in (each of) the underlying Metadata Structure Definition(s), thus in case of data defined by different metadata structures: first the metadata attributes of the first metadata structure, then the remaining metadata attributes of the second metadata structure and so forth. However, any order of these columns is valid for metadata uploads to SDMX-consuming systems. +- Implementers have the possibility to add any other custom columns as required, e.g. publicationPeriod, publicationYear, reportingBegin, reportingEnd, prepared, etc. 
+- In the context of appending or deleting metadata, certain columns may be omitted, see below. + +## Column headers (first row) + +- The header field of the first column always starts with the term `MDSTRUCTURE`. + - This field must be extended with a sub-field delimiter encapsulated in squared brackets "[]", e.g. `MDSTRUCTURE[;]`, in case the message contains metadatasets with multiple targets or with multi-instance or multi-language metadata attributes. +- The header field of the second column always contains the term `MDSTRUCTURE_ID`. +- If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the structure identification column containing the term `MDSTRUCTURE_NAME`. +- The header field of the next column always contains the term `METADATASET_ID`. +- If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the metadataset identification column containing the term `METADATASET_NAME`. +- The header field of the next column should contain the term `ACTION`. For convenience, if this column is not present, a default action ("Information") is assumed for the whole message. +- The header field of the next column contains the term `TARGET_TYPES`. +- The header field of the next column contains the term `TARGET_IDS`. +- If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the target identification column containing the term `TARGET_NAMES`. +- The other columns for components contain: + - Default: The ID of the metadata attribute reported in that column prefixed by all corresponding nested parent metadata attributes separated by a dot "." in the form *METADATA_ID[.METADATA_ID]+*, e.g. `ATTRIBUTE_GRANDPARENT_ID.ATTRIBUTE_PARENT_ID.ATTRIBUTE_CHILD_ID`. Additional pairs of squared brackets `[]` are added at the end of the IDs of those metadata attributes that have multiple instances, e.g. 
`CONTACT[].NAME`, `CONTACT[].PHONE[]` or `CONTACT.PHONE[]`, and/or that contain localised values. In the latter case the brackets encapsulate the ISO 2-letter language codes that can be encountered in that column, separated by the special sub-field separation character, e.g. `;`, defined in the squared bracket term of the header field of the first column, e.g. `MDSTRUCTURE[;]`. Example of a localised child attribute: `PROCESS.STEP[en;fr]`, and for multiple instances: `PROCESS.STEP[][en;fr]`. + - If option `labels=both` (see *[here](#optional-parameters)*): The full ID (as described above under 'Default') and the localised name of the metadata attribute reported in that column separated by the term ": ", e.g. `ATTRIBUTE_ID: ATTRIBUTE_NAME`. + - If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the metadata attribute identification column containing the localised name of the metadata attribute reported in the previous column. +- Any other custom column contains a custom but unique term, e.g. `publicationPeriod`. + +## Column content (all rows after header) + +- The first column contains: `metadataflow` or `metadataprovision`, depending on the type of artefact for which the metadata contained in the message are defined: metadataflow or metadata provision agreement. +- The second column contains: + - Default: The structure identification information in the form *AGENCY:ARTEFACT_ID(VERSION)* (1), e.g. `ESTAT:MDF(1.6.0)`. + - If option `labels=both` (see *[here](#optional-parameters)*): The structure identification information and its localised name separated by the term ": ", e.g. `ESTAT:MDF(1.6.0): Metadataflow name`. +- If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the structure identification column with the structure's localised name, e.g. `Metadataflow name`. 
+- The next column contains the metadataset identification information in the form *AGENCY:ARTEFACT_ID(VERSION)*(1), e.g. `AGENCY:MD_SET(1.0.0)`. + - If option `labels=both` (see *[here](#optional-parameters)*): The ID and the localised name of the metadataset separated by the term ": ", e.g. `ESTAT:MD_SET(1.0.0): Metadataset 1`. +- If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the metadataset identification column with the metadataset's localised name, e.g. `Metadata set name`. +- The next column contains one character representing one of the current 4 action types: + - "I": Information - Metadata is for information purposes. If such metadata messages are loaded into an SDMX database, the action "A" (Append) is assumed. + - "A": Append - Metadata is intended for an incremental update of existing metadatasets or the provision of new metadatasets formerly absent. This means that only the information provided explicitly in the message should be altered. Any metadata attribute value that is to be added or changed must be provided. However, the absence of a metadata attribute value at any metadata nesting level does not imply deletion; instead it is simply implied that the value is to remain unchanged. Therefore, it is valid and acceptable to send a metadata message with an action of 'Append' which (in addition to the required identification columns) contains only the column and value of a parent metadata attribute. In this case, only that value will be updated. The values of the child metadata attributes are not changed or deleted. Note that it is not permissible to update metadata attributes using incomplete identification information, e.g. without the metadataset or without the target identifier. In order to update a metadata attribute, the full identification information (all identification columns listed here) must always be provided. 
According to the SDMX 3.0 semantic versioning rules, it is not possible to update a semantically versioned metadataset. + - "R": Replace - Existing metadatasets are to be fully replaced. Metadatasets formerly absent will be added. According to the SDMX 3.0 semantic versioning rules, it is not possible to replace a semantically versioned metadataset. + - "D": Delete - Metadata is to be deleted. 'Delete' is assumed to be an incremental deletion. The deletion is to take place at the lowest level of detail provided in the row. Concretely, if a 'Delete' row only contains the identification information of the structural artefact (metadataflow or metadata provision agreement) without a metadataset then all metadatasets for the given artefact will be deleted. If a 'Delete' row only contains up to the metadataset identification without metadata attributes then the given metadataset is to be deleted. Finally, if a row contains all complete identification information up to a non-versioned metadataset and some values for metadata attributes, then only the metadata attributes with values will be deleted from the non-versioned metadataset. Attribute values to be deleted must be non-empty, e.g. marked with the dash character "-". According to the SDMX 3.0 semantic versioning rules, it is not possible to modify a semantically versioned metadataset. + - For convenience, if this column is absent then the action "Information" is assumed. +- The next column contains the types of all the targets of the metadataset according to the resource names defined for Structural Metadata Queries, e.g. `dataflow`. Multiple targets are separated by the special sub-field separation character, e.g. `;`, defined in the squared bracket term of the header field of the first column, e.g. `MDSTRUCTURE[;]`. Example for multiple target types: `dataflow;dataflow`. 
+- The next column contains the identification information of all the targets of the metadataset in the form *AGENCY:ARTEFACT_ID(VERSION)* (1), separated by the sub-field separation character, e.g. `AGENCY:DF1(1.0.0);AGENCY:DF2(1.0.0)`. + - If option `labels=both` (see *[here](#optional-parameters)*): The column contains the ID and the localised name of the targets separated by the term ": ", e.g. `AGENCY:DF(1.0.0): Dataflow name` or `AGENCY:DF1(1.0.0): Dataflow 1 name;AGENCY:DF2(1.0.0): Dataflow 2 name`. +- If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the target identification column with the target's localised name, e.g. `Dataflow name` or `Dataflow 1 name;Dataflow 2 name`. +- The other columns for metadata attributes contain: + - Default: The ID(s) (if coded) or value(s) (if non-coded) for the metadata attribute reported in that column, e.g. `A`, `A;B` or `"

An XHTML text

"`. + - If option `labels=both` (see *[here](#optional-parameters)*): The ID(s) and their localised name(s) for the metadata attribute separated by the term ": " (if coded) or the value(s) (if non-coded) for the metadata attribute reported in that column, e.g. `A: A value name`, `A: A value name;B: B value name` or `"

An XHTML text

"`. + - If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the metadata attribute identification column containing the localised name, e.g. `A value name` or `A value name;B value name`, of the metadata attribute value reported in the previous column. It is empty if the value has no localised name. + - All string/textual values (complete string between column-separating characters including ID's or language codes) should always be encapsulated in quotation marks, they must be if they contain commas or inner quotation marks. Quotation marks in strings/textual values must always be escaped by doubling the quotes. + - When metadata from different metadata structures are present then the columns not related to the attribute's metadata structure are to be left empty. +- The other custom columns contain any potentially localised custom content. + +## Localisation + +- HTTP content negotiation, see [RFC 2616 - HTTP 1.1 Header Field Definitions](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html) + - Always use this mime-type in the Accept header: `application/vnd.sdmx.metadata+csv; version=2.0.0`. + - The client can indicate preferred languages through the Accept-Language header, e.g. `fr, en-gb;q=0.8, en;q=0.7`. +- Always localise all artefact names according to the preferred language. The first best language match according to the user’s preferred language choices in the http Accept-Language header (or if that is not available than according to the system's default language order) is to be used for each localisable name element. The message does however not indicate the returned language per localisable name element. In case that there is no such language match for a particular localisable name element, it is optional to return the element in a system-default language or alternatively to not return the name element. 
+**It is recommended to indicate all languages used anywhere in the message for localised name elements through the HTTP Content-Language response header (languages of the intended audience).** +Note: For multi-language metadata attribute values, all language versions are provided independently of the preferred language (see below). + +## Multi-instance metadata attributes + +- Values from multiple instances of a metadata attribute within a metadataset are separated by the special sub-field separation character, e.g. `;`, defined in the squared bracket term of the header field of the first column, e.g. `MDSTRUCTURE[;]`. +- Such metadata attributes are indicated in the column header by having their ID followed by empty squared brackets "[]", e.g. `ATTR[]`. +- For coded multi-instance metadata attributes, if option `labels=both` (see *[here](#optional-parameters)*) then each individual value is to be prefixed with its ID and the term ": ", e.g. `A: Value A;B: Value B`. + +## Non-coded multi-lingual metadata attributes + +- Non-coded metadata attributes allow for multi-lingual values. Those values are separated by the special sub-field separation character, e.g. `;`, defined in the squared bracket term of the header field of the first column, e.g. `MDSTRUCTURE[;]`. +- Such metadata attributes are indicated in the column header by having their ID followed by the list of possible 2-letter ISO language codes separated by the sub-field separator and encapsulated in squared brackets "[]", e.g. `ATTR[en;fr]`. +- Each individual language value is to be prefixed with its 2-letter ISO language code and a colon character ":", e.g. `en:Value;fr:Valeur`. Thus, in contrast to the ID prefix for coded values when using the HTTP Accept header parameter `labels=both` (see *[here](#optional-parameters)*), the language prefix `xx:` doesn't have an extra space character. 
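
The sub-field conventions described in the two sections above can be sketched in Python as follows. This is only an illustration, not part of the specification: the function names (`parse_header`, `split_instances`, `split_labelled`, `parse_multilingual`) are hypothetical, the header parser handles only the simple forms `ATTR[]` and `ATTR[en;fr]`, and the code assumes that individual values never contain the sub-field separator themselves.

```python
import re


def parse_header(header: str, sep: str = ";") -> tuple[str, list[str]]:
    """Split a header field such as 'ATTR[en;fr]' into its ID and language codes.

    Returns (header, []) unchanged when no bracket term is present.
    Assumption: only a single trailing bracket term is handled.
    """
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]*)\]", header)
    if match is None:
        return header, []
    attr_id, langs = match.groups()
    return attr_id, [lang for lang in langs.split(sep) if lang]


def split_instances(field: str, sep: str = ";") -> list[str]:
    """Split a multi-instance field such as 'A;B' into its individual values."""
    return field.split(sep)


def split_labelled(field: str, sep: str = ";") -> list[tuple[str, str]]:
    """Split a labels=both field such as 'A: Value A;B: Value B' into (id, name) pairs."""
    pairs = []
    for item in field.split(sep):
        # The ID prefix is separated from the name by ": " (colon plus space).
        value_id, _, name = item.partition(": ")
        pairs.append((value_id, name))
    return pairs


def parse_multilingual(field: str, sep: str = ";") -> dict[str, str]:
    """Split a multi-lingual field such as 'en:Value;fr:Valeur' into {language: text}."""
    values = {}
    for item in field.split(sep):
        # The language prefix uses a bare colon, without the extra space.
        lang, _, text = item.partition(":")
        values[lang] = text
    return values
```

In practice the `sep` argument would be taken from the bracket term of the first header field, e.g. `;` for `MDSTRUCTURE[;]`.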
+ +## Non-coded multi-lingual multi-instance metadata attributes + +- When non-coded multi-lingual metadata attributes have multiple instances within a metadataset, then all individual values are included and separated by the special sub-field separation character, e.g. `;`, defined in the squared bracket term of the header field of the first column, e.g. `MDSTRUCTURE[;]`. +- Such metadata attributes are indicated in the column header by having their ID followed by squared brackets "[]" as well as by the list of possible language codes separated by the sub-field separator and encapsulated in additional squared brackets "[]", e.g. `ATTR[][en;fr;de]`. +- Each individual language value is to be prefixed with its 2-letter ISO language code and a colon character ":", e.g. `en:Value1`. +- Not every value needs all language versions. To make clear which value the different language items belong to, each multi-lingual value set is to be encapsulated in double quotes, e.g. `"en:Value1;fr:Valeur1";"en:Value2;de:Wert2"`. However, note that fields with double quotes must themselves be encapsulated in double quotes and that the inner double quotes need to be doubled, thus the complete example is `"""en:Value1;fr:Valeur1"";""en:Value2;de:Wert2"""`. + +## Non-coded XHTML-valued components + +- Some non-coded metadata attributes allow for XHTML values. +- Each XHTML value is to be encapsulated in double quotes, e.g. `"

This is some ""metadata html""

"`. Remember that the inner double quotes need to be doubled. +- The CSV format allows fields to contain line break characters if those fields are enclosed in double quotes. Thus XHTML values can also contain line breaks, although HTML viewers will ignore them. + +# Optional parameters + +The following optional parameter can be added to the HTTP Accept header. It needs to be separated by the character combination `"; "`. +- labels (id|name|both; default=id): This parameter applies to all Nameable SDMX Artefacts contained in the header and the body of the message: + - If the parameter value is `id` then only the id of the Artefacts is displayed. + - If the parameter value is `both` then the concatenated id and localised name of the Artefacts (see the section on [localised names](#localised-names) on how the message deals with languages) separated by `": "` are displayed. Note that the character combination `": "` could also be part of the Artefact name and could therefore occur several times within the concatenated string. + - If the parameter value is `name` then the id/value and the name of the artefacts are displayed in separate columns (see *[here](#columns)*), the ID/value column always directly preceding its related localised name column. + +# Examples + +Note: All examples assume the minimal HTTP Accept header: `application/vnd.sdmx.metadata+csv; version=1.0.0` + +#### 1) Ordinary case + + MDSTRUCTURE,MDSTRUCTURE_ID,METADATASET_ID,ACTION,TARGET_TYPES,TARGET_IDS,ATTRIBUTE_1,ATTRIBUTE_1.CHILD,ATTRIBUTE_2 + metadataflow,OECD:MDF(1.0.0),OECD:MDS(1.0.0),I,dataflow,OECD:DF(1.0.0),A STRING VALUE,"

An XHTML text with ""quotes""

",123 + +Note: +The following default parameter settings are automatically applied: +- labels=id + +#### 2) Metadata attribute with multiple instances and multi-lingual values + + MDSTRUCTURE[;],MDSTRUCTURE_ID,METADATASET_ID,ACTION,TARGET_TYPES,TARGET_IDS,ATTRIBUTE_1,ATTRIBUTE_1.ATTRIBUTE_1_2[][en;fr],ATTRIBUTE_2[],ATTRIBUTE_3[] + metadataflow,OECD:MDF(1.0.0),OECD:MDS(1.0.0),I,dataflow,OECD:DF(1.0.0),CODE_ID,"""en:""""

An XHTML text

"""";fr:""""

Un texte XHTML

""""";""en:""""

Another XHTML text

"""";fr:""""

Un autre texte XHTML

""""""","""Text with """"quotes"""""";""Another text""",123;456 + +#### 3) Localisation: HTTP Accept header: `application/vnd.sdmx.metadata+csv; version=1.0.0; labels=both`, HTTP Accept-Language header: `fr-FR, en;q=0.7`, metadata attribute with multiple instances, metadata attributes with multi-lingual values + + MDSTRUCTURE[|],MDSTRUCTURE_ID;METADATASET_ID;ACTION;TARGET_TYPES;TARGET_IDS;ATTRIBUTE_1: Attribut d'exemple 1;ATTRIBUTE_1.ATTRIBUTE_1_2[][en|fr]: Attribut d'exemple 12;ATTRIBUTE_2[]: Attribut d'exemple 2 + metadataflow;OECD:MDF(1.0.0): Metadataflow d'exemple;OECD:MDS(1.0.0): Metadataset d'exemple;I;dataflow;OECD:DF(1.0.0): Dataflow d'exemple;CODE_ID: Nom du code;"""en:""""

An XHTML text

""""|fr:""""

Un texte XHTML

""""""|""en:""""

Another XHTML text

""""|fr:""""

Un autre texte XHTML

""""""";123,45|6,789 + +Note that in this example the client prefers French (fr) language with the France (FR) locale, but will also accept any type of English. Therefore, in the message the French language with the France locale is applied, transforming also the field separator from comma (,) to semicolon (;), and the decimal separator from dot (.) to comma (,). + +#### 4) Localisation: HTTP Accept header: `application/vnd.sdmx.metadata+csv; version=1.0.0; labels=name`, HTTP Accept-Language header: `en-US`, metadata attribute with multiple instances, metadata attributes with multi-lingual values, different targets and metadatasets + + MDSTRUCTURE[;],MDSTRUCTURE_ID,MDSTRUCTURE_NAME,METADATASET_ID,METADATASET_NAME,ACTION,TARGET_TYPES,TARGET_IDS,TARGET_NAMES,ATTRIBUTE_1,Attribute 1,ATTRIBUTE_1.ATTRIBUTE_1_2[][en|fr],Attribute 12,ATTRIBUTE_2[],Attribute 2 + metadataflow,OECD:MDF(1.0.0),Metadataflow name,OECD:MDS(1.0.0),Metadataset name,I,dataflow;dataflow,OECD:DF(1.0.0);OECD:DF(1.1.0),Dataflow name 1;Dataflow name 2,CODE_ID,Code name,"""en:""""

An XHTML text

"""";fr:""""

Un texte XHTML

"""""";""en:""""

Another XHTML text

"""";fr:""""

Un autre texte XHTML

""""""",123.45;6.789 + metadataflow,OECD:MDF(1.0.0),Metadataflow name,OECD:MDS(1.1.0),Metadataset new name,I,codelist,OECD:CL(1.0.0),Codelist name,CODE_ID,Code name,"""en:""""

Text 1

"""";fr:""""

Texte 1

"""""";""en:""""

Text 2

"""";fr:""""

Texte 2

""""""",0 + +#### 5) Varying metadataflows + + MDSTRUCTURE[;],MDSTRUCTURE_ID,METADATASET_ID,ACTION,TARGET_TYPES,TARGET_IDS,ATTRIBUTE_1,ATTRIBUTE_2[][en;fr;de] + metadataflow,OECD:MDF(1.0.0),OECD:MDS(1.0.0),I,dataflow,OECD:DF(1.0.0),CODE_ID,"""en:Value1;fr:Valeur1"";""en:Value2;de:Wert2""" + metadataflow,OECD:MDF(1.1.0),OECD:MDS(1.1.0),I,dataflow,OECD:DF(1.1.0),CODE_ID,"""en:Value1;fr:Valeur1"";""en:Value2;de:Wert2""" + +#### 6) Non-versioned metadataset for a non-versioned(1) data provision agreement + + MDSTRUCTURE[;],MDSTRUCTURE_ID,METADATASET_ID,ACTION,TARGET_TYPES,TARGET_IDS,ATTRIBUTE_1,ATTRIBUTE_2[en;fr] + metadataprovision,OECD:MDP,OECD:MDS,I,dataflow,OECD:DF(1.0.0),CODE_ID,"en:Value1;fr:Valeur1" + +#### 7) Non-coded metadata attribute values with line-breaks + + MDSTRUCTURE[;],MDSTRUCTURE_ID,METADATASET_ID,ACTION,TARGET_TYPES,TARGET_IDS,ATTRIBUTE_1[] + metadataflow,OECD:MDF(1.0.0),OECD:MDS(1.0.0),I,dataflow,OECD:DF(1.0.0),"""This text with a line + break"";""This is some other text

""" + +#### 8) Deleting all values of specific metadata attributes from a non-versioned metadatset: the values that correspond to the attribute identifiers are deleted + + MDSTRUCTURE[;],MDSTRUCTURE_ID,METADATASET_ID,ACTION,TARGET_TYPES,TARGET_IDS,ATTRIBUTE_1[],ATTRIBUTE_1[].ATTRIBUTE_1_2[],ATTRIBUTE_2 + metadataflow,OECD:MDF(1.0.0),OECD:MDS,D,dataflow,OECD:DF(1.0.0),-,-,- + +#### 9) Deleting a whole metadataset: + + MDSTRUCTURE[;],MDSTRUCTURE_ID,METADATASET_ID,ACTION + metadataflow,OECD:MDF(1.0.0),OECD:MDS(1.0.0),D + metadataflow,OECD:MDF(1.1.0),OECD:MDS(1.1.0),D + +#### 10) Deleting all metadatasets defined by a specific metadata artefact: + + MDSTRUCTURE[;],MDSTRUCTURE_ID,METADATASET_ID,ACTION + metadataflow,OECD:MDF(1.0.0),,D + metadataflow,OECD:MDF(1.1.0),,D + +------------------------ + +**(1)** Note that since SDMX 3.0.0 the syntax *AGENCY:ARTEFACT_ID(VERSION)* allows omitting the version for non-versioned artefacts. In this case using *AGENCY:ARTEFACT_ID* is sufficient, e.g. `OECD:MDP`