README.md
+1-1
@@ -6,7 +6,7 @@ Depending on the nature of your data (raw sequences, variant calling, arrays...)
6
6
* [**Array based metadata**](https://ega-archive.org/submission/array_based/metadata): must be submitted using the EGA submitter portal and by completing the [Array-based format (AF) spreadsheet](https://github.com/EbiEga/ega-metadata-schema/blob/8dca24c694b0c005f1b0d665f1c6900e766f38d7/templates/array-based-metadata/EGA_Array_based_Format_V4.3.xlsx) ([_direct download_](https://github.com/EbiEga/ega-metadata-schema/raw/8dca24c694b0c005f1b0d665f1c6900e766f38d7/templates/array-based-metadata/EGA_Array_based_Format_V4.3.xlsx)).
7
7
* [**Sequence**](https://ega-archive.org/submission/sequence) **based metadata**: must be submitted either using the [EGA submitter portal](https://ega-archive.org/submission/tools/submitter-portal) or through the [programmatic submission](https://ega-archive.org/submission/sequence/programmatic_submissions) procedure. For the latter, you will need to create correctly formatted XMLs containing your metadata:
8
8
* You will find examples of such XMLs (one file for each metadata object) within this repository: (1) [descriptive XMLs](examples/sequence-based-metadata/XML/XMLs_examples-descriptive) display what type of information corresponds to which part of the XML's structure; (2) [true example XMLs](examples/sequence-based-metadata/XML/XMLs_examples-true_values) contain fabricated information for you to see what a finished (and ready to be submitted) XML would look like.
9
-
* To ease this process, you could make use of the tool [Star2xml](Star2xml/). Follow its README to create these XMLs from the given [``joint template``](templates/sequence-based-metadata/sequence-based-metadata/EGA_metadata_submission_template_v1.xlsx).
9
+
* To ease this process, you could make use of the tool [Star2xml](Star2xml/). Follow its README to create these XMLs from the given [``joint template``](templates/sequence-based-metadata/EGA_metadata_submission_template_v1.xlsx).
Star2xml/README.md
+12-9
@@ -2,9 +2,12 @@
2
2
## Index
3
3
1. [Overview](#Overview)
4
4
2. [Usage](#Usage)
5
-
2.1. [Pre-requisites](#Pre-requisites)
6
-
2.2. [Scripts](#Scripts): [``star2xml.py``](#star2xml.py) and [``validateXML.py``](#validateXML.py)
7
-
2.3. [Mock examples](#Mock-examples)
5
+
6
+
2.1. [Pre-requisites](#Pre-requisites)
7
+
8
+
2.2. [Scripts](#Scripts): [``star2xml.py``](#star2xml.py) and [``validateXML.py``](#validateXML.py)
9
+
10
+
2.3. [Mock examples](#Mock-examples)
8
11
3. [Filling out templates](#Filling-out-templates)
9
12
4. [Configuration files](#Configuration-files)
10
13
5. [Common issues](#Common-issues)
@@ -17,7 +20,7 @@ The Star2xml tool eases the process of XML creation prior metadata submission to
17
20
* **Where?**
18
21
* Tool's scripts can be found in [Star2xml directory](./).
19
22
* Required Python packages can be found at [requirements.txt](requirements.txt).
20
-
* Use the file [``EGA_metadata_submission_template_v1.xlsx``](../templates/sequence-based-metadata/sequence-based-metadata/EGA_metadata_submission_template_v1.xlsx) as a template to fill in with your data, which can be used as the input for the Star2xml tool. Further information about its format and how to fill each of their tabs exists in [its section](#Filling-out-templates) on this README.
23
+
* Use the file [``EGA_metadata_submission_template_v1.xlsx``](../templates/sequence-based-metadata/EGA_metadata_submission_template_v1.xlsx) as a template to fill in with your data, which can be used as the input for the Star2xml tool. Further information about its format and how to fill each of their tabs exists in [its section](#Filling-out-templates) on this README.
21
24
* Configuration files (`input_configuration.yaml` and `xml_schema.yaml`) reside in the [configurations directory](configuration_files/). Information regarding their structure and how to modify them is located both within the files themselves and [their section](#Configuration-files) on this README.
22
25
23
26
@@ -90,7 +93,7 @@ Example of usage: $ ./star2xml.py "study,sample,analysis,experiment,run,dataset,
90
93
91
94
The **input file** will commonly be a **spreadsheet** with a tab named after each of the metadata objects (_e.g._ "run") we want to convert into XMLs. Instead of a joint spreadsheet, the tool also accepts **Comma and Tab Separated Values** (.csv and .tsv) files, each of which would contain data of one single metadata object (similar to one tab of the joint template).
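As a rough illustration of the plain-text input described above, the following sketch reads a `.tsv` (or `.csv`) file holding a single metadata object into one dictionary per row. It is not the tool's own parsing code: the function name `read_metadata_rows` and the column names in the example are assumptions made for this sketch.

```python
import csv
import io


def read_metadata_rows(handle, delimiter="\t"):
    """Return one dict per non-empty row, keyed by column header."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    rows = []
    for row in reader:
        # Keep the row only if at least one cell holds non-blank text.
        if any(isinstance(value, str) and value.strip() for value in row.values()):
            rows.append(row)
    return rows


# Hypothetical content of a "run" file: one run per row, one field per column.
example_tsv = (
    "Run_alias*\tFile_name\n"
    "run_1\tsample_1.fastq.gz\n"
    "run_2\tsample_2.fastq.gz\n"
)

rows = read_metadata_rows(io.StringIO(example_tsv))
for row in rows:
    print(row["Run_alias*"], row["File_name"])
```

Passing `delimiter=","` would handle the `.csv` variant in the same way.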
92
95
93
-
For example, the joint template ([``EGA_metadata_submission_template_v1.xlsx``](../templates/sequence-based-metadata/sequence-based-metadata/EGA_metadata_submission_template_v1.xlsx)) contains a tab for each possible metadata object. Within each of them, one row corresponds to one metadata instance (_e.g._ one ``run`` per row), and each column to one field of information for such instance. In case we were interested in creating an XML containing the Run's metadata we would execute the following command:
96
+
For example, the joint template ([``EGA_metadata_submission_template_v1.xlsx``](../templates/sequence-based-metadata/EGA_metadata_submission_template_v1.xlsx)) contains a tab for each possible metadata object. Within each of them, one row corresponds to one metadata instance (_e.g._ one ``run`` per row), and each column to one field of information for such instance. In case we were interested in creating an XML containing the Run's metadata we would execute the following command:
It is worth mentioning that if there is an error while parsing the given XMLs (_e.g._ there are unclosed nodes - i.e. missing '`>`'), the validation will stop by default to notify the error. If this is not the desired behaviour, you may provide the optional argument `--dont_stop_parsing` to avoid terminating the execution, and instead report the file with errors as non-validated.
140
+
It is worth mentioning that if there is an error while parsing the given XMLs (_e.g._ there are unclosed nodes - _i.e._ missing '`>`'), the validation will stop by default to notify the error. If this is not the desired behaviour, you may provide the optional argument `--dont_stop_parsing` to avoid terminating the execution, and instead report the file with errors as non-validated.
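The difference between the two behaviours can be sketched as follows. This is a minimal illustration using Python's standard `xml.etree.ElementTree`, not the code of `validateXML.py`; the function `check_well_formed` and its `stop_on_error` parameter are hypothetical names, and the sketch only checks that each document parses, without validating it against a schema.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_documents, stop_on_error=True):
    """Map each document name to True (parsed) or False (parse error).

    With stop_on_error=True the first parse error is re-raised, mimicking the
    default behaviour; with False the file is reported as non-validated.
    """
    results = {}
    for name, text in xml_documents.items():
        try:
            ET.fromstring(text)
            results[name] = True
        except ET.ParseError:
            if stop_on_error:
                raise
            results[name] = False
    return results


# Hypothetical inputs: the second document has an unclosed <RUN> node.
documents = {
    "ok.xml": "<RUN_SET><RUN/></RUN_SET>",
    "broken.xml": "<RUN_SET><RUN></RUN_SET>",
}

report = check_well_formed(documents, stop_on_error=False)
for name, parsed in report.items():
    print(name, "parsed" if parsed else "non-validated")
```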
138
141
139
142
### Mock examples
140
143
To get started with the tool, you can execute the following commands:
@@ -151,7 +154,7 @@ To get started with the tool, you can execute the following commands:
151
154
```
152
155
153
156
## Filling out templates
154
-
For this part of the documentation we will be using the joint template ([``EGA_metadata_submission_template_v1.xlsx``](../templates/sequence-based-metadata/sequence-based-metadata/EGA_metadata_submission_template_v1.xlsx)), a spreadsheet, since it is the most commonly used format. Nevertheless, stripping off the formatting, you may use a similar logic while filling plain text formats (``.csv`` and ``.tsv``)
157
+
For this part of the documentation we will be using the joint template ([``EGA_metadata_submission_template_v1.xlsx``](../templates/sequence-based-metadata/EGA_metadata_submission_template_v1.xlsx)), a spreadsheet, since it is the most commonly used format. Nevertheless, stripping off the formatting, you may use a similar logic while filling plain text formats (``.csv`` and ``.tsv``)
155
158
156
159
Based on the type of metadata objects you want to submit, you shall **fill their corresponding tabs** within the joint template. Each tab of the spreadsheet corresponds to one of the possible metadata objects (_e.g._ ``run``) from EGA, with the exception of the first tab, which is named ``Readme`` and contains information about the file's format. For all metadata tabs, **each row will represent one repetition of a metadata object**. For example, each of the rows in the sample tab given as input will represent one ``<SAMPLE>`` node of the ``<SAMPLE_SET>`` in the final XML. All information that row contains will be associated with its corresponding ``<SAMPLE>`` node (its alias, description, etc.).
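The row-to-node correspondence can be pictured with a short sketch. It is a simplified illustration, not the tool's implementation: the dictionary keys `alias` and `description`, the `DESCRIPTION` child node and the function `rows_to_sample_set` are assumptions chosen for the example, and a real sample XML carries more structure than shown here.

```python
import xml.etree.ElementTree as ET


def rows_to_sample_set(rows):
    """Build a <SAMPLE_SET> holding one <SAMPLE> node per input row."""
    sample_set = ET.Element("SAMPLE_SET")
    for row in rows:
        # One row of the tab becomes one <SAMPLE> node.
        sample = ET.SubElement(sample_set, "SAMPLE", alias=row["alias"])
        description = ET.SubElement(sample, "DESCRIPTION")
        description.text = row["description"]
    return sample_set


# Two hypothetical rows of the sample tab.
sample_rows = [
    {"alias": "sample_A", "description": "First mock sample"},
    {"alias": "sample_B", "description": "Second mock sample"},
]

sample_set = rows_to_sample_set(sample_rows)
print(ET.tostring(sample_set, encoding="unicode"))
```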
157
160
@@ -193,7 +196,7 @@ Additional information can be obtained from the colour of the column headers (fi
193
196
* Bright yellow: **required attributes**. All column headers that contain "``*``" are marked as required (_e.g._ ``Analysis_alias*``): their metadata shall be provided for each filled row.
194
197
* No colour: **optional** (yet highly recommended) **attributes**. These columns may be left empty, although we advise to also provide their corresponding metadata, for it will enrich your submission.
195
198
* Light yellow: **optionally required columns**. These are columns related to a choice from another column (based on multiple choice attributes). For instance, if our experiment's layout is ``PAIRED``, the two related columns (``PAIRED.Nominal_length`` and ``PAIRED.Nominal_sdev``) will change their header's format to light yellow, since these are required columns for a paired experiment.
196
-
* Grey: **optionally ignored columns**. Column headers that do not appear to be chosen for any metadata instance (row), and thus can be ignored (i.e. left empty) (based on multiple choice attributes). For instance, if our experiment's layout is ``SINGLE``, the two columns previously mentioned that are related to a paired experiment would be highlighted in grey.
199
+
* Grey: **optionally ignored columns**. Column headers that do not appear to be chosen for any metadata instance (row), and thus can be ignored (_i.e._ left empty) (based on multiple choice attributes). For instance, if our experiment's layout is ``SINGLE``, the two columns previously mentioned that are related to a paired experiment would be highlighted in grey.
197
200
* Other colours: **repetition blocks**. As mentioned when describing the [types of columns](#Types-of-columns), there are repeated columns. Their headers are coloured alternately for each repeated class to ease their identification. In addition, the body of each column is shaded in a lighter colour than its header, alternating between *repeated blocks* of the same class.
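The rule for required attributes (headers containing "``*``" must be filled in every filled row) lends itself to a quick check before running the tool. The sketch below is only an illustration under that reading; `find_missing_required` is a hypothetical helper, not part of Star2xml, and it does not cover the optionally required (light yellow) columns.

```python
def find_missing_required(rows):
    """Return (row_number, header) pairs where a required column is empty.

    A column is treated as required when its header contains "*".
    Rows that are completely empty are skipped.
    """
    missing = []
    for number, row in enumerate(rows, start=1):
        if not any((value or "").strip() for value in row.values()):
            continue  # not a filled row
        for header, value in row.items():
            if "*" in header and not (value or "").strip():
                missing.append((number, header))
    return missing


# Hypothetical rows of an analysis tab; the second lacks its required alias.
analysis_rows = [
    {"Analysis_alias*": "analysis_1", "Analysis_description": "Mock"},
    {"Analysis_alias*": "", "Analysis_description": "No alias given"},
    {"Analysis_alias*": "", "Analysis_description": ""},
]

problems = find_missing_required(analysis_rows)
for row_number, header in problems:
    print(f"Row {row_number}: required column {header} is empty")
```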
@@ -202,7 +205,7 @@ Additional information can be obtained from the colour of the column headers (fi
202
205
203
206
This section of the README displays additional information about how the tool works using its configuration files. Such knowledge will most likely not be relevant to the average user, and thus **you may skip it**. Nevertheless, if you wish to change the configuration files, it will come in handy.
204
207
205
-
There are **two configuration files**: `input_configuration.yaml` and ``xml_schema.yaml``. The former simply **lists the required fields for each input file** (i.e. if a column named ``Sample_alias*`` needs to be present or not). The latter **describes the structure of the corresponding XML** (i.e. which nodes are children of which) and **associates each column name of the input file with its corresponding node's characteristic** (either an attribute or text). Both are `YAML` files, which are easy-to-read information holders, and can be interpreted as dictionaries/lists of elements. Besides the information displayed here, additional instructions on how to modify them reside within the files themselves.
208
+
There are **two configuration files**: `input_configuration.yaml` and ``xml_schema.yaml``. The former simply **lists the required fields for each input file** (_i.e._ if a column named ``Sample_alias*`` needs to be present or not). The latter **describes the structure of the corresponding XML** (_i.e._ which nodes are children of which) and **associates each column name of the input file with its corresponding node's characteristic** (either an attribute or text). Both are `YAML` files, which are easy-to-read information holders, and can be interpreted as dictionaries/lists of elements. Besides the information displayed here, additional instructions on how to modify them reside within the files themselves.
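To make the idea of associating a column name with a node's attribute or text more concrete, here is a small sketch. The mapping below is entirely hypothetical: it is written as a Python dictionary and does not reproduce the real layout of ``xml_schema.yaml``, whose structure is documented within the file itself. The column name ``Sample_title``, the ``TITLE`` node and the helper `build_node` are likewise assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping (NOT the real xml_schema.yaml layout):
# column header -> (kind, target), where kind is "attribute" or "text".
COLUMN_MAP = {
    "Sample_alias*": ("attribute", "alias"),
    "Sample_title": ("text", "TITLE"),
}


def build_node(tag, row, column_map):
    """Create one XML node from one input row, following column_map."""
    node = ET.Element(tag)
    for column, (kind, target) in column_map.items():
        value = row.get(column, "")
        if not value:
            continue  # empty optional cells produce nothing
        if kind == "attribute":
            node.set(target, value)  # value becomes an attribute of the node
        else:
            ET.SubElement(node, target).text = value  # value becomes child text
    return node


row = {"Sample_alias*": "sample_A", "Sample_title": "Mock sample"}
node = build_node("SAMPLE", row, COLUMN_MAP)
print(ET.tostring(node, encoding="unicode"))
```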
206
209
207
210
### Basic structure - ``xml_schema.yaml``
208
211
At base level, the file contains **information of the tool itself** (`tool_info` - used to add details to reports), the **metadata schemas** (`XML_schemas_info` - used to both download `.xsd` files and create XMLs) and **one element for each metadata object** (_e.g._ `sample`) describing its XML's architecture.