Merge pull request #29 from EBISPOT/yaml_update
Yaml update
Showing 2 changed files with 3 additions and 36 deletions.
Binary file added (+993 KB): ...ummary-Statistics-Data-Interpretation-with-Improved-YAML-Metadata/meta_yaml.png
@@ -4,6 +4,7 @@ author: Yue Ji and Laura Harris
date: February 22, 2024
description: Having clear and accessible metadata is essential for enhancing data interpretation and ensuring its reusability. In the case of Genome-Wide Association Studies (GWAS), having a standardized and easy-to-understand format for documenting study metadata is crucial. In the GWAS Catalog, metadata associated with full genome-wide summary statistics files is accessible via multiple routes - searchable in the main Catalog via the website and REST API,
slug: streamlining-GWAS-Summary-Statistics-Data-Interpretation-with-Improved-YAML-Metadata
img: meta_yaml.png
---
Having clear and accessible metadata is essential for interpreting data correctly and ensuring it can be reused. For Genome-Wide Association Studies (GWAS), a standardized, easy-to-understand format for documenting study metadata is crucial. In the GWAS Catalog, metadata associated with full genome-wide summary statistics files is accessible via multiple routes: it is searchable in the main Catalog via the website and the REST API, and it is also provided as a text file in YAML format in the same directory as the data file.
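Because the YAML file sits in the same directory as the data file, a user can confirm that the two belong together by comparing the data file's checksum with the `data_file_md5sum` field defined in Table 1. The Python sketch below assumes the YAML has already been parsed into a dictionary (for example with a third-party YAML library, not shown) and that the data file is named exactly as in `data_file_name`; the function names and the `-meta.yaml` file name used in the usage note are hypothetical, not part of the GWAS Catalog tooling.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large summary statistics files
        # never have to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def data_file_matches_metadata(yaml_path: Path, meta: dict) -> bool:
    """Check the data file named in the metadata against data_file_md5sum.

    Assumption: `meta` is the already-parsed YAML, and the data file lives
    in the same directory as the YAML file at `yaml_path`.
    """
    data_path = yaml_path.parent / meta["data_file_name"]
    # Compare case-insensitively, since hex digests may be written in upper case.
    return md5_of_file(data_path) == meta["data_file_md5sum"].lower()
```

For example, `data_file_matches_metadata(Path("GCST90244057_buildGRCh37.tsv-meta.yaml"), meta)` would hash `GCST90244057_buildGRCh37.tsv` from the same folder; a `False` result suggests the file is incomplete or does not correspond to that metadata.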
@@ -28,42 +29,8 @@ These updates reflect our commitment to improving the user experience while ensu
Table 1. Metadata field definitions

| Field | Description | Data type and values | Mandatory | Example |
| --- | --- | --- | --- | --- |
| \# Study metadata |
| gwas_id | GWAS Catalog accession ID | Text string | Yes | GCST90244057 |
| author_notes | Additional information about this study from the author | Text string | No | File contains GWAS summary statistics from a meta-analysis of NMR metabolic traits in up to 33 cohorts. |
| gwas_catalog_api | GWAS Catalog REST API link | Text string | Yes | [https://www.ebi.ac.uk/gwas/rest/api/studies/GCST90244057](https://www.ebi.ac.uk/gwas/rest/api/studies/GCST90244057) |
| date_metadata_last_modified | The date the metadata YAML file was last modified | Date | Yes | 2023-11-28 |
| \# Trait information |
| trait_description | Author-reported trait description | Text string (multiple possible) | Yes | Body mass index |
| ontology_mapping | Short-form ontology terms describing the trait | Text string (multiple possible) | No | EFO_0004918 |
| \# Genotyping information |
| genome_assembly | Genome assembly of the summary statistics | GRCh/NCBI/UCSC value | Yes | GRCh37 |
| coordinate_system | Coordinate system used for the summary statistics | Text string (1-based or 0-based) | No | 1-based |
| genotyping_technology | Method(s) used to genotype variants in the discovery stage | Text string (multiple possible) | Yes | Genome-wide genotyping array |
| imputation_panel | Panel used for imputation | Text string | No | HRC + UK10K |
| imputation_software | Software used for imputation | Text string | No | SHAPEIT3 + IMPUTE4 |
| \# Sample information |
| sample_ancestry_category | Broad ancestry category that best describes the sample | Text string | Yes | European |
| sample_ancestry | The most detailed ancestry descriptor(s) for the sample | Text string (multiple possible) | Yes | \- Finnish<br>- British |
| sample_size | Sample size | Integer | Yes | 27006 |
| ancestry_method | Method used to determine sample ancestry, e.g. self-reported or genetically determined | Text string (multiple possible) | No | self-reported |
| case_control_study | Flag indicating whether the study is a case-control study | Boolean | No (default is false) | true |
| case_count | Number of cases in a case-control study | Integer | No, unless case_control_study is true | 27006 |
| control_count | Number of controls in a case-control study | Integer | No, unless case_control_study is true | 27006 |
| sex | Indicates a sex-stratified analysis | M (male), F (female), combined, or NR if unknown | No | combined |
| \# Summary statistics information |
| data_file_name | The name of the summary statistics file | Text string | Yes | GCST90244057_buildGRCh37.tsv |
| file_type | The format of the summary statistics file | "GWAS-SSF v1.0", "pre-GWAS-SSF", "non-GWAS-SSF" | Yes | GWAS-SSF v1.0 |
| data_file_md5sum | The MD5 checksum of the summary statistics file | Text string | Yes | 0ec56396f89edcc21a3d5a25a6fa993d |
| analysis_software | Software and version used for the association analysis | Text string (multiple possible) | Yes, if p-values of 0 are given | REGENIE |
| adjusted_covariates | Any covariates the GWAS is adjusted for | Text string (multiple possible) | No | sex |
| minor_allele_freq_lower_limit | Lowest possible effect allele frequency | Numeric | No | 0.0003 |
| \# Harmonisation status |
| is_harmonised | Description of harmonisation codes | Text string | Only given in harmonised datasets | false |
| is_sorted | Flag indicating whether the file is sorted by genomic location | Boolean | Yes | false |
| harmonisation_reference | The genome reference file used for harmonising the summary statistics file | Text string | No | ftp://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/ |
<article-image src="streamlining-GWAS-Summary-Statistics-Data-Interpretation-with-Improved-YAML-Metadata/meta_yaml.png" alt="Metadata field definitions" style='height: 100%; width: 100%'></article-image>
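The Mandatory and "Data type and values" columns of Table 1 translate naturally into a pre-submission check. The Python sketch below is not an official GWAS Catalog validator: it assumes the YAML has already been loaded into a flat dictionary (the published file may group sample fields differently), and `validate_metadata`, `MANDATORY_FIELDS` and `ALLOWED_VALUES` are hypothetical names. It checks the fields marked "Yes", the enumerated values for `file_type`, `coordinate_system` and `sex`, and the rule that `case_count` and `control_count` are required when `case_control_study` is true. The conditional requirement on `analysis_software` depends on whether the data file contains p-values of 0, so it is left out here.

```python
from typing import Any

# Fields marked "Yes" in the Mandatory column of Table 1
# (assumes a flat key/value layout for the parsed YAML).
MANDATORY_FIELDS = (
    "gwas_id", "gwas_catalog_api", "date_metadata_last_modified",
    "trait_description", "genome_assembly", "genotyping_technology",
    "sample_ancestry_category", "sample_ancestry", "sample_size",
    "data_file_name", "file_type", "data_file_md5sum", "is_sorted",
)

# Enumerated values listed in the "Data type and values" column.
ALLOWED_VALUES = {
    "file_type": {"GWAS-SSF v1.0", "pre-GWAS-SSF", "non-GWAS-SSF"},
    "coordinate_system": {"1-based", "0-based"},
    "sex": {"M", "F", "combined", "NR"},
}


def validate_metadata(meta: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a flat metadata mapping."""
    problems: list[str] = []

    # Missing or empty mandatory fields (False and 0 are valid values).
    for field in MANDATORY_FIELDS:
        if meta.get(field) in (None, "", []):
            problems.append(f"missing mandatory field: {field}")

    # Optional fields may be absent, but if present must use an allowed value.
    for field, allowed in ALLOWED_VALUES.items():
        value = meta.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field}={value!r} not in {sorted(allowed)}")

    # case_control_study defaults to false; when true, both counts are required.
    if meta.get("case_control_study", False) is True:
        for field in ("case_count", "control_count"):
            value = meta.get(field)
            if not isinstance(value, int) or isinstance(value, bool):
                problems.append(f"{field} required when case_control_study is true")

    return problems
```

An empty list means every rule above passed; otherwise each string names one failing rule, which keeps the check easy to extend as the field definitions evolve.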
## Questions and feedback
Questions or comments about this change? Please contact us at [email protected].