diff --git a/assets/images/streamlining-GWAS-Summary-Statistics-Data-Interpretation-with-Improved-YAML-Metadata/meta_yaml.png b/assets/images/streamlining-GWAS-Summary-Statistics-Data-Interpretation-with-Improved-YAML-Metadata/meta_yaml.png new file mode 100644 index 0000000..e283cf8 Binary files /dev/null and b/assets/images/streamlining-GWAS-Summary-Statistics-Data-Interpretation-with-Improved-YAML-Metadata/meta_yaml.png differ diff --git a/content/blog/streamlining-GWAS-Summary-Statistics-Data-Interpretation-with-Improved-YAML-Metadata.md b/content/blog/streamlining-GWAS-Summary-Statistics-Data-Interpretation-with-Improved-YAML-Metadata.md index e0e18f2..976d9b3 100644 --- a/content/blog/streamlining-GWAS-Summary-Statistics-Data-Interpretation-with-Improved-YAML-Metadata.md +++ b/content/blog/streamlining-GWAS-Summary-Statistics-Data-Interpretation-with-Improved-YAML-Metadata.md @@ -4,6 +4,7 @@ author: Yue Ji and Laura Harris date: February 22, 2024 description: Having clear and accessible metadata is essential for enhancing data interpretation and ensuring its reusability. In the case of Genome-Wide Association Studies (GWAS), having a standardized and easy-to-understand format for documenting study metadata is crucial. In the GWAS Catalog, metadata associated with full genome-wide summary statistics files is accessible via multiple routes - searchable in the main Catalog via the website and REST API, slug: streamlining-GWAS-Summary-Statistics-Data-Interpretation-with-Improved-YAML-Metadata +img: meta_yaml.png --- Having clear and accessible metadata is essential for enhancing data interpretation and ensuring its reusability. In the case of Genome-Wide Association Studies (GWAS), having a standardized and easy-to-understand format for documenting study metadata is crucial. 
In the GWAS Catalog, metadata associated with full genome-wide summary statistics files is accessible via multiple routes - searchable in the main Catalog via the website and REST API, and additionally via a text file in YAML format, contained in the same directory as the data file. @@ -28,42 +29,8 @@ These updates reflect our commitment to improving the user experience while ensu Table 1. Metadata field definitions -| Field | Description | Data type and values | Mandatory | Example | -| -------------------------------- | ---------------------------------------------------------------------------------- | ------------------------------------------------------- | ----------------------------------- | -------------------------------------------------------------------------------------------------------------------- | -| \# Study meta-data | -| gwas_id | GWAS Catalog accession ID | Text string | Yes | GCST90244057 | -| author_notes | Additional information about this study from the author | Text string | No | File contains GWAS summary statistics from a meta-analysis of NMR metabolic traits in up to 33 cohorts. | -| gwas_catalog_api | GWAS catalog REST API link | Text string | Yes | [https://www.ebi.ac.uk/gwas/rest/api/studies/GCST90244057](https://www.ebi.ac.uk/gwas/rest/api/studies/GCST90244057) | -| date_metadata_last_modified | The latest date that metadata YAML file was modified | date | Yes | 2023-11-28 | -| \# Trait Information | -| trait_description | Author reported trait description | Text string (multiple possible) | Yes | Body mass index | -| ontology_mapping | Short form ontology terms describing the trait | Text string (multiple possible) | No | EFO_0004918 | -| \# Genotyping Information | -| genome_assembly | Genome assembly for the summary statistics. 
| GRCh/NCBI/UCSC value | Yes | GRCh37 | -| coordinate_system | Coordinate system used for the summary statistics | Text String (1-based or 0-based) | No | 1-based | -| genotyping_technology | Method(s) used to genotype variants in the discovery stage. | Text string (multiple possible) | Yes | Genome-wide genotyping array | -| imputation_panel | Panel used for imputation | Text string | No | HRC + UK10K | -| imputation_software | Software used for imputation | Text string | No | SHAPEIT3 + IMPUTE4 | -| \# Sample Information | -| sample_ancestry_category | Broad ancestry category that best describes the sample. | Text string | Yes | European | -| sample_ancestry | The most detailed ancestry descriptor(s) for the sample. | Text string (multiple possible) | Yes | \- Finnish
- British | -| sample_size | Sample size | Integer | Yes | 27006 | -| ancestry_method | Method used to determine sample ancestry e.g. self-reported/genetically determined | Text string (multiple possible) | No | self-reported | -| case_control_study | Flag whether the study is a case-control study | Boolean | No (default is false) | true | -| case_count | Number of cases for case/control study | Integer | No, unless caseControlStudy is true | 27006 | -| control_count | Number of controls for case/control study | Integer | No, unless caseControlStudy is true | 27006 | -| sex | To indicate a sex-stratified analysis | M (for male), F (for female), combined or NR if unknown | No | combined | -| \# Summary Statistic information | -| data_file_name | The name of the summary statistics file | Text string | Yes | GCST90244057_buildGRCh37.tsv | -| file_type | The format of the summary statistics file | "GWAS-SSF v1.0", "pre-GWAS-SSF", "non-GWAS-SSF" | Yes | GWAS-SSF v1.0 | -| data_file_md5sum | The md5 checksum of the summary statistics file. | Text string | Yes | 0ec56396f89edcc21a3d5a25a6fa993d | -| analysis_software | Software and version used for the association analysis | Text string (multiple possible) | Yes if p-values of 0 given | REGENIE | -| adjusted_covariates | Any covariates the GWAS is adjusted for | Text string (multiple possible) | No | sex | -| minor_allele_freq_lower_limit | Lowest possible effect allele frequency | Numeric | No | 0.0003 | -| \# Harmonization status | -| is_harmonised | Description of harmonisation codes | Text string | Only given in harmonised datasets | false | -| is_sorted | Flag whether the file is sorted by genomic location | Boolean | Yes | false | -| harmonisation_reference | The genome reference file used for harmonising the summary statistics file | Text string | No | ftp://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/ | + + ## Questions and feedback Questions or comments about this change? 
Please contact us at gwas-info@ebi.ac.uk.