
The read tool previews GWAS summary statistics data files and their metadata. It can print the header and first five rows of a data file, or extract specific fields from the accompanying metadata file.


gwas-ssf read file [options]


| Option             | Short name | Type    | Default value      | Description |
| --help             | -h         | Boolean | False              | Display help message |
| --get_header       | -h         | Boolean | False              | Return the first five rows of the file |
| --meta-in          |            | Path    | filename-meta.yaml | Specify a metadata file to read in, defaulting to filename-meta.yaml |
| --get-all-metadata | -M         | Boolean | False              | Return all fields in the metadata file |
| --get-metadata     | -m         | List    | None               | Get metadata for the specified fields, e.g. -m genome_assembly -m is_harmonised |
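
The default for --meta-in described in the table can be pictured with a small Python sketch. This is only an illustration of the documented naming rule (data file name plus "-meta.yaml"), not the tool's actual code; the helper name `resolve_meta_path` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def resolve_meta_path(sumstats_file: str, meta_in: Optional[str] = None) -> Path:
    """Return the metadata path to read (hypothetical helper, not part of gwas-ssf)."""
    if meta_in is not None:
        # An explicit --meta-in value takes precedence over the default.
        return Path(meta_in)
    # Documented default: the data file name with "-meta.yaml" appended.
    return Path(sumstats_file + "-meta.yaml")


print(resolve_meta_path("GCST90132222_buildGRCh37.tsv.gz"))
```

Note that the suffix is appended to the full file name, so the ".tsv.gz" extension stays in place.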


Suppose you download a GWAS summary statistics file, GCST90132222_buildGRCh37.tsv.gz, and its corresponding metadata YAML file, GCST90132222_buildGRCh37.tsv.gz-meta.yaml, from the GWAS Catalog public FTP into the same folder, and you want to:

  1. Preview the summary statistics (the first five rows of the input file):
gwas-ssf read GCST90132222_buildGRCh37.tsv.gz --get_header
#-------- SUMSTATS DATA PREVIEW --------#
| variant_id  | chromosome | base_pair_location | effect_allele | other_allele | beta    | standard_error | p_value | variant_id_hg19  | base_pair_location_grch38 |
| rs147324274 | 10         | 100000012          | A             | G            | 0.1719  | 0.2876         | 0.5501  | 10_100000012_G_A | 98240255                  |
| NA          | 10         | 10000010           | T             | C            | -0.0329 | 0.0556         | 0.5536  | 10_10000010_C_T  | 9958047                   |
| rs144804129 | 10         | 100000122          | A             | T            | -0.0632 | 0.3363         | 0.8509  | 10_100000122_T_A | 98240365                  |
| rs6602381   | 10         | 10000018           | G             | A            | -0.0088 | 0.0109         | 0.4206  | 10_10000018_A_G  | 9958055                   |
| NA          | 10         | 10000030           | C             | A            | 0.0991  | 0.2386         | 0.6778  | 10_10000030_A_C  | 9958067                   |
  2. Preview all fields in the metadata YAML file, GCST90132222_buildGRCh37.tsv.gz-meta.yaml, located in the directory where you run the command (the tool automatically looks for the filename + "-meta.yaml"):
gwas-ssf read GCST90132222_buildGRCh37.tsv.gz --get-all-metadata
  3. Extract the genome_assembly and is_harmonised fields from a metadata file that is located in a different directory or saved under a different name, passing its path with --meta-in:
gwas-ssf read GCST90132222_buildGRCh37.tsv.gz --meta-in path_to_yaml_file --get-metadata genome_assembly -m is_harmonised


#-------- SUMSTATS METADATA --------#
genome_assembly: GRCh37
is_harmonised: false
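
For readers who want to reproduce these previews in a script, the following Python sketch shows roughly equivalent steps using only the standard library. It is not how gwas-ssf is implemented; the function names `preview_rows` and `read_flat_metadata` are hypothetical, and the metadata reader is a deliberate simplification that only handles top-level `key: value` lines (a real YAML parser would be the safer choice for nested or quoted values).

```python
import csv
import gzip
from itertools import islice
from pathlib import Path
from typing import Dict, List


def preview_rows(sumstats_file: str, n_rows: int = 5) -> List[Dict[str, str]]:
    """Return the first n_rows data rows of a gzipped TSV, keyed by column name."""
    with gzip.open(sumstats_file, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # islice stops after n_rows, so the rest of the file is never read.
        return list(islice(reader, n_rows))


def read_flat_metadata(meta_file: str, fields: List[str]) -> Dict[str, str]:
    """Pick top-level `key: value` entries from a metadata file.

    Assumption: naive line-based parsing. Nested entries, list items and
    comments are skipped, and values are returned as raw strings.
    """
    wanted = set(fields)
    found: Dict[str, str] = {}
    for line in Path(meta_file).read_text(encoding="utf-8").splitlines():
        if line.startswith((" ", "\t", "#", "-")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        if key.strip() in wanted:
            found[key.strip()] = value.strip()
    return found


# Example calls, assuming both files sit in the working directory:
# preview_rows("GCST90132222_buildGRCh37.tsv.gz")
# read_flat_metadata("GCST90132222_buildGRCh37.tsv.gz-meta.yaml",
#                    ["genome_assembly", "is_harmonised"])
```

Values come back as strings (for example "false" rather than a boolean), which keeps the sketch free of third-party dependencies.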

Copyright © EMBL-EBI 2024 | EMBL-EBI is an Outstation of the European Molecular Biology Laboratory | Terms of use | Data Preservation Statement