Read


The Read tool previews GWAS summary statistics data files and their metadata. It can show the header and first five rows of a data file, or extract specific fields from the metadata file.

Usage

gwas-ssf read file [options]

Options

| Options | Short name | Type | Default value | Description |
|---|---|---|---|---|
| --help | -h | Boolean | False | Display help message |
| --get_header | -h | Boolean | False | Return the first five rows of the file |
| --meta-in | | Path | filename-meta.yaml | Specify a metadata file to read in, defaulting to filename-meta.yaml |
| --get-all-metadata | -M | Boolean | False | Return all fields in the metadata file |
| --get-metadata | -m | List | None | Get metadata for the specified fields, e.g. -m genome_assembly -m is_harmonised |
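
The default for --meta-in can be pictured with a short Python sketch. This is not the tool's own code: `resolve_meta_path` is a hypothetical helper that only illustrates the naming rule from the table, assuming an explicit --meta-in value takes precedence over the default.

```python
from pathlib import Path
from typing import Optional


def resolve_meta_path(data_file: str, meta_in: Optional[str] = None) -> Path:
    """Return the metadata path a reader would use (hypothetical helper)."""
    if meta_in is not None:
        # Assumption: an explicit --meta-in value overrides the default.
        return Path(meta_in)
    # Default: append "-meta.yaml" to the full data file name,
    # keeping the original ".tsv.gz" extension in place.
    return Path(data_file + "-meta.yaml")


print(resolve_meta_path("GCST90132222_buildGRCh37.tsv.gz"))
# prints GCST90132222_buildGRCh37.tsv.gz-meta.yaml
```

Note that the suffix is appended to the complete file name rather than replacing its extension, which is why the metadata file in the examples ends in .tsv.gz-meta.yaml.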

Examples

Suppose you download a GWAS summary statistics file, GCST90132222_buildGRCh37.tsv.gz, and its corresponding metadata YAML file, GCST90132222_buildGRCh37.tsv.gz-meta.yaml, from the GWAS Catalog public FTP into the same folder, and you want to:

  1. Preview the summary statistics (the first five rows of the input file):
gwas-ssf read GCST90132222_buildGRCh37.tsv.gz --get_header
output:
#-------- SUMSTATS DATA PREVIEW --------#
+-------------+------------+--------------------+---------------+--------------+---------+----------------+---------+------------------+---------------------------+
| variant_id  | chromosome | base_pair_location | effect_allele | other_allele | beta    | standard_error | p_value | variant_id_hg19  | base_pair_location_grch38 |
+=============+============+====================+===============+==============+=========+================+=========+==================+===========================+
| rs147324274 | 10         | 100000012          | A             | G            | 0.1719  | 0.2876         | 0.5501  | 10_100000012_G_A | 98240255                  |
+-------------+------------+--------------------+---------------+--------------+---------+----------------+---------+------------------+---------------------------+
| NA          | 10         | 10000010           | T             | C            | -0.0329 | 0.0556         | 0.5536  | 10_10000010_C_T  | 9958047                   |
+-------------+------------+--------------------+---------------+--------------+---------+----------------+---------+------------------+---------------------------+
| rs144804129 | 10         | 100000122          | A             | T            | -0.0632 | 0.3363         | 0.8509  | 10_100000122_T_A | 98240365                  |
+-------------+------------+--------------------+---------------+--------------+---------+----------------+---------+------------------+---------------------------+
| rs6602381   | 10         | 10000018           | G             | A            | -0.0088 | 0.0109         | 0.4206  | 10_10000018_A_G  | 9958055                   |
+-------------+------------+--------------------+---------------+--------------+---------+----------------+---------+------------------+---------------------------+
| NA          | 10         | 10000030           | C             | A            | 0.0991  | 0.2386         | 0.6778  | 10_10000030_A_C  | 9958067                   |
+-------------+------------+--------------------+---------------+--------------+---------+----------------+---------+------------------+---------------------------+
...
  2. Preview all fields in the metadata YAML file, GCST90132222_buildGRCh37.tsv.gz-meta.yaml, located in the directory where you run the command (the tool automatically looks for filename + "-meta.yaml"):
gwas-ssf read GCST90132222_buildGRCh37.tsv.gz --get-all-metadata
  3. Extract the genome_assembly and is_harmonised fields from a metadata file that is located in a different directory or has a different name (path_to_yaml_file):
gwas-ssf read GCST90132222_buildGRCh37.tsv.gz --meta-in path_to_yaml_file --get-metadata genome_assembly -m is_harmonised

output:

#-------- SUMSTATS METADATA --------#
genome_assembly: GRCh37
is_harmonised: false
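
For readers who want to see roughly what these read modes do, the examples above can be approximated in plain Python. This is a minimal sketch under stated assumptions, not the gwas-ssf implementation: `preview_rows`, `preview_file` and `select_fields` are hypothetical names, the data file is assumed to be tab-separated text (optionally gzip-compressed), and the metadata is assumed to be already loaded into a dictionary by a YAML parser.

```python
import gzip
import itertools
from typing import Dict, Iterable, List


def preview_rows(lines: Iterable[str], n_rows: int = 5) -> List[List[str]]:
    """Return the header plus the first n_rows data rows of a TSV stream."""
    head = itertools.islice(lines, n_rows + 1)  # +1 keeps the header line
    return [line.rstrip("\n").split("\t") for line in head]


def preview_file(path: str, n_rows: int = 5) -> List[List[str]]:
    """Open a plain or gzip-compressed TSV file and preview it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return preview_rows(handle, n_rows)


def select_fields(metadata: Dict[str, object],
                  fields: Iterable[str]) -> Dict[str, object]:
    """Keep only the requested metadata fields, in the order requested."""
    return {name: metadata[name] for name in fields if name in metadata}


# Two lines in the shape of the preview shown above.
sample = [
    "variant_id\tchromosome\tbase_pair_location\n",
    "rs147324274\t10\t100000012\n",
]
print(preview_rows(sample))

# Metadata as a parsed dictionary; "extra_field" is a made-up key
# added only to show that unrequested fields are dropped.
metadata = {"genome_assembly": "GRCh37", "is_harmonised": False,
            "extra_field": "ignored"}
print(select_fields(metadata, ["genome_assembly", "is_harmonised"]))
```

Reading only the first lines with `itertools.islice` avoids decompressing the whole file, which matters because summary statistics files can be large.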

Copyright © EMBL-EBI 2024 | EMBL-EBI is an Outstation of the European Molecular Biology Laboratory | Terms of use | Data Preservation Statement