### System information
### Description of the Issue
The `format` command with `--generate-metadata` crashes for any filename that doesn't contain `GCST`, even if the metadata is otherwise valid, because it attempts to concatenate a string and `None`.
Creating a symlink to the file with a `GCST` name processes the metadata (more or less) as expected, except that it adds the GWAS Catalog IDs.
Calling the `format` command with a `GCST` filename that doesn't exist still processes and writes the metadata file.
Ideally, `gwas_id` and `gwas_catalog_api` shouldn't be forcibly inferred for files that don't require them.
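To illustrate the behaviour I would expect, here is a minimal sketch that only infers the two GWAS Catalog fields when an accession is actually present in the filename. It is not the package's code: `find_accession` and `infer_catalog_fields` are hypothetical helpers, the `GCST\d+` pattern is my assumption about what an accession looks like, and the URL prefix is copied from the `gwas_catalog_api` value in the generated metadata.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: prefix taken from the gwas_catalog_api value in the generated YAML.
GWAS_CAT_API_STUDIES_URL = "https://www.ebi.ac.uk/gwas/rest/api/studies/"

# Assumption: an accession is "GCST" followed by one or more digits.
ACCESSION_PATTERN = re.compile(r"GCST\d+")


def find_accession(filename: Path) -> Optional[str]:
    """Hypothetical helper: return the accession in the file name, or None."""
    match = ACCESSION_PATTERN.search(Path(filename).name)
    return match.group(0) if match else None


def infer_catalog_fields(filename: Path) -> dict:
    """Return the catalog fields only when an accession could be parsed."""
    accession = find_accession(filename)
    if accession is None:
        # No accession: skip both fields instead of concatenating str + None.
        return {}
    return {
        "gwas_id": accession,
        "gwas_catalog_api": GWAS_CAT_API_STUDIES_URL + accession,
    }
```

With something like this, `output.tsv.gz` would simply produce no GWAS Catalog fields, while `GCST1.tsv.gz` would still get both.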
### Error Message
```console
---------- METADATA ----------
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/jovyan/env/lib/python3.12/site-packages/gwas_sumstats_tools/cli.py:188 in ss_format │
│ │
│ 185 │ │ if custom_header_map else {} │
│ 186 │ meta_dict = metadata_dict_from_args(args=extra_args.args) \ │
│ 187 │ │ if metadata_edit_mode else {} │
│ ❱ 188 │ format(filename=filename, │
│ 189 │ │ data_outfile=data_outfile, │
│ 190 │ │ minimal_to_standard=minimal_to_standard, │
│ 191 │ │ generate_metadata=generate_metadata, │
│ │
│ ╭──────────────────────────────── locals ────────────────────────────────╮ │
│ │ custom_header_map = False │ │
│ │ data_outfile = None │ │
│ │ extra_args = <click.core.Context object at 0x7f1629176b70> │ │
│ │ filename = PosixPath('output.tsv.gz') │ │
│ │ generate_metadata = True │ │
│ │ header_map = {} │ │
│ │ meta_dict = {} │ │
│ │ metadata_edit_mode = False │ │
│ │ metadata_from_gwas_cat = False │ │
│ │ metadata_infile = PosixPath('minimal.yaml') │ │
│ │ metadata_outfile = PosixPath('generated.yaml') │ │
│ │ minimal_to_standard = False │ │
│ ╰────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /home/jovyan/env/lib/python3.12/site-packages/gwas_sumstats_tools/format.py:144 in format │
│ │
│ 141 │ # Get metadata │
│ 142 │ if generate_metadata: │
│ 143 │ │ print("[bold]\n---------- METADATA ----------\n[/bold]") │
│ ❱ 144 │ │ metadata = formatter.set_metadata( │
│ 145 │ │ │ from_gwas_cat=metadata_from_gwas_cat, custom_metadata=metadata_dict │
│ 146 │ │ ) │
│ 147 │ │ print(metadata) │
│ │
│ ╭───────────────────────────────────────── locals ─────────────────────────────────────────╮ │
│ │ data_outfile = None │ │
│ │ filename = PosixPath('output.tsv.gz') │ │
│ │ formatter = <gwas_sumstats_tools.format.Formatter object at 0x7f16289d3e60> │ │
│ │ generate_metadata = True │ │
│ │ header_map = {} │ │
│ │ metadata_dict = {} │ │
│ │ metadata_from_gwas_cat = False │ │
│ │ metadata_infile = PosixPath('minimal.yaml') │ │
│ │ metadata_outfile = PosixPath('generated.yaml') │ │
│ │ minimal_to_standard = False │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /home/jovyan/env/lib/python3.12/site-packages/gwas_sumstats_tools/format.py:88 in set_metadata │
│ │
│ 85 │ │ │ metadata object │
│ 86 │ │ """ │
│ 87 │ │ self.meta.from_file() │
│ ❱ 88 │ │ meta_dict = get_file_metadata( │
│ 89 │ │ │ in_file=self.data_infile, │
│ 90 │ │ │ out_file=self.data_outfile, │
│ 91 │ │ │ meta_dict=self.meta.as_dict(), │
│ │
│ ╭───────────────────────────────────── locals ──────────────────────────────────────╮ │
│ │ custom_metadata = {} │ │
│ │ from_gwas_cat = False │ │
│ │ self = <gwas_sumstats_tools.format.Formatter object at 0x7f16289d3e60> │ │
│ ╰───────────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /home/jovyan/env/lib/python3.12/site-packages/gwas_sumstats_tools/interfaces/metadata.py:186 in │
│ get_file_metadata │
│ │
│ 183 │ inferred_meta_dict['genome_assembly'] = GENOME_ASSEMBLY_MAPPINGS.get(parse_genome_as │
│ 184 │ inferred_meta_dict['data_file_md5sum'] = get_md5sum(out_file) if Path(out_file).exis │
│ 185 │ inferred_meta_dict['date_last_modified'] = date.today() │
│ ❱ 186 │ inferred_meta_dict['gwas_catalog_api'] = GWAS_CAT_API_STUDIES_URL + parse_accession_ │
│ 187 │ for field, value in inferred_meta_dict.items(): │
│ 188 │ │ update_dict_if_not_set(meta_dict, field, value) │
│ 189 │ return meta_dict │
│ │
│ ╭────────────────────────────────────── locals ───────────────────────────────────────╮ │
│ │ in_file = PosixPath('output.tsv.gz') │ │
│ │ inferred_meta_dict = { │ │
│ │ │ 'gwas_id': None, │ │
│ │ │ 'data_file_name': 'output.tsv.gz', │ │
│ │ │ 'file_type': 'GWAS-SFF v1.0', │ │
│ │ │ 'genome_assembly': 'unknown', │ │
│ │ │ 'data_file_md5sum': '7e29306421cfb296a5e1099f2e461390', │ │
│ │ │ 'date_last_modified': datetime.date(2024, 11, 6) │ │
│ │ } │ │
│ │ meta_dict = { │ │
│ │ │ 'genotyping_technology': [ │ │
│ │ │ │ 'Genome-wide genotyping array' │ │
│ │ │ ], │ │
│ │ │ 'gwas_id': None, │ │
│ │ │ 'trait_description': None, │ │
│ │ │ 'minor_allele_freq_lower_limit': None, │ │
│ │ │ 'data_file_name': 'output.tsv.gz', │ │
│ │ │ 'file_type': 'GWAS-SSF v1.0', │ │
│ │ │ 'data_file_md5sum': None, │ │
│ │ │ 'is_harmonised': False, │ │
│ │ │ 'is_sorted': False, │ │
│ │ │ 'date_last_modified': datetime.date(2024, 11, 6), │ │
│ │ │ ... +12 │ │
│ │ } │ │
│ │ out_file = PosixPath('output.tsv.gz') │ │
│ ╰─────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: can only concatenate str (not "NoneType") to str
```
### Command used and terminal output
```console
$ gwas-ssf format empty.tsv.gz --meta-in minimal.yaml --meta-out generated.yaml --generate-metadata
...
# Crashes with the message above
TypeError: can only concatenate str (not "NoneType") to str
# However, simply calling the same command with a symlink to the same file works
$ ln -s empty.tsv GCST1.tsv
$ gwas-ssf format GCST1.tsv.gz --meta-in minimal.yaml --meta-out generated.yaml --generate-metadata
---------- METADATA ----------
adjusted_covariates:
- age
- sex
analysis_software: PLINK 1.9
author_notes: Example
coordinate_system: 1-based
data_file_md5sum: 05eea3e7b985d4f552fcec50c102bed8
data_file_name: output.tsv.gz
date_last_modified: 2024-11-06
file_type: GWAS-SSF v1.0
genome_assembly: GRCh37
genotyping_technology:
- Genome-wide genotyping array
gwas_catalog_api: https://www.ebi.ac.uk/gwas/rest/api/studies/GCST1
gwas_id: GCST1
harmonisation_reference: null
imputation_panel: 1000 Genomes Phase 3 (placeholder)
imputation_software: GENOTYPE
is_harmonised: false
is_sorted: false
minor_allele_freq_lower_limit: null
ontology_mapping: null
samples:
- ancestry_method:
- self-reported
- gentically determined
case_control_study: false
case_count: null
control_count: null
sample_ancestry: null
sample_size: 1000
sex: combined
trait_description: null
Writing metadata --> generated.yaml
# Surprisingly, this also works when the file doesn't actually exist
$ gwas-ssf format GCST999999999999999.tsv --meta-in minimal.yaml --meta-out generated.yaml --generate-metadata
[Errno 2] No such file or directory: 'GCST999999999999999.tsv'
---------- METADATA ----------
adjusted_covariates:
- age
- sex
analysis_software: PLINK 1.9
author_notes: Example
coordinate_system: 1-based
data_file_md5sum: null
data_file_name: output.tsv.gz
date_last_modified: 2024-11-06
file_type: GWAS-SSF v1.0
genome_assembly: GRCh37
genotyping_technology:
- Genome-wide genotyping array
gwas_catalog_api: https://www.ebi.ac.uk/gwas/rest/api/studies/GCST999999999999999
gwas_id: GCST999999999999999
harmonisation_reference: null
imputation_panel: 1000 Genomes Phase 3 (placeholder)
imputation_software: GENOTYPE
is_harmonised: false
is_sorted: false
minor_allele_freq_lower_limit: null
ontology_mapping: null
samples:
- ancestry_method:
- self-reported
- gentically determined
case_control_study: false
case_count: null
control_count: null
sample_ancestry: null
sample_size: 1000
sex: combined
trait_description: null
Writing metadata --> generated.yaml
$
```
### First 10 Rows of the Input File
```console
empty.tsv:
chromosome base_pair_location effect_allele other_allele beta standard_error p_value variant_id ref_allele
minimal.yaml:
adjusted_covariates:
- age
- sex
analysis_software: PLINK 1.9
author_notes: Example
coordinate_system: 1-based
data_file_name: output.tsv.gz
date_last_modified: 2024-11-06
file_type: GWAS-SSF v1.0
genome_assembly: GRCh37
genotyping_technology:
- Genome-wide genotyping array
imputation_panel: 1000 Genomes Phase 3 (placeholder)
imputation_software: GENOTYPE
is_harmonised: false
is_sorted: false
samples:
- ancestry_method:
- self-reported
- gentically determined
case_control_study: false
sample_size: 1000
sex: combined
```
### Relevant files
_No response_