Old harmonised data missing yaml files #1457

Closed
jiyue1214 opened this issue Oct 9, 2024 · 12 comments
@jiyue1214

There are some old studies in the format:
GCST123456

  • *******.gz
  • harmonised
    - GCST123456.f.tsv.gz
    - GCST123456.h.tsv.gz
The yaml file for the harmonised data is missing.

Solution: if the raw data file name does not start with the GCST ID, keep the files as they are and generate a yaml file for the harmonised data.
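
A minimal sketch of how the affected folders could be detected, assuming the layout described above (a study folder with a `harmonised` subfolder). The `-meta.yaml` suffix and the function name `harmonised_files_missing_yaml` are assumptions made for illustration and are not confirmed in this issue.

```python
from pathlib import Path
from typing import List

# Assumed naming convention for the metadata file: "<data file name>-meta.yaml".
META_SUFFIX = "-meta.yaml"


def harmonised_files_missing_yaml(study_dir: Path) -> List[Path]:
    """Return harmonised files under study_dir/harmonised with no yaml beside them."""
    harmonised_dir = study_dir / "harmonised"
    if not harmonised_dir.is_dir():
        return []
    missing = []
    for data_file in sorted(harmonised_dir.glob("*.h.tsv.gz")):
        # The yaml is expected next to the data file, e.g. X.h.tsv.gz-meta.yaml
        meta_file = data_file.with_name(data_file.name + META_SUFFIX)
        if not meta_file.exists():
            missing.append(data_file)
    return missing
```

Calling it on each study folder gives the list of harmonised files that still need a yaml file.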

@jiyue1214 jiyue1214 self-assigned this Oct 9, 2024
@jiyue1214
Author

jiyue1214 commented Oct 9, 2024

/nfs/production/keane/amp/AMP_harmonization/harmonization/test/no_yaml/check_files.txt lists all the harmonised files whose names follow the pattern PMID_GCSTid_EFOid.h.tsv.gz.
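
For illustration, a file name of this shape could be recognised with a regular expression. This is only a sketch: the exact separator (`-` or `_`) and the shape of the EFO ID are assumptions, and `parse_old_harmonised_name` is a hypothetical helper, not part of any existing tool.

```python
import re
from typing import Dict, Optional

# Assumed shape: <PMID><sep><GCST id><sep><EFO id>.h.tsv.gz, with sep being "-" or "_".
# The EFO id is assumed to look like "EFO_0000001" (letters, underscore, digits).
OLD_HARMONISED_NAME = re.compile(
    r"^(?P<pmid>\d+)[-_](?P<gcst>GCST\d+)[-_](?P<efo>[A-Za-z]+_\d+)\.h\.tsv\.gz$"
)


def parse_old_harmonised_name(file_name: str) -> Optional[Dict[str, str]]:
    """Split an old-style harmonised file name into its parts, or return None."""
    match = OLD_HARMONISED_NAME.match(file_name)
    return match.groupdict() if match else None
```

A name such as `GCST123456.h.tsv.gz` returns `None`, so the same helper separates old-style names from files already named by GCST ID.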

Among them are 543 studies (raw_file_also_in_randome_name.tsv) whose raw data also do not follow the pattern GCSTid.tsv.gz, and whose formatted and harmonised data sit in the same folder. It is fine to keep them as they are and add a yaml file for each of these 543 studies.

@jiyue1214
Author

@earlEBI reported that these studies are missing yaml files: ftsv_empty-yamls-28.10.24.txt.

@jiyue1214
Author

I compared the 543 studies in raw_file_also_in_randome_name.tsv with the 510 studies in Lizzy's report. The two lists overlap except for GCST007800.

Next step: generate a yaml file using the GCST ID for these 544 studies on the public FTP.
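
A comparison like this can be sketched with plain set operations. The input format is an assumption (one study per line, GCST ID in the first tab-separated column), and `read_ids` and `compare_id_lists` are hypothetical helper names.

```python
import re
from typing import Dict, Iterable, Set

GCST_ID = re.compile(r"GCST\d+")


def read_ids(lines: Iterable[str]) -> Set[str]:
    """Collect GCST IDs from the first tab-separated column of each line."""
    ids = set()
    for line in lines:
        first_field = line.split("\t")[0].strip()
        if GCST_ID.fullmatch(first_field):
            ids.add(first_field)
    return ids


def compare_id_lists(first: Set[str], second: Set[str]) -> Dict[str, Set[str]]:
    """Report IDs unique to each list and the combined set to process."""
    return {
        "only_first": first - second,
        "only_second": second - first,
        "union": first | second,
    }
```

The `union` entry is the full set of studies that need a yaml file, and the two `only_*` entries show where the lists disagree.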

@jiyue1214
Author

Karatug is working on a fix for generating all metadata yaml files. It should be ready in a couple of days. Yue can check the files when Karatug finishes the job.

@jiyue1214
Author

Waiting for the yaml files to be generated.

@karatugo karatugo self-assigned this Jan 13, 2025
@karatugo
Member

karatugo commented Jan 13, 2025

@karatugo

  • Fix the data file name logic accordingly
  • Push to prod
  • Mark the affected studies as pending

@karatugo
Member

  • Check FTP staging in a couple of days. The affected studies are listed in /hps/nobackup/parkinso/spot/gwas/scratch/goci-1490/scripts/gcst_ids_f_meta_files_uniq.txt.

@jiyue1214
Author

Yue checked that all yaml files have been generated. 540 of the 544 yaml files have not yet been copied to the FTP folder.

@jiyue1214
Author

Yue will move these yaml files to the FTP folder manually rather than waiting for the automatic sync.
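
A manual copy of this kind could be scripted roughly as follows. This is a sketch under assumptions: both folders are treated as flat directories of `*.yaml` files, and `copy_yaml_files` is a hypothetical helper. Files that cannot be written (for example because the target folder is not writable) are collected instead of aborting the run.

```python
import shutil
from pathlib import Path
from typing import List, Tuple


def copy_yaml_files(staging_dir: Path, ftp_dir: Path) -> Tuple[List[Path], List[Path]]:
    """Copy every *.yaml file from staging_dir to ftp_dir; return (copied, failed)."""
    copied, failed = [], []
    for src in sorted(staging_dir.glob("*.yaml")):
        try:
            # copy2 keeps file metadata such as modification time where possible
            shutil.copy2(src, ftp_dir / src.name)
            copied.append(src)
        except OSError:
            # Includes PermissionError when the target is not writable
            failed.append(src)
    return copied, failed
```

The `failed` list gives the files that still need to be handled by someone with write access.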

@jiyue1214
Author

jiyue1214 commented Jan 28, 2025

264 yaml files cannot be moved because the FTP folder is not editable by group members.

We can close this ticket since all the yaml files are ready in the staging folder. The remaining work will be fully resolved by ticket EBISPOT/gwas-sumstats-tools#49.

@jiyue1214
Author

EBISPOT/gwas-sumstats-tools#49 is done and the files have been moved manually. This ticket can be closed.

@ljwh2 ljwh2 closed this as completed Feb 6, 2025