Improve pca #267

Merged: 15 commits into dev from improve_pca on May 23, 2024


smlmbrt (Member) commented Mar 25, 2024

This PR will use the new Python-based variant intersection tool to apply MAF and missingness filters, which define the variants in the target dataset that are PCA-eligible: https://github.com/PGScatalog/pygscatalog/blob/match_intersect/pgscatalog.match/src/pgscatalog/match/cli/intersect_cli.py
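To make the filtering idea concrete, here is a minimal sketch of how a MAF and missingness check could decide PCA eligibility. It is not taken from the linked `intersect_cli.py`: the `VariantStats` class, the function names, and both thresholds (`MIN_MAF`, `MAX_MISSINGNESS`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; the real defaults may differ.
MIN_MAF = 0.05
MAX_MISSINGNESS = 0.1


@dataclass(frozen=True)
class VariantStats:
    """Per-variant summary statistics from the target dataset (assumed layout)."""
    variant_id: str
    alt_freq: float     # ALT allele frequency
    missingness: float  # fraction of samples with a missing genotype


def minor_allele_frequency(alt_freq: float) -> float:
    """The minor allele is whichever of REF/ALT is rarer."""
    return min(alt_freq, 1.0 - alt_freq)


def is_pca_eligible(v: VariantStats,
                    min_maf: float = MIN_MAF,
                    max_missingness: float = MAX_MISSINGNESS) -> bool:
    """A variant passes if it is common enough and rarely missing."""
    return (minor_allele_frequency(v.alt_freq) >= min_maf
            and v.missingness <= max_missingness)


variants = [
    VariantStats("rs1", alt_freq=0.30, missingness=0.01),  # passes both filters
    VariantStats("rs2", alt_freq=0.99, missingness=0.00),  # MAF too low
    VariantStats("rs3", alt_freq=0.40, missingness=0.25),  # too many missing calls
]
eligible = [v.variant_id for v in variants if is_pca_eligible(v)]
```

Here only `rs1` ends up in `eligible`. Converting the ALT frequency to a minor allele frequency first means variants where ALT is the common allele are treated the same way as variants where REF is.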

smlmbrt linked an issue Mar 25, 2024 that may be closed by this pull request (3 tasks)
nebfield marked this pull request as ready for review May 23, 2024 09:02
nebfield merged commit fe5dcb1 into dev May 23, 2024
32 of 58 checks passed
nebfield deleted the improve_pca branch May 23, 2024 09:08
nebfield added a commit that referenced this pull request May 24, 2024
* Check for _ in sampleset names

* fix samplesheet path to point to VCF

* drop vcf suffix

* update tests with removed vcf suffix

* include inputs when relabelling (geno and sample files are unchanged)

* add more tests for results structure

* Expose documentation about switching versions.

* add cloud / JSON samplesheet docs

* add multiple chromosomes example

* add links to JSON samplesheet

* explicitly set default results to $PWD/results

This change affects people running the workflow directly from
GitHub, e.g.

$ nextflow run pgscatalog/pgsc_calc ...

If --outdir isn't set, the results folder can end up in $NXF_HOME,
which is a hidden folder in the home directory by default. Not a
helpful place for results to be!

This doesn't affect people running directly from a cloned repo.

* Fix typo in output.rst

* Add in documentation about popsimilarity file.

* migrate to pygscatalog utilities (#296)

* add correlation test

* add correlation action

* fix download URL

* use scoring files from correlation archive

* get test profile working with pygscatalog

* integration updates

* fix correlation scorefile wildcard

* fix tests

* update plink2

* gzip afreq in plink2_vcf

* update custom scoring files for liftover

* fix match module test

* use local files in test suite

* fix singularity container definition

* check for environment variables with set -euxo

* logs are massive, don't upload, debug locally

* Improve pca (#267)

* Output allele frequencies along with missingness (for filtering variants)

* Add afreq to output

* Add afreq to intersect_variants.nf

* add afreq to intersect_thinned

* intersect with new pgscatalog-intersect application

* rebase

* Make verbose

* Remove duplication

* Use new output of intersect_variants in filtering

* Use new output of intersect_variants in intersect_variants.nf : keeps memory footprint very low (but higher I/O into tempfiles)

* Fix column index to PCA_ELIGIBLE (13)

* Fix awk statement that doesn't work with odd carriage return?

* Fix awk statement for True/False (not 0/1 as in previous version)

* Add in variant-based filters

---------

Co-authored-by: Benjamin Wingfield <[email protected]>

* remove duplicate container definition (pygscatalog)

* fix duplicate freq flags

* bump workflow version

* don't upload output directory in ancestry tests

* add docker uid runOption to test config

* just use working directory as tmpdir

* drop deprecated docker.userEmulation

* update upload-artifact to v4

* fix join failure caused by wrong meta in afreq output (VCF)

* Superseded by pgscatalog-intersect

* Update pgscatalog_utils conda environment

* use stable container tags

* bump pgscatalog.core version

---------

Co-authored-by: Benjamin Wingfield <[email protected]>
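Several of the commits above fix awk statements that select rows on the PCA_ELIGIBLE column (a wrong column index, a stray carriage return, and True/False values instead of 0/1). The sketch below shows the same selection in Python and is not the workflow's actual code: the function name, the tab delimiter, the presence of a header row, and the demo data are all assumptions. Looking the column up by name avoids hard-coding an index, and stripping `\r\n` before comparing avoids a trailing carriage return making `True` fail to match when the flag is the last field.

```python
from typing import Dict, Iterable, Iterator


def pca_eligible_rows(lines: Iterable[str],
                      column: str = "PCA_ELIGIBLE",
                      sep: str = "\t") -> Iterator[Dict[str, str]]:
    """Yield rows (as dicts) whose eligibility flag is the string 'True'.

    Assumes the first line is a header naming each column.
    """
    rows = iter(lines)
    header_line = next(rows, None)
    if header_line is None:
        return  # empty input: nothing to yield
    # Strip both \n and \r so Windows-style line endings are harmless.
    header = header_line.rstrip("\r\n").split(sep)
    idx = header.index(column)  # raises ValueError if the column is absent
    for line in rows:
        fields = line.rstrip("\r\n").split(sep)
        # The flag is compared as text: 'True' / 'False', not 1 / 0.
        if fields[idx] == "True":
            yield dict(zip(header, fields))


# Hypothetical three-column input with carriage returns at the line ends.
demo = [
    "ID\tMAF\tPCA_ELIGIBLE\r\n",
    "rs1\t0.30\tTrue\r\n",
    "rs2\t0.01\tFalse\r\n",
    "rs3\t0.20\t1\r\n",  # old-style 1 is not treated as True
]
kept = [row["ID"] for row in pca_eligible_rows(demo)]
```

With this input, `kept` contains only `rs1`. Without the `rstrip("\r\n")` call the last field would be `True\r` and no row would match, which is one plausible reading of the carriage return issue mentioned in the commit list.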

2 participants