09 Aug 06:47

nebfield

96fbb23

v2.0.0-beta.3 Latest

Changelog

Important fix: correctly split duplicated variant IDs across multiple scoring files

Background

The MATCH_COMBINE step writes new scoring files for input to plink2 --score.
When plink2 encounters the same variant ID on multiple rows of a scoring file, it ignores the duplicates and warns about them.
This only happens when the same variant ID has different effect alleles on different rows.
- A variant ID with a single effect allele and weights in multiple score columns is fine: this is how scores are calculated in parallel.
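The condition above can be checked directly on a scoring file. The sketch below is a hypothetical bash helper (find_conflicting_ids is not part of pgsc_calc). It assumes a tab-separated scoring file with one header row, the variant ID in column 1 and the effect allele in column 2; that column layout is an assumption, so check it against your own files. It prints every variant ID that appears with more than one effect allele, i.e. the rows plink2 would ignore.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of pgsc_calc.
# Assumes: tab-separated, one header row, variant ID in column 1,
# effect allele in column 2. Input may be plain text or gzipped.
find_conflicting_ids() {
  local scorefile="$1"
  # gzip -dcf decompresses .gz input and passes plain text through unchanged
  gzip -dcf "$scorefile" \
    | tail -n +2 \
    | awk -F '\t' '
        {
          pair = $1 SUBSEP $2
          if (!(pair in seen)) {   # count each distinct (ID, effect allele) pair once
            seen[pair] = 1
            alleles[$1]++
          }
        }
        END {
          for (id in alleles)
            if (alleles[id] > 1) print id   # same ID, different effect alleles
        }' \
    | sort
}
```

For example, running `find_conflicting_ids hgdp_22_additive_0.scorefile.gz` in a scoring process directory should print nothing if every scoring file holds at most one effect allele per variant ID. A repeated row with the same ID and the same effect allele is deliberately not reported.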

Example

When using PGS000039, PGS000040, and PGS000041 in parallel, some variants have different effect alleles at the same coordinates, for example:

22:40682469:T:C with effect allele T (PGS000041_hmPOS_GRCh38)
22:40682469:T:C with effect allele C (PGS000039_hmPOS_GRCh38)

Impact

In versions v2.0.0-beta, beta.1, and beta.2, the duplicated variant is written to the same scoring file and ignored by plink2, so it doesn't contribute to the final calculated PGS.

In all v2.0.0-alpha versions and in beta.3, a second scoring file containing the other allele is correctly written (the updated MATCH_COMBINE process automatically creates extra scoring files for additional alleles). We have also updated the software tests to ensure this error doesn't recur in future releases.
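The idea behind the fix can be sketched in a few lines: each repeat of a variant ID is sent to the next scoring file, so no single file contains the same ID twice. This is an illustrative bash sketch, not the MATCH_COMBINE or pygscatalog implementation. The helper name split_by_duplicate_id and the output names (PREFIX_0.scorefile, PREFIX_1.scorefile, ...) are hypothetical, and it assumes the input is tab-separated with one header row, the variant ID in column 1, and one row per (variant ID, effect allele) pair.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the MATCH_COMBINE implementation.
# Writes PREFIX_0.scorefile, PREFIX_1.scorefile, ... so that each output
# file contains every variant ID at most once. Assumes a tab-separated
# input with one header row and the variant ID in column 1.
split_by_duplicate_id() {
  local scorefile="$1" prefix="$2"
  gzip -dcf "$scorefile" | awk -F '\t' -v prefix="$prefix" '
    NR == 1 { header = $0; next }       # keep the header for every output file
    {
      i = seen[$1]++                    # 0 for the first row of an ID, 1 for the second, ...
      out = prefix "_" i ".scorefile"
      if (!(out in opened)) {           # write the header once per output file
        print header > out
        opened[out] = 1
      }
      print > out                       # row written unchanged, all columns kept
    }'
}
```

With the example above, the T row for 22:40682469:T:C lands in PREFIX_0.scorefile and the C row in PREFIX_1.scorefile. Each file can then be scored separately with plink2 --score, and the per-file results still need to be combined afterwards.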

This problem is more likely to happen when larger scores, or more scores, are calculated in parallel: the more scores share a scoring file, the more likely it is that a variant ID appears with different effect alleles and is ignored during the score calculation stage.

While the overall impact on the final score is likely to be small, we encourage users to upgrade to beta.3, especially if they calculate larger scores in parallel.

How do I know if my data are affected?

$ cd work/71/35fa3c977993b71d5a85fb6721e8c3 # cd to a scoring process directory 
$ comm -3 <(sort hgdp_22_additive_0.sscore.vars) <(zcat hgdp_22_additive_0.scorefile.gz | tail -n +2 | cut -f 1 | sort)
	22:40682469:T:C

The output lists one variant (22:40682469:T:C) that is in the scoring file but was not used by plink2, so this run is affected. This check is now included in the scoring module.
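To run the same check across every scoring process in a run rather than one directory at a time, the comparison can be wrapped in a loop. This is a bash sketch under assumptions: check_scoring_dirs is a hypothetical helper, and it assumes each scoring process directory pairs NAME.sscore.vars with NAME.scorefile.gz, as in the example above (hgdp_22_additive_0). It uses comm -13 instead of comm -3 so that only IDs present in the scoring file but missing from .sscore.vars are printed, each group preceded by the .sscore.vars path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: repeat the comm check for every scoring process.
# Assumes NAME.sscore.vars and NAME.scorefile.gz sit in the same directory.
check_scoring_dirs() {
  local workdir="${1:-work}" vars scorefile missing
  find "$workdir" -name '*.sscore.vars' | sort | while read -r vars; do
    scorefile="${vars%.sscore.vars}.scorefile.gz"
    [ -f "$scorefile" ] || continue
    # comm -13 keeps only IDs in the scoring file that plink2 did not use;
    # plain sort (not sort -u) keeps duplicated IDs so they are reported
    missing=$(comm -13 <(sort "$vars") \
      <(gzip -dc "$scorefile" | tail -n +2 | cut -f 1 | sort))
    if [ -n "$missing" ]; then
      printf '%s\n%s\n' "$vars" "$missing"
    fi
  done
}
```

Running `check_scoring_dirs work` from the launch directory prints nothing when no scoring process dropped a variant.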

Other fixes

Fix --keep_ambiguous parameter #346 (@nebfield)
Fix variant matching information being dropped from the log when scores didn't pass the match rate threshold (@nebfield)
Fix fraposa-pgsc handling exclusively numeric IIDs PGScatalog/fraposa_pgsc#18 (@smlmbrt)

Contributors

smlmbrt and nebfield

31 Jul 11:59

nebfield

v2.0.0-beta.2

69c467e

Changelog

Features

Add FID support internally (FID + IID must be unique for all samples) [@nebfield, thanks to @jasamack for initial draft fix]
Add parameters to tune target variant missingness (--pca_geno_miss_target, default maximum 10%) and/or MAF (--pca_maf_target, default no filtering) during intersection with the reference panel. [@smlmbrt]
- The new defaults help prevent incorrect ancestry assignments when running the calculator on small sample sizes (reverting to pre-beta behaviour); these incorrect assignments were previously caused by the MAF filter.
Add --efo_id parameter, deprecating --trait_efo which will be removed in a future release

Misc

Remove default anaconda channels because of license changes #342

Contributors

smlmbrt, nebfield, and jasamack

10 Jul 15:12

nebfield

v2.0.0-beta.1

0f33b4c

Changelog

Bug fixes

Fix samplesheet parsing error warnings by @smlmbrt in #322
Write consistent column sets to variant information files by @nebfield in #330

Full Changelog: v2.0.0-beta...v2.0.0-beta.1

Contributors

smlmbrt and nebfield

19 Jun 18:40

nebfield

v2.0.0-beta

ca334fa

Changelog

Graduating to beta with the release of our preprint 🎉

Improvements

Improve aggregation PGScatalog/pygscatalog#23
Improve matching performance PGScatalog/pygscatalog#22
Improve match error docs #311
Publish dependencies to Bioconda to improve conda profile UX

Bug fixes

Fix for PGScatalog/pygscatalog#21
Closes #301
Specify modules explicitly to fix #312
Fix bim input to pgscatalog-aggregate #319

24 May 11:25

nebfield

v2.0.0-alpha.6

0198033

pgsc_calc v2.0.0-alpha.6 Pre-release

Changelog

2024-05-28 update: We're investigating an unexpected pgscatalog.core.lib.pgsexceptions.MatchRateError in some environments (e.g. UK Biobank on an HPC). This release has been downgraded to a pre-release.

Please note the minimum required nextflow version has been updated to v23.10.0, released in October 2023. Run nextflow self-update to upgrade your nextflow version.

Improvements

Migrate our custom python tools to new pygscatalog packages
- Reference / target intersection now considers allele frequency and variant missingness to determine PCA eligibility
- Downloads from PGS Catalog should be faster (async)
- Packages are now documented
Update plink version to alpha 5.10 final #179
Add docs describing cloud execution
Add correlation test comparing calculated scores against known good scores
When matching variants, matching logs are now written before scorefiles to improve debugging UX
Improvements to PCA quality (ensuring low missingness and suitable MAF for PCA-eligible variants in target samples).
- This could allow us to implement MAF/missingness filters for scoring file variants in the future.

Bug fixes

Fix ancestry adjustment with VCFs #252
Fix support for scoring files that only have one effect type column #280
Fix adjusting PGS with zero variance (skip them) #283
Check for reserved characters in sampleset names

Known bugs

Incorrectly adjusting the AVG in --run_ancestry mode #301
Unexpected pgscatalog.core.lib.pgsexceptions.MatchRateError in some environments (e.g. UK Biobank on an HPC)

19 Mar 16:51

nebfield

v2.0.0-alpha.5

8bdf287

pgsc_calc v2.0.0-alpha.5

Changelog

Improvements

Automatically mount directories inside singularity containers without setting any configuration
Improve permanent caching of ancestry processes with --genotypes_cache parameter
Resync with the nf-core framework
Refactor combine_scorefiles to improve speed and quality control processes

Bug fixes

Fix semantic storeDir definitions causing problems with cloud execution (Google Batch)
Fix missing DENOM values with multiple custom scoring files (score calculation not affected)
Fix liftover failing silently with custom scoring files (thanks Brooke!)

Misc:

Move aggregation step out of report
Improve speed of ANCESTRY_ANALYSIS

05 Dec 13:59

nebfield

v2.0.0-alpha.4

83326a1

pgsc_calc v2.0.0-alpha.4

Changelog

Improvements

Give a more helpful error message when there are no valid matches in match_combine

Bug fixes

Fix retrying downloads when the EBI servers are sleepy on a Monday morning
Fix numeric sample identifiers breaking ancestry analysis
Check chr prefix in samplesheets

05 Oct 11:16

nebfield

v2.0.0-alpha.3

ddb19b3

pgsc_calc v2.0.0-alpha.3

Improvements:

Automatically retry scoring with more RAM on larger datasets
Describe scoring precision in docs
Change handling of VCFs to reduce errors when recoding
Internal changes to improve support for custom reference panels

Bug fixes:

Fix VCF input to ancestry projection subworkflow (thanks frahimov and AWS-crafter for patiently debugging)
Fix scoring options when reading allelic frequencies from a reference panel (thanks raimondsre for reporting the changes from v1.3.2 -> 2.0.0-alpha)
Fix conda profile action

12 Sep 14:44

nebfield

v2.0.0-alpha.2

ba8e03c

pgsc_calc v2.0.0-alpha.2

Changelog

Bump pgscatalog_utils v0.4.0 -> v0.4.1
- Closes #165

11 Aug 14:03

nebfield

v2.0.0-alpha.1

28a0971

pgsc_calc v2.0.0-alpha.1

This patch fixes a bug when running the workflow directly from github with the test profile (i.e. without cloning first). Thanks to @staedlern for reporting the problem.

Releases: PGScatalog/pgsc_calc
