Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

calculateLD.R: Fixed incorrect SNP map #178

Merged
merged 46 commits into from
Feb 21, 2024

Conversation

deepchocolate
Copy link
Contributor

@deepchocolate deepchocolate commented May 24, 2023

Fixes #177
Fixes #227

Changes proposed in this pull request:

  • Filter genotype map by SNP RSIDs in calculateLD.R
  • Moved testing of LD estimation and analysis to a script of their own: tests/test_LDpred2/scripts/ld.sh
  • Added test of ldpred2.R using LD data filtered by --extract (SNPs) and --sample-individuals
  • Added script to create blocks in LD matrixes: splitLD.R
  • Added script to analyze and plot LD: analyzeLD.R: Example plot:

image

Plotting functionality modified from: privefl/bigsnpr#4

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

@deepchocolate deepchocolate changed the title Fixed bug in calculate LD and created test calculateLD.R: Fixed incorrect SNP map May 24, 2023
@espenhgn
Copy link
Contributor

The "Greetings" action failure is outside our control and can be ignored: actions/first-interaction#286

@espenhgn espenhgn marked this pull request as ready for review February 1, 2024 13:26
@espenhgn espenhgn added the enhancement New feature or request label Feb 2, 2024
Copy link
Contributor

@espenhgn espenhgn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @deepchocolate. Looks good to me, only bumped the version file + a couple minor fixes. The LD unittests failed on my end however, because of wrong file path(s). I've pushed a couple of fixes hardcoding the /ldpred2_ref mount point in calls pointing to these datas.

@deepchocolate
Copy link
Contributor Author

Thanks @deepchocolate. Looks good to me, only bumped the version file + a couple minor fixes. The LD unittests failed on my end however, because of wrong file path(s). I've pushed a couple of fixes hardcoding the /ldpred2_ref mount point in calls pointing to these datas.

Great, thanks for the review! I planned to do a test run on real data prior to merging, but haven't been able to login to TSD since Friday, so leaving it hanging here until that.

@espenhgn
Copy link
Contributor

espenhgn commented Feb 5, 2024

Thanks @deepchocolate. Looks good to me, only bumped the version file + a couple minor fixes. The LD unittests failed on my end however, because of wrong file path(s). I've pushed a couple of fixes hardcoding the /ldpred2_ref mount point in calls pointing to these datas.

Great, thanks for the review! I planned to do a test run on real data prior to merging, but haven't been able to login to TSD since Friday, so leaving it hanging here until that.

Perhaps good to run a test in MoBA or similar. But can confirm, TSD p697 rhel login machines are unreachable. But you can use Windows VM -> Putty -> p697-appnode-norment01 to get access to the resource.

#' @param quantile Range of estimates to keep
getBetasAuto <- function (fitAuto, quantile=0.95, verbose=T) {
range <- sapply(fitAuto, function (auto) diff(range(auto$corr_est)))
# Keep chains that pass the filtering below
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@espenhgn I noted that the problem om missing values start here (l199) and not at the quantile function below. There needs to be a na.rm=T to range as well. This is because if there are missing values in corr_est there will be missing in range and these will inevitably lead to missing values when assigning the keep variable below. Right now all chains with any missingness are thrown away, and I'm unsure if that's optimal

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure either.
Don't think it matters here, but perhaps best not to define a variable range replacing the base::range function within this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that's a good point, will change the name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

na.rm=T in range causes other issues if an entire chain is missing, so will leave for now and address later.

# Extract individuals
if (!is.na(extractIndividuals)) {
indIds <- filterFromFile(obj.bigSNP$fam, extractIndividuals, colFilter='sample.ID')
cat('Extracting ', sum(indIds), ' individuals\n')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line throws an error as indIds is the same object type as obj.bigSNP$fam. sum should be changed to nrow and on line 103 indIds should have been indIds$fam$sample.ID. Will change name of indIds to something better.
This was not detected in the tests and as far as I can see this parameter does not have a test. Will add that.

@espenhgn
Copy link
Contributor

Hi @deepchocolate; @ofrei and I discussed restructuring some of the documentation and figured it's good to merge this soon. Is it ready enough?

@deepchocolate
Copy link
Contributor Author

Hi @deepchocolate; @ofrei and I discussed restructuring some of the documentation and figured it's good to merge this soon. Is it ready enough?

Yes, I can merge it today. There's still issues with calculating your own LD as it causes optimization errors during scoring. But I will leave that for now. I've tested most other things on MoBa data now and that has worked fine.

@deepchocolate deepchocolate merged commit b2f536f into main Feb 21, 2024
6 checks passed
@deepchocolate deepchocolate deleted the deepchocolate/calculate_ld_fix branch February 21, 2024 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants