add param min_cor to snp_ancestry_summary()
privefl committed May 10, 2024
1 parent e6e08ea commit aaec3d7
Showing 6 changed files with 52 additions and 10 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -2,8 +2,8 @@ Encoding: UTF-8
 Package: bigsnpr
 Type: Package
 Title: Analysis of Massive SNP Arrays
-Version: 1.12.7
-Date: 2024-04-17
+Version: 1.12.8
+Date: 2024-05-10
 Authors@R: c(
 person("Florian", "Privé", email = "[email protected]", role = c("aut", "cre")),
 person("Michael", "Blum", role = "ths"),
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,3 +1,7 @@
+## bigsnpr 1.12.8
+
+- In function `snp_ancestry_summary()`, now also report correlations between input frequencies and each reference frequencies as well as predicted frequencies. Also add a new parameter `min_cor` to error when the latter correlation is too small.
+
 ## bigsnpr 1.12.7

 - In function `snp_modifyBuild()`, fix a ftp broken link, and add the possibility to use a local chain file specified by the new parameter `local_chain`.
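
Note on the NEWS entry above: the two reported correlations are attached to the result as attributes named `cor_each` and `cor_pred`. The following is a minimal base-R sketch of how such a vector could be built and read. It does not call bigsnpr; the simulated matrix `X`, the group names `pop1` to `pop4`, and the stand-in object `res` are assumptions made only for illustration.

```r
# Simulated reference frequencies: 500 variants x 4 groups (made-up names).
set.seed(42)
X <- matrix(runif(500 * 4), ncol = 4,
            dimnames = list(NULL, paste0("pop", 1:4)))
prop <- c(0.2, 0.1, 0.6, 0.1)
freq <- drop(X %*% prop)            # input frequencies as an exact mixture

# Stand-in for the returned value: proportions plus the two new attributes.
res <- structure(prop,
                 names    = colnames(X),
                 cor_each = drop(stats::cor(X, freq)),
                 cor_pred = stats::cor(drop(X %*% prop), freq))

cor_each <- attr(res, "cor_each")   # one correlation per reference group
cor_pred <- attr(res, "cor_pred")   # close to 1 for this exact mixture
```

Reading `cor_each` next to the estimated proportions may help spot a reference group whose frequencies barely track the input at all.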
20 changes: 17 additions & 3 deletions R/ancestry-summary.R
@@ -11,15 +11,23 @@
 #' @param projection Matrix of "loadings" for each variant/PC to be used to
 #' project allele frequencies.
 #' @param correction Coefficients to correct for shrinkage when projecting.
+#' @param min_cor Minimum correlation between observed and predicted frequencies.
+#' Default is 0.4. When correlation is lower, an error is returned.
+#' For individual genotypes, this should be larger than 0.6.
+#' For allele frequencies, this should be larger than 0.9.
 #'
-#' @return vector of coefficients representing the ancestry proportions.
+#' @return Vector of coefficients representing the ancestry proportions.
+#' Also (as attributes) `cor_each`, the correlation between input
+#' frequencies and each reference frequencies, and `cor_pred`, the correlation
+#' between input and predicted frequencies.
 #' @export
 #'
 #' @importFrom stats cor
 #'
 #' @example examples/example-ancestry-summary.R
 #'
-snp_ancestry_summary <- function(freq, info_freq_ref, projection, correction) {
+snp_ancestry_summary <- function(freq, info_freq_ref, projection, correction,
+                                 min_cor = 0.4) {

   assert_package("quadprog")
   assert_nona(freq)
@@ -51,10 +59,16 @@ snp_ancestry_summary <- function(freq, info_freq_ref, projection, correction) {
   )

   cor_pred <- drop(cor(drop(X0 %*% res$solution), freq))
+  if (cor_pred < min_cor)
+    stop2("Correlation between frequencies is too low: %.3f; %s",
+          cor_pred, "check matching between variants.")
   if (cor_pred < 0.99)
     warning2("The solution does not perfectly match the frequencies.")

-  setNames(round(res$solution, 7), colnames(info_freq_ref))
+  structure(round(res$solution, 7),
+            names = colnames(info_freq_ref),
+            cor_each = drop(cor(X0, freq)),
+            cor_pred = cor_pred)
 }

################################################################################
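
Note on the change above: the function now applies a two-level check to `cor_pred`, failing below `min_cor` and warning below 0.99. A rough standalone sketch of that logic follows, assuming base `stop()` and `warning()` in place of the package-internal `stop2()` and `warning2()`. The helper name `check_cor_pred` and the simulated vectors are hypothetical and not part of bigsnpr.

```r
# Hypothetical helper mirroring the two-level check in the diff, using base R
# stop()/warning() instead of the package's stop2()/warning2().
check_cor_pred <- function(pred, freq, min_cor = 0.4) {
  cor_pred <- drop(stats::cor(drop(pred), drop(freq)))
  if (cor_pred < min_cor)
    stop(sprintf("Correlation between frequencies is too low: %.3f; %s",
                 cor_pred, "check matching between variants."), call. = FALSE)
  if (cor_pred < 0.99)
    warning("The solution does not perfectly match the frequencies.",
            call. = FALSE)
  cor_pred
}

set.seed(1)
freq  <- runif(200)
noisy <- freq + rnorm(200, sd = 0.2)   # correlated with freq, but imperfectly

ok <- check_cor_pred(freq, freq)       # identical vectors: passes silently

# Imperfect match: above min_cor, below 0.99, so only a warning is raised.
warn_msg <- tryCatch(check_cor_pred(noisy, freq),
                     warning = function(w) conditionMessage(w))

# Reversed ordering gives a strongly negative correlation, hence an error.
err_msg <- tryCatch(check_cor_pred(rev(sort(freq)), sort(freq)),
                    error = function(e) conditionMessage(e))
```

Following the documented guidance, a caller passing allele frequencies would likely raise the threshold (for example `min_cor = 0.9`), while the default of 0.4 is the more permissive setting.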
6 changes: 5 additions & 1 deletion docs/news/index.html

Some generated files are not rendered by default.

18 changes: 16 additions & 2 deletions man/snp_ancestry_summary.Rd

Some generated files are not rendered by default.

10 changes: 8 additions & 2 deletions tests/testthat/test-5-match.R
@@ -146,13 +146,19 @@ test_that("snp_ancestry_summary() works (with no projection here)", {
   prop <- c(0.2, 0.1, 0.6, 0.1)
   y <- X %*% prop
   res <- snp_ancestry_summary(y, X, Matrix::Diagonal(nrow(X), 1), rep(1, nrow(X)))
-  expect_equal(res, prop)
+  expect_equal(res, prop, check.attributes = FALSE)
+  expect_equal(attr(res, "cor_pred"), 1)

   expect_warning(res2 <- snp_ancestry_summary(
     y, X[, -1], Matrix::Diagonal(nrow(X), 1), rep(1, nrow(X))),
     "The solution does not perfectly match the frequencies.")
   expect_true(all(res2 > prop[-1]))
-  expect_equal(res2, prop[-1], tolerance = 0.1)
+  expect_equal(res2, prop[-1], tolerance = 0.1, check.attributes = FALSE)
+  expect_equal(attr(res2, "cor_pred"), drop(cor(y, X[, -1] %*% res2)))
+
+  expect_error(res3 <- snp_ancestry_summary(
+    y, X[, -3], Matrix::Diagonal(nrow(X), 1), rep(1, nrow(X)), min_cor = 0.6),
+    "Correlation between frequencies is too low:")
 })

################################################################################
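
Note on the test changes above: once the result carries extra attributes, a plain equality check against a bare numeric vector no longer passes, which is presumably why `check.attributes = FALSE` was added. The sketch below shows the effect with base `all.equal()`. It assumes `expect_equal()` forwards this argument to `all.equal()`, as in testthat's second edition; the object `res` is a stand-in, not real function output.

```r
prop <- c(0.2, 0.1, 0.6, 0.1)
res  <- structure(prop, cor_pred = 1)   # stand-in carrying one extra attribute

# Attributes are compared by default, so the two objects are reported unequal.
strict <- isTRUE(all.equal(res, prop))

# Ignoring attributes compares only the numeric values.
values_only <- isTRUE(all.equal(res, prop, check.attributes = FALSE))
```

Here `strict` is `FALSE` and `values_only` is `TRUE`, so existing comparisons against bare vectors need the extra argument, or the attributes stripped first.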
