Error in step #3 of projecting PC scores on new data #496

ozandikilitas · 2024-04-30T22:46:43Z

Hello,
I receive the following error upon trying to project a new dataset onto 1KG reference panel.

> obj.bed.new
A 'bed' object with 12096 samples and 128122270 variants.
> test_pca <- bed_projectPCA(
+   obj.bed,
+   obj.bed.new,
+   k = 20,
+   ind.row.new = rows_along(obj.bed.new),
+   ind.row.ref = ind.row,
+   ind.col.ref = ind.col,
+   strand_flip = TRUE,
+   join_by_pos = TRUE,
+   match.min.prop = 0.5,
+   verbose = TRUE,
+   ncores = nb_cores()
+ )

[Step 1/3] Matching variants of reference with target data..
1,233,111 variants to be matched.
6 ambiguous SNPs have been removed.
1,228,268 variants have been matched; 0 were flipped and 317,288 were reversed.

[Step 2/3] Computing (auto) SVD of reference..

Phase of clumping (on MAC) at r^2 > 0.2.. keep 302930 variants.
Discarding 0 variant with MAC < 10.

Iteration 1:
Computing SVD..
0 outlier variant detected..

Converged!

[Step 3/3] Projecting PC scores on new data..
Error in { : task 1 failed - "'...' used in an incorrect context"

Have you encountered this issue before (or the specific error I am receiving at the end)? Thank you for your help!

The text was updated successfully, but these errors were encountered:

ozandikilitas · 2024-05-01T14:36:13Z

Here's the line by line debugging, problem seems to stem from big_parallelize(). What would you like me to provide next?

 aric_test_pca <- bed_projectPCA(
+   obj.bed,
+   obj.bed.aric,
+   k = 20,
+   ind.row.new = rows_along(obj.bed.aric),
+ #  ind.row.ref = ind.row,
+ #  ind.col.ref = ind.col,
+   strand_flip = TRUE,
+   join_by_pos = TRUE,
+ #  match.min.prop = 0.5,
+   verbose = TRUE,
+   ncores = nb_cores()
+ ) 
debugging in: bed_projectPCA(obj.bed, obj.bed.aric, k = 20, ind.row.new = rows_along(obj.bed.aric), 
    strand_flip = TRUE, join_by_pos = TRUE, verbose = TRUE, ncores = nb_cores())
debug: {
    printf2 <- function(...) if (verbose) 
        printf(...)
    printf2("\n[Step 1/3] Matching variants of reference with target data..\n")
    map.ref <- setNames(obj.bed.ref$map[-3], c("chr", "rsid", 
        "pos", "a1", "a0"))
    map.new <- setNames(obj.bed.new$map[-3], c("chr", "rsid", 
        "pos", "a1", "a0"))
    if (join_by_pos && (build.new != build.ref)) {
        if (is.null(liftOver)) {
            stop2("Please provide path to liftOver executable.")
        }
        else if (substr(liftOver, 1, 1) != ".") {
            liftOver <- file.path(".", liftOver)
        }
        map.new <- snp_modifyBuild(map.new, liftOver = liftOver, 
            from = build.new, to = build.ref)
    }
    info_snp <- snp_match(cbind(map.ref, beta = 1), map.new, 
        strand_flip = strand_flip, join_by_pos = join_by_pos, 
        match.min.prop = match.min.prop)
    printf2("\n[Step 2/3] Computing (auto) SVD of reference..\n")
    obj.svd <- bed_autoSVD(obj.bed.ref, ind.row = ind.row.ref, 
        ind.col = intersect(ind.col.ref, info_snp$`_NUM_ID_.ss`), 
        k = k, ncores = ncores, verbose = verbose, ...)
    printf2("\n[Step 3/3] Projecting PC scores on new data..\n")
    keep <- match(attr(obj.svd, "subset"), info_snp$`_NUM_ID_.ss`)
    X_norm <- FBM(length(ind.row.new), 1, init = 0)
    XV <- FBM(length(ind.row.new), k, init = 0)
    big_parallelize(obj.bed.new$light, p.FUN = part_prod, ind = seq_along(keep), 
        ncores = ncores, ind.row = ind.row.new, ind.col = info_snp$`_NUM_ID_`[keep], 
        center = (obj.svd$center - 1) * info_snp$beta[keep] + 
            1, scale = obj.svd$scale * info_snp$beta[keep], V = obj.svd$v, 
        XV = XV, X_norm = X_norm)
    list(obj.svd.ref = obj.svd, simple_proj = XV[, , drop = FALSE], 
        OADP_proj = OADP_proj(XV, X_norm, obj.svd$d, ncores = ncores))
}
Browse[2]> n
debug: printf2 <- function(...) if (verbose) printf(...)
Browse[2]> n
debug: printf2("\n[Step 1/3] Matching variants of reference with target data..\n")
Browse[2]> n

[Step 1/3] Matching variants of reference with target data..
debug: map.ref <- setNames(obj.bed.ref$map[-3], c("chr", "rsid", "pos", 
    "a1", "a0"))
Browse[2]> n
debug: map.new <- setNames(obj.bed.new$map[-3], c("chr", "rsid", "pos", 
    "a1", "a0"))
Browse[2]> n
debug: if (join_by_pos && (build.new != build.ref)) {
    if (is.null(liftOver)) {
        stop2("Please provide path to liftOver executable.")
    }
    else if (substr(liftOver, 1, 1) != ".") {
        liftOver <- file.path(".", liftOver)
    }
    map.new <- snp_modifyBuild(map.new, liftOver = liftOver, 
        from = build.new, to = build.ref)
}
Browse[2]> n
debug: info_snp <- snp_match(cbind(map.ref, beta = 1), map.new, strand_flip = strand_flip, 
    join_by_pos = join_by_pos, match.min.prop = match.min.prop)
Browse[2]> n
1,233,111 variants to be matched.
6 ambiguous SNPs have been removed.
1,228,268 variants have been matched; 0 were flipped and 317,288 were reversed.
debug: printf2("\n[Step 2/3] Computing (auto) SVD of reference..\n")
Browse[2]> n

[Step 2/3] Computing (auto) SVD of reference..
debug: obj.svd <- bed_autoSVD(obj.bed.ref, ind.row = ind.row.ref, ind.col = intersect(ind.col.ref, 
    info_snp$`_NUM_ID_.ss`), k = k, ncores = ncores, verbose = verbose, 
    ...)
Browse[2]> n

Phase of clumping (on MAC) at r^2 > 0.2.. keep 304498 variants.
Discarding 0 variant with MAC < 10.

Iteration 1:
Computing SVD..
364 outlier variants detected..
5 long-range LD regions detected..

Iteration 2:
Computing SVD..
0 outlier variant detected..

Converged!
debug: printf2("\n[Step 3/3] Projecting PC scores on new data..\n")
Browse[2]> n

[Step 3/3] Projecting PC scores on new data..
debug: keep <- match(attr(obj.svd, "subset"), info_snp$`_NUM_ID_.ss`)
Browse[2]> n
debug: X_norm <- FBM(length(ind.row.new), 1, init = 0)
Browse[2]> n
debug: XV <- FBM(length(ind.row.new), k, init = 0)
Browse[2]> n
debug: big_parallelize(obj.bed.new$light, p.FUN = part_prod, ind = seq_along(keep), 
    ncores = ncores, ind.row = ind.row.new, ind.col = info_snp$`_NUM_ID_`[keep], 
    center = (obj.svd$center - 1) * info_snp$beta[keep] + 1, 
    scale = obj.svd$scale * info_snp$beta[keep], V = obj.svd$v, 
    XV = XV, X_norm = X_norm)
Browse[2]> n
Error in { : task 1 failed - "'...' used in an incorrect context"
> traceback()
6: stop(simpleError(msg, call = expr))
5: e$fun(obj, substitute(ex), parent.frame(), e$data)
4: foreach(ic = rows_along(intervals)) %dopar% {
       ind.part <- ind[seq(intervals[ic, "lower"], intervals[ic, 
           "upper"])]
       FUN(ind = ind.part, ...)
   }
3: bigparallelr::split_parapply(p.FUN, ind, X, ..., .combine = p.combine, 
       ncores = ncores, opts_cluster = list(type = getOption("bigstatsr.cluster.type")))
2: big_parallelize(obj.bed.new$light, p.FUN = part_prod, ind = seq_along(keep), 
       ncores = ncores, ind.row = ind.row.new, ind.col = info_snp$`_NUM_ID_`[keep], 
       center = (obj.svd$center - 1) * info_snp$beta[keep] + 1, 
       scale = obj.svd$scale * info_snp$beta[keep], V = obj.svd$v, 
       XV = XV, X_norm = X_norm)
1: bed_projectPCA(obj.bed, obj.bed.aric, k = 20, ind.row.new = rows_along(obj.bed.aric), 
       strand_flip = TRUE, join_by_pos = TRUE, verbose = TRUE, ncores = nb_cores())
> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.8 (Ootpa)

Matrix products: default
BLAS:   /usr/lib64/libblas.so.3.8.0
LAPACK: /usr/lib64/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C           
 [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C       
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bigsnpr_1.12.2    bigstatsr_1.5.12  patchwork_1.2.0   ggplot2_3.5.0    
 [5] flashpcaR_2.1     Hmisc_5.1-2       tidyr_1.3.1       tibble_3.2.1     
 [9] data.table_1.15.4 dplyr_1.1.4      

loaded via a namespace (and not attached):
 [1] runonce_0.2.3      foreach_1.5.2      carData_3.0-5      Formula_1.2-5     
 [5] doRNG_1.8.6        robustbase_0.99-2  pillar_1.9.0       backports_1.4.1   
 [9] lattice_0.20-45    glue_1.7.0         digest_0.6.35      ggsignif_0.6.4    
[13] checkmate_2.3.1    colorspace_2.1-1   cowplot_1.1.3      htmltools_0.5.8.1 
[17] Matrix_1.6-5       pkgconfig_2.0.3    broom_1.0.5        bigparallelr_0.3.2
[21] purrr_1.0.2        rmio_0.4.0         scales_1.3.0       RSpectra_0.16-1   
[25] ff_4.0.12          htmlTable_2.4.2    generics_0.1.3     car_3.1-2         
[29] ggpubr_0.6.0       withr_3.0.0        nnet_7.3-18        cli_3.6.2         
[33] magrittr_2.0.3     evaluate_0.23      ps_1.7.6           bigassertr_0.1.6  
[37] fansi_1.0.6        parallelly_1.37.1  doParallel_1.0.17  rstatix_0.7.2     
[41] foreign_0.8-86     tools_4.2.2        bigutilsr_0.3.4    lifecycle_1.0.4   
[45] stringr_1.5.1      munsell_0.5.1      bigsparser_0.6.1   cluster_2.1.4     
[49] rngtools_1.5.2     compiler_4.2.2     rlang_1.1.3        grid_4.2.2        
[53] iterators_1.0.14   rstudioapi_0.16.0  htmlwidgets_1.6.4  base64enc_0.1-3   
[57] rmarkdown_2.26     gtable_0.3.4       codetools_0.2-18   flock_0.7         
[61] abind_1.4-5        R6_2.5.1           gridExtra_2.3      knitr_1.46        
[65] bit_4.0.5          fastmap_1.1.1      utf8_1.2.4         bigreadr_0.2.5    
[69] stringi_1.8.3      parallel_4.2.2     Rcpp_1.0.12        vctrs_0.6.5       
[73] rpart_4.1.19       DEoptimR_1.1-3     tidyselect_1.2.1   xfun_0.43

privefl · 2024-05-02T06:42:23Z

Thanks for reporting.

Does this work for you?

library(bigsnpr)

bedfile <- download_1000G("tmp-data")
obj.bed.aric <- obj.bed <- bed(bedfile)

bed_projectPCA(obj.bed, obj.bed.aric, k = 20, ind.row.new = rows_along(obj.bed.aric), 
               strand_flip = TRUE, join_by_pos = TRUE, verbose = TRUE, ncores = nb_cores())

ozandikilitas · 2024-05-02T19:08:58Z

It does! With the following object as a result:

str(test)

List of 3
 $ obj.svd.ref:List of 7
  ..$ d     : num [1:20] 8238 5427 2966 2589 1288 ...
  ..$ u     : num [1:2490, 1:20] -0.013 -0.0129 -0.013 -0.0128 -0.0129 ...
  ..$ v     : num [1:378519, 1:20] 2.57e-03 -2.84e-05 2.64e-03 -1.46e-03 -1.19e-03 ...
  ..$ niter : num 7
  ..$ nops  : num 234
  ..$ center: num [1:378519] 0.0691 0.0538 0.0703 0.3827 0.1522 ...
  ..$ scale : num [1:378519] 0.258 0.229 0.26 0.556 0.375 ...
  ..- attr(*, "class")= chr "big_SVD"
  ..- attr(*, "subset")= int [1:378519] 1 3 8 9 10 12 14 16 18 19 ...
  ..- attr(*, "lrldr")='data.frame':	5 obs. of  3 variables:
  .. ..$ Chr  : num [1:5] 6 6 6 15 17
  .. ..$ Start: num [1:5] 30468791 33260215 33684313 25677831 1863804
  .. ..$ Stop : num [1:5] 32800427 33512731 36288421 25823561 2359952
 $ simple_proj: num [1:2490, 1:20] -107 -106 -107 -105 -106 ...
 $ OADP_proj  : num [1:2490, 1:20] -107 -107 -108 -106 -107 ...

Any suggestions on next steps?

privefl · 2024-05-03T06:10:54Z

What do you get for class(obj.bed) and class(obj.bed.aric) with your data?

ozandikilitas · 2024-05-03T13:58:06Z

Here it is
> class(obj.bed) [1] "bed" attr(,"package") [1] "bigsnpr"
> class(obj.bed.aric) [1] "bed" attr(,"package") [1] "bigsnpr"

ozandikilitas · 2024-05-03T14:43:23Z

FYI, when I also do the following with my obj.bed.aric (after following QC, removing related, restrict to HapMap3 etc.), it works with no issues. Is the issue something to do with subsetting datasets to the common SNPs between the reference and target?

bed_projectPCA(obj.bed.aric, obj.bed.aric, k = 20, ind.row.new = rows_along(obj.bed.aric), strand_flip = TRUE, join_by_pos = TRUE, verbose = TRUE, ncores = nb_cores())

privefl · 2024-05-06T09:04:53Z

You're using obj.bed.aric twice in the latest code you presented; is a test or just a typo?
Is it working when you do so?

I'm not sure I understood everything you tried to do (QC, etc). Could you try only one of those at once to see which one solves the problem? We could then have a better guess on finding what's going on (I currently have no idea).

ozandikilitas · 2024-05-06T13:32:15Z

No, it was just a test. I still receive the original error if I use 1000G as the reference and ARIC as the target dataset. QC stuff I mentioned was just going through what you have done on 1000G on your PCA vignette, nothing more than that. In any case, I still receive the error when the reference and target datasets are different.

privefl · 2024-05-06T17:04:26Z

But you don't when using only ARIC?
And you don't when using some QC?

ozandikilitas · 2024-05-06T18:24:31Z

Yes, if I use either 1000G OR ARIC for both reference and target dataset within bed_projectPCA() (i.e., bed_projectPCA(obj.bed.aric, obj.bed.aric, ...) OR bed_projectPCA(obj.bed.1KG, obj.bed.1KG, ...), the function executes without any errors. However, if I put 1000G as the reference and ARIC as the target dataset (i.e., bed_projectPCA(obj.bed.1KG, obj.bed.ARIC, ...)), I get the error I posted in my first message. Forget about the QC stuff I mentioned, that has not impact on the error message whatsoever, I just unnecessarily confused you with that.

privefl · 2024-05-06T18:57:23Z

Would you be able to share the bim file (infos on the variants) from the ARIC dataset with me?

ozandikilitas · 2024-05-07T16:13:37Z

After you requested the bim file, I figured out the problem. Reference dataset bim file had variant names in the format chr.pos whereas target dataset (i.e., ARIC) had variant names in the format chr:pos:ref:alt. Even though I had join_by_pos=TRUE in bed_projectPCA, I guess having different variant names created an issue. Once I made the variant names in the exact same format across both datasets, bed_projectPCA runs without any errors. Thank you for all your help! Happy to provide more information if you need it!

privefl · 2024-05-08T14:40:29Z

Variant names should not matter.
For example, this works:

library(bigsnpr)

bedfile <- download_1000G("tmp-data")
obj.bed <- bed(bedfile)

bigsnp <- snp_attach(snp_readBed2(bedfile, tempfile(), ind.col = 1:1000))
bigsnp$map$marker.ID <- 
  with(bigsnp$map, paste(chromosome, physical.pos, allele1, allele2, sep = ":"))
obj.bed.aric <- bed(snp_writeBed(bigsnp, tempfile(fileext = ".bed")))

bed_projectPCA(obj.bed, obj.bed.aric, k = 20, ind.row.new = rows_along(obj.bed.aric), 
               strand_flip = TRUE, join_by_pos = TRUE, verbose = TRUE, ncores = nb_cores())

ozandikilitas · 2024-05-08T15:28:04Z

Interesting, that is the only thing I changed. I will look into it a bit more to see whether I am missing something.

privefl · 2024-05-10T06:33:09Z

If we can understand what was going on, that would be great.

privefl · 2024-08-02T05:57:39Z

Any update on this?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error in step #3 of projecting PC scores on new data #496

Error in step #3 of projecting PC scores on new data #496

ozandikilitas commented Apr 30, 2024 •

edited

Loading

ozandikilitas commented May 1, 2024

privefl commented May 2, 2024

ozandikilitas commented May 2, 2024 •

edited by privefl

Loading

privefl commented May 3, 2024

ozandikilitas commented May 3, 2024

ozandikilitas commented May 3, 2024

privefl commented May 6, 2024

ozandikilitas commented May 6, 2024

privefl commented May 6, 2024 •

edited

Loading

ozandikilitas commented May 6, 2024

privefl commented May 6, 2024

ozandikilitas commented May 7, 2024

privefl commented May 8, 2024

ozandikilitas commented May 8, 2024

privefl commented May 10, 2024

privefl commented Aug 2, 2024

Error in step #3 of projecting PC scores on new data #496

Error in step #3 of projecting PC scores on new data #496

Comments

ozandikilitas commented Apr 30, 2024 • edited Loading

ozandikilitas commented May 1, 2024

privefl commented May 2, 2024

ozandikilitas commented May 2, 2024 • edited by privefl Loading

privefl commented May 3, 2024

ozandikilitas commented May 3, 2024

ozandikilitas commented May 3, 2024

privefl commented May 6, 2024

ozandikilitas commented May 6, 2024

privefl commented May 6, 2024 • edited Loading

ozandikilitas commented May 6, 2024

privefl commented May 6, 2024

ozandikilitas commented May 7, 2024

privefl commented May 8, 2024

ozandikilitas commented May 8, 2024

privefl commented May 10, 2024

privefl commented Aug 2, 2024

ozandikilitas commented Apr 30, 2024 •

edited

Loading

ozandikilitas commented May 2, 2024 •

edited by privefl

Loading

privefl commented May 6, 2024 •

edited

Loading