Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for visualisation of LD #4

Open
privefl opened this issue Oct 16, 2017 · 3 comments
Open

Support for visualisation of LD #4

privefl opened this issue Oct 16, 2017 · 3 comments

Comments

@privefl
Copy link
Owner

privefl commented Oct 16, 2017

Example:

# computed with snp_cor() (do not hesitate to be stringent on thr_r2 or alpha)
corr <- readRDS(url("https://www.dropbox.com/s/65u96jf7y32j2mj/spMat.rds?raw=1"))

library(tidyverse)

## Transform sparse representation into (i,j,x) triplets
corrT <- as(corr, "dgTMatrix")
upper <- (corrT@i <= corrT@j)
df <- data.frame(
  i = corrT@i[upper], 
  j = corrT@j[upper],
  r2 = corrT@x[upper]
) %>%
  mutate(y = (j - i) / 2)

ggplot(df) +
  geom_point(aes(i + y, y, color = r2, alpha = r2), size = rel(0.5)) +
  coord_fixed() + 
  scale_color_gradientn(colours = rev(colorRamps::matlab.like2(100))) + 
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank()) +
  labs(x = "Position", y = NULL) +
  scale_alpha(guide = 'none')

@privefl
Copy link
Owner Author

privefl commented Mar 9, 2020

Another example,
visualizing 3 long-range LD regions of chromosome 6 using the 1000G data:

library(bigsnpr)
bedfile <- download_1000G("tmp-data")
fam2 <- bigreadr::fread2(sub_bed(bedfile, ".fam2"))
map <- bigreadr::fread2(sub_bed(bedfile, ".bim"))
rds <- snp_readBed2(bedfile, tempfile(), 
                    ind.row = which(fam2$`Super Population` == "EUR"),
                    ind.col = which(map$V1 == 6))
bigsnp <- snp_attach(rds)
G <- bigsnp$genotypes
POS <- bigsnp$map$physical.pos

corr <- snp_cor(G, infos.pos = POS, size = 1000, ncores = 4)

## Transform sparse representation into (i,j,x) triplets
corrT <- as(corr ** 2, "dgTMatrix")
upper <- which((corrT@i <= corrT@j) & (abs(corrT@x) > 0.1))
upper <- sample(upper, min(50e3, length(upper)))
df <- tibble::tibble(
  i = POS[corrT@i[upper] + 1L], 
  j = POS[corrT@j[upper] + 1L],
  r2 = corrT@x[upper],
  y = (j - i) / 2
)
dim(df)


library(ggplot2)
ggplot(df) +
  geom_point(aes(i + y, y, color = r2, alpha = r2), size = rel(0.5)) +
  # coord_fixed() +
  theme_minimal() +
  scale_x_continuous(expand = c(0.02, 0.02), minor_breaks = 0:17 * 1e7,
                     breaks = unlist(subset(LD.wiki34, Chr == 6, 2:3))) +
  scale_y_continuous(breaks = 0) +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank()) +
  labs(x = "Position", y = NULL) +
  scale_alpha(guide = "none") + scale_color_continuous(guide = "none") +
  facet_wrap(~ cut(i + y, 10), scales = "free_x", ncol = 1) + 
  theme(strip.background = element_blank(), strip.text.x = element_blank())

LD-chr6

@garyzhubc
Copy link

Could you explain a bit how to use this code (the first part)? Do you simply input correlation between two columns of the bigsnpr genotype object to get corr and then plot?

@privefl
Copy link
Owner Author

privefl commented Dec 4, 2023

  • corr is typically gotten from snp_cor() (which gives you a sparse matrix of correlations between variants -> all pairs that are close enough and with a r2 large enough)
  • if you have many variants, make sure to filter df for e.g. r2 > 0.05 before plotting too many points (or directly in snp_cor())

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants