Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
RihaoQu authored Apr 4, 2024
1 parent 0ce26db commit 0e1d759
Showing 1 changed file with 4 additions and 81 deletions.
85 changes: 4 additions & 81 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,89 +17,12 @@ install.packages("devtools")
devtools::install_github("KlugerLab/GeneTrajectory")
```
## Example tutorial
Please check GeneTrajectory [tutorial](https://klugerlab.github.io/GeneTrajectory/articles/GeneTrajectory.html).

The standard preprocessing can be done by employing the [Seurat](https://satijalab.org/seurat/articles/pbmc3k_tutorial.html) R package which includes: library normalization; finding variable features; scaling; generating PCA embedding (and UMAP embedding for visualization).

```r
#Load required packages
require(GeneTrajectory)
require(plot3D)
require(ggplot2)
require(viridis)
require(scales)
require(Seurat)
require(SeuratWrappers)

# Here, we load a preprocessed Seurat object for gene trajectory inference.
data_S <- data_S_WLS_combined_E14.5
DimPlot(data_S, reduction = "umap", label = T, label.size = 5, group.by = "cell_type") & NoAxes()
DimPlot(data_S, group.by = "orig.ident") & NoAxes()
```

Next, we construct the cell-cell kNN graph and calculate cell-cell graph distances.
```r
data_S <- GeneTrajectory::RunDM(data_S)
cell.graph.dist <- GetGraphDistance(data_S, K = 10, dims = 1:10)
```

We then narrow down the gene list for gene-gene distance computation by focusing on the top 2000 variable genes expressed by 1% - 50% of cells.

```r
assay <- "RNA"
DefaultAssay(data_S) <- assay
data_S <- FindVariableFeatures(data_S, nfeatures = 2000)
all_genes <- data_S@assays[[assay]]@var.features
expr_percent <- apply(as.matrix(data_S[[assay]]@data[all_genes, ]) > 0, 1, sum)/ncol(data_S)
genes <- all_genes[which(expr_percent > 0.01 & expr_percent < 0.5)]
```

The intermediate outputs are written into a local directory which allows for gene-gene Wasserstein distance computation using [Python OT package](https://pythonot.github.io/).
```r
cg_output <- CoarseGrain(data_S, cell.graph.dist, genes, N = 1000, dims = 1:10)
dir.path <- "./mouse_dermal/" #This should be replaced by your local directory path.
dir.create(dir.path, recursive=T)
write.table(cg_output[["features"]], paste0(dir.path, "gene_names.csv"), row.names = F, col.names = F, sep = ",")
write.table(cg_output[["graph.dist"]], paste0(dir.path, "ot_cost.csv"), row.names = F, col.names = F, sep = ",")
Matrix::writeMM(Matrix(cg_output[["gene.expression"]], sparse = T), paste0(dir.path, "gene_expression.mtx"))

#Please make sure to install the latest version of POT module (python), using the following:
#pip install -U https://github.com/PythonOT/POT/archive/master.zip
system(sprintf("nohup /data/anaconda3/bin/python ./GeneTrajectory/python/gene_distance_cal_parallel_Iter50000.py %s &", dir.path)) #Python paths should be adjusted accordingly.
#Alternatively, this can also be run in the terminal by the following:
#nohup /data/anaconda3/bin/python ./GeneTrajectory/python/gene_distance_cal_parallel_Iter50000.py ./mouse_dermal/ &
```

When the computation is finished, a file named `emd.csv` is generated in the same directory. Gene trajectory identification and visualization can be done using the following code.

```r
gene.dist.mat <- LoadGeneDistMat(dir.path, file_name = "emd.csv")
gene_embedding <- GetGeneEmbedding(gene.dist.mat, K = 5)$diffu.emb
gene_trajectory <- ExtractGeneTrajectory(gene_embedding, gene.dist.mat, N = 3, t.list = c(9,16,5), K = 5)

#Visualizing gene trajectories using the leading three non-trivial eigenvectors.
par(mar = c(1.5,1.5,1.5,1.5))
scatter3D(gene_embedding[,1],
gene_embedding[,2],
gene_embedding[,3],
bty = "b2", colvar = as.integer(as.factor(gene_trajectory$selected))-1,
main = "trajectory", pch = 19, cex = 1, theta = 45, phi = 0,
col = ramp.col(c(hue_pal()(3))))
```

To examine how each given gene trajectory is reflected over the cell graph, we can track how these genes are expressed across different regions in the cell embedding. Here, we would recommend users to apply [ALRA](https://github.com/KlugerLab/ALRA/blob/master/README.md) imputation to smooth the expression values for generating gene bin plots.
```r
N.bin = 7
data_S <- RunALRA(data_S)
data_S <- AddGeneBinScore(data_S, gene_trajectory, N.bin = N.bin, trajectories = 1:3, assay = "alra")
FeaturePlot(data_S, pt.size = 0.05, features = paste0("Trajectory",1,"_genes", 1:N.bin), ncol = N.bin, order = T) &
scale_color_gradientn(colors = rev(brewer_pal(palette = "RdYlBu")(10))) & NoLegend() & NoAxes()
FeaturePlot(data_S, pt.size = 0.05, features = paste0("Trajectory",2,"_genes", 1:N.bin), ncol = N.bin, order = T) &
scale_color_gradientn(colors = rev(brewer_pal(palette = "RdYlBu")(10))) & NoLegend() & NoAxes()
FeaturePlot(data_S, pt.size = 0.05, features = paste0("Trajectory",3,"_genes", 1:N.bin), ncol = N.bin, order = T) &
scale_color_gradientn(colors = rev(brewer_pal(palette = "RdYlBu")(10))) & NoLegend() & NoAxes()
```

## References
References of GeneTrajectory functions can be found [here]([https://klugerlab.github.io/DAseq/reference/index.html](https://klugerlab.github.io/GeneTrajectory/reference/index.html)).

Data used in the tutorial can be downloaded from [Figshare](https://figshare.com/articles/dataset/Processed_Seurat_objects_for_GeneTrajectory_inference_Gene_Trajectory_Inference_for_Single-cell_Data_by_Optimal_Transport_Metrics_/25243225).



0 comments on commit 0e1d759

Please sign in to comment.