Update README.md

KlugerLab · Apr 4, 2024 · 0e1d759 · 0e1d759
1 parent 0ce26db
commit 0e1d759
Showing 1 changed file with 4 additions and 81 deletions.
diff --git a/README.md b/README.md
@@ -17,89 +17,12 @@ install.packages("devtools")
 devtools::install_github("KlugerLab/GeneTrajectory")
 ```
 ## Example tutorial
+Please check GeneTrajectory [tutorial](https://klugerlab.github.io/GeneTrajectory/articles/GeneTrajectory.html).
 
-The standard preprocessing can be done by employing the [Seurat](https://satijalab.org/seurat/articles/pbmc3k_tutorial.html) R package which includes: library normalization; finding variable features; scaling; generating PCA embedding (and UMAP embedding for visualization).
-
-```r
-#Load required packages
-require(GeneTrajectory)
-require(plot3D)
-require(ggplot2)
-require(viridis)
-require(scales)
-require(Seurat)
-require(SeuratWrappers)
-
-# Here, we load a preprocessed Seurat object for gene trajectory inference.
-data_S <- data_S_WLS_combined_E14.5
-DimPlot(data_S, reduction = "umap", label = T, label.size = 5, group.by = "cell_type") & NoAxes()
-DimPlot(data_S, group.by = "orig.ident") & NoAxes()
-```
-
-Next, we construct the cell-cell kNN graph and calculate cell-cell graph distances.
-```r
-data_S <- GeneTrajectory::RunDM(data_S)
-cell.graph.dist <- GetGraphDistance(data_S, K = 10, dims = 1:10)
-```
-
-We then narrow down the gene list for gene-gene distance computation by focusing on the top 2000 variable genes expressed by 1% - 50% of cells. 
-
-```r
-assay <- "RNA"
-DefaultAssay(data_S) <- assay
-data_S <- FindVariableFeatures(data_S, nfeatures = 2000)
-all_genes <- data_S@assays[[assay]]@var.features
-expr_percent <- apply(as.matrix(data_S[[assay]]@data[all_genes, ]) > 0, 1, sum)/ncol(data_S)
-genes <- all_genes[which(expr_percent > 0.01 & expr_percent < 0.5)]
-```
-
-The intermediate outputs are written into a local directory which allows for gene-gene Wasserstein distance computation using [Python OT package](https://pythonot.github.io/).
-```r
-cg_output <- CoarseGrain(data_S, cell.graph.dist, genes, N = 1000, dims = 1:10)
-dir.path <- "./mouse_dermal/" #This should be replaced by your local directory path.
-dir.create(dir.path, recursive=T)
-write.table(cg_output[["features"]], paste0(dir.path, "gene_names.csv"), row.names = F, col.names = F, sep = ",")
-write.table(cg_output[["graph.dist"]], paste0(dir.path, "ot_cost.csv"), row.names = F, col.names = F, sep = ",")
-Matrix::writeMM(Matrix(cg_output[["gene.expression"]], sparse = T), paste0(dir.path, "gene_expression.mtx"))
-
-#Please make sure to install the latest version of POT module (python), using the following:
-#pip install -U https://github.com/PythonOT/POT/archive/master.zip
-system(sprintf("nohup /data/anaconda3/bin/python ./GeneTrajectory/python/gene_distance_cal_parallel_Iter50000.py %s &", dir.path)) #Python paths should be adjusted accordingly.
-#Alternatively, this can also be run in the terminal by the following:
-#nohup /data/anaconda3/bin/python ./GeneTrajectory/python/gene_distance_cal_parallel_Iter50000.py ./mouse_dermal/ &
-```
-
-When the computation is finished, a file named `emd.csv` is generated in the same directory. Gene trajectory identification and visualization can be done using the following code.
-
-```r
-gene.dist.mat <- LoadGeneDistMat(dir.path, file_name = "emd.csv")
-gene_embedding <- GetGeneEmbedding(gene.dist.mat, K = 5)$diffu.emb
-gene_trajectory <- ExtractGeneTrajectory(gene_embedding, gene.dist.mat, N = 3, t.list = c(9,16,5), K = 5)
-
-#Visualizing gene trajectories using the leading three non-trivial eigenvectors.
-par(mar = c(1.5,1.5,1.5,1.5))
-scatter3D(gene_embedding[,1],
-          gene_embedding[,2],
-          gene_embedding[,3],
-          bty = "b2", colvar = as.integer(as.factor(gene_trajectory$selected))-1,
-          main = "trajectory", pch = 19, cex = 1, theta = 45, phi = 0,
-          col = ramp.col(c(hue_pal()(3))))
-```
-
-To examine how each given gene trajectory is reflected over the cell graph, we can track how these genes are expressed across different regions in the cell embedding. Here, we would recommend users to apply [ALRA](https://github.com/KlugerLab/ALRA/blob/master/README.md) imputation to smooth the expression values for generating gene bin plots.
-```r
-N.bin = 7
-data_S <- RunALRA(data_S)
-data_S <- AddGeneBinScore(data_S, gene_trajectory, N.bin = N.bin, trajectories = 1:3, assay = "alra")
-FeaturePlot(data_S, pt.size = 0.05, features = paste0("Trajectory",1,"_genes", 1:N.bin), ncol = N.bin, order = T) &
-  scale_color_gradientn(colors = rev(brewer_pal(palette = "RdYlBu")(10))) & NoLegend() & NoAxes()
-FeaturePlot(data_S, pt.size = 0.05, features = paste0("Trajectory",2,"_genes", 1:N.bin), ncol = N.bin, order = T) &
-  scale_color_gradientn(colors = rev(brewer_pal(palette = "RdYlBu")(10))) & NoLegend() & NoAxes()
-FeaturePlot(data_S, pt.size = 0.05, features = paste0("Trajectory",3,"_genes", 1:N.bin), ncol = N.bin, order = T) &
-  scale_color_gradientn(colors = rev(brewer_pal(palette = "RdYlBu")(10))) & NoLegend() & NoAxes()
-```
-
+## References
+References of GeneTrajectory functions can be found [here]([https://klugerlab.github.io/DAseq/reference/index.html](https://klugerlab.github.io/GeneTrajectory/reference/index.html)).
 
+Data used in the tutorial can be downloaded from [Figshare](https://figshare.com/articles/dataset/Processed_Seurat_objects_for_GeneTrajectory_inference_Gene_Trajectory_Inference_for_Single-cell_Data_by_Optimal_Transport_Metrics_/25243225).