update documents

qingjian1991 · qingjian1991 · commit 37440e80a24c · 2022-08-26T16:01:16.000+08:00
diff --git a/documents/MPTevol.Rmd b/documents/MPTevol.Rmd
@@ -612,7 +612,9 @@ The samples trees reflecting the overall genetic similarity are often suffered f
 
 ### 6.1 Inferring clonal structures
 
-This step is to infer the clonal structures. The `sciClone` [4](#refer) and `PyClone` [5](#refer) could infer the clonal structures. 
+This step infers the clonal structures. Many published tools can do this, including `sciClone` [4](#refer) and `PyClone` [5](#refer).
+
+### 6.1.1 Suggestions for inferring clonal structures
 
 Two prominent approaches in clonal evolution studies are:
 
@@ -626,7 +628,14 @@ should be performed using copy-number aware tools such as `PyClone`, and copy nu
 corrected VAFs can be obtained by dividing the CCFs estimated by such tools by two.
 
 
-In MPTevol, the format of `variants` is used. 
+### 6.1.2 Preparing the variants structure
+
+MPTevol uses the `variants` format for downstream analysis.
+
+`variants` is a data frame in which each row is a variant. Its columns hold the cellular prevalence of the variant in each sample, plus one column of cluster information. Cellular prevalence indicates how many tumor cells carry the mutation; either the VAF or the CCF can be used. **Clusters must be named with contiguous integers**, starting from 1. Keep the names of the cellular prevalence columns short for better visualization.
+
+We suggest that users build this data frame themselves, because variant clustering results need manual evaluation. `maf2variants` can transform the maf format into `variants` if the cluster information is included in the CCF data (**TO DO**).
+
 
 ```{r}
 # load data
@@ -636,8 +645,13 @@ data("variants.ref", package = "MPTevol")
 head(variants,3)
 ```
 
+In this data frame, the columns from `Chromosome` to `mutid` hold basic information about each variant, and the columns from `BRCA_1` to `UterusM_7` hold the cellular prevalence in each sample. `sciClone` holds the mutation clusters inferred by sciClone, `kmeans` holds the mutation clusters inferred by the k-means method, and `cluster` is the final cluster information used for downstream analysis.
+
+
 ### 6.2 Check clonal prevalence across samples.
 
+Since each cluster represents a clone, missing a cluster or inferring one incorrectly can prevent successful construction of the evolution models. It is therefore extremely important to obtain a good clustering result. `MPTevol` provides a convenient visualization of variant clusters across multiple samples to help evaluate clustering results, particularly when no tree is inferred.
+
 
 ```{r warning=FALSE}
 library(clonevol)