bm2-lab · ChiZhou-SITI · Sep 22, 2018
diff --git a/README.md b/README.md
@@ -2,12 +2,12 @@
 
 * **MASCOT** is the first one-stop applicable pipeline based on topic model to analyze single-cell CRISPR screening data (independently termed **Perturb-Seq**, **CRISP-seq**, or **CROP-seq**), which could help to prioritize the gene perturbation effect in a cellular heterogeneity level.
 * **MASCOT** is an integrated pipeline for model-based analysis of single cell CRISPR knockout screening data. **MASCOT** consists of three steps: **data preprocessing**, **model building** and **perturbation effect prioritizing**: 
-    * **Data preprocessing**: Besides the conventional quality control and data normalization applied in single-cell RNA-seq analysis, **MASCOT** addresses two specific considerations that should be taken into account for such a novel data type: **(1)** Filtering perturbed cells with invalid edit; and **(2)** Filtering perturbation according to a minimal number of cells per perturbation.
+    * **Data preprocessing**: Besides the conventional quality control and data normalization applied in single-cell RNA-seq analysis, **MASCOT** addresses two specific considerations that should be taken into account for such a novel data type: **(1)** Filtering out perturbed cells with invalid edits; and **(2)** Filtering out perturbations that fall below a minimum number of cells per perturbation.
     * **Model building**: **MASCOT** builds an analytical model based on Topic Models to handle single-cell CRISPR screening data. The concept of topic models was initially presented in machine learning community and has been successfully applied to gene expression data analysis. A key feature of topic model is that it allows each perturbed sample to process a proportion of membership in each functional topic rather than to categorize the sample into a discrete cluster. Such a topic profile, which is derived from large-scale cell-to-cell different perturbed samples, allows for a quantitative description of the biologic function of cells under specific gene perturbation conditions. **MASCOT** addresses two specific issues when applying the topic model to this specific data type: **(1)** The distribution of topics between cases and controls is affected by the ratio of their sample numbers, and such a sample imbalance issue is addressed by the bootstrapping strategy when prioritizing the perturbation effect. **(2)** The optimal topic number is automatically selected by MASCOT in a data-driven manner.
     * **Perturbation effect prioritizing**: Based on the model-based perturbation analysis, **MASCOT** can quantitatively estimate and prioritize the individual gene perturbation effect on cell phenotypes from three different perspectives, i.e., prioritizing the gene perturbation effect as an overall perturbation effect, or in a functional topic-specific way and quantifying the relationships between different perturbations. 
-* **Input File Format**. For running **MASCOT**, the input data needed to follow the standard format we defined. For convenience, **MASCOT** accepts two kinds of input data formats: **(1)** The first data format can be referred in the **data_format_example/crop_unstimulated.RData** we provided. It is an example dataset containing "expression_profile", "perturb_information" and "sgRNA_information". You can apply function "Input_preprocess()" to handle this data format; **(2)** The second data format can be referred in the **data_format_example/perturb_GSM2396857/** generated by 10X genomics. The directory **data_format_example/perturb_GSM2396857** contains "barcodes.tsv", "genes.tsv", "matrix.mtx", "cbc_gbc_dict.tsv" and "cbc_gbc_dict_grna.tsv". You can apply function "Input_preprocess_10X()" to handle this data format. 
+* **Input File Format**. To run **MASCOT**, the input data needs to follow the standard format we defined. For convenience, **MASCOT** accepts two input data formats: **(1)** The first format is illustrated by the provided file **data_format_example/crop_unstimulated.RData**, an example dataset containing "expression_profile", "perturb_information" and "sgRNA_information". You can apply the function "Input_preprocess()" to handle this data format; **(2)** The second format is illustrated by the directory **data_format_example/perturb_GSM2396857/**, generated by 10X Genomics, which contains "barcodes.tsv", "genes.tsv", "matrix.mtx", "cbc_gbc_dict.tsv" and "cbc_gbc_dict_grna.tsv". You can apply the function "Input_preprocess_10X()" to handle this data format. 
 * **Attention:** The label of the control sample needs to be "CTRL".
-* For illustration purpose, we took the least dataset **data_format_example/crop_unstimulated.RData** as an example.
+* For illustration purposes, we use the dataset **data_format_example/crop_unstimulated.RData** as an example.
     * Install: You can install the **MASCOT** package from Github using **devtools** packages with R>=3.4.1. For convenience, you can also install the **MASCOT** package from Docker Hub with the link [mascot](https://hub.docker.com/r/bm2lab/mascot/)
     ```r
     library(Biostrings)
@@ -122,9 +122,9 @@
 
     ```r
 
-    # calculate the overall perturbation effect ranking list without "offTarget_Info" calculated.
+    # calculate the overall perturbation effect ranking list without "offTarget_Info".
     rank_overall_result<-Rank_overall(distri_Diff)
-    #rank_overall_result<-Rank_overall(distri_Diff,offTarget_hash=offTarget_Info) (if "offTarget_Info" was calculated. For "offTarget_info", you can see the introduction in the end).
+    #rank_overall_result<-Rank_overall(distri_Diff,offTarget_hash=offTarget_Info) (use this call if "offTarget_Info" was calculated; for details on "offTarget_Info", see the off-target step at the end).
 
     # calculate the topic-specific ranking list.
     rank_topic_specific_result<-Rank_specific(distri_Diff)
@@ -134,7 +134,7 @@
     ```
     ![](figure/perturbation_network.png)
 
-    * If sgRNA sequence of each knockouts were known and you want to consider if they have off-targets, you can perform this step.  This step won't affect the final ranking result, but present the off-target information. In most cases, the sgRNA in such experiment has no off-targets. **If you do not want to consider this factor, then just skip this step**. 
+    * If the sgRNA sequence of each knockout is known and you want to check whether the sgRNAs have off-targets, you can perform this step. This step won't affect the final ranking result; it only reports the off-target information. In most cases, the sgRNAs in such experiments have no off-targets. **If you do not want to consider this factor, just skip this step**. 
     ```r
     #library(CRISPRseek)
     #library("BSgenome.Hsapiens.UCSC.hg38")