Commit 36b0ea2

Update vignette text
1 parent 32b76aa commit 36b0ea2

3 files changed, +21 -17 lines changed
.gitignore

+1
@@ -1,3 +1,4 @@
 .Rproj.user
 .DS_Store
 pkgdown/
+.Rhistory

README.md

+1 -8
@@ -17,15 +17,8 @@ remotes::install_github("corceslab/CHOIR", ref="main", repos = BiocManager::repo

 ## Usage

-Please follow the [vignette](https://www.choirclustering.com/articles/CHOIR.html). Alternately, install the package with `build_vignettes = TRUE`, as follows:
-``` r
-remotes::install_github("corceslab/CHOIR", ref="main", repos = BiocManager::repositories(), upgrade = "never", build_vignettes = TRUE)
-```
+Please follow the [vignette](https://www.choirclustering.com/articles/CHOIR.html).

-And access the vignette by running:
-``` r
-vignette("CHOIR")
-```
 <hr>

 <p align="left"><a href ="https://www.corceslab.com/"><img src="man/figures/CorcesLab_logo.png" alt="" width="300"></a></p>

vignettes/CHOIR.Rmd

+19 -9
@@ -25,7 +25,7 @@ remotes::install_github("corceslab/CHOIR", ref="main", repos = BiocManager::repo

 # Introduction

-This vignette provides a basic example of how to run CHOIR, a clustering algorithm for single-cell sequencing data. CHOIR is applicable to single-cell sequencing data of any modality, including RNA, ATAC, and proteomics. It is also applicable to multi-modal data (see [Advanced Options](https://www.choirclustering.com/articles/CHOIR.html#advanced-options)).
+This vignette provides a basic example of how to run CHOIR, a clustering algorithm for single-cell sequencing data. CHOIR is applicable to single-cell sequencing data of any modality, including RNA, ATAC, and proteomics. It is also applicable to multi-modal data (see [Advanced Options](https://www.choirclustering.com/articles/CHOIR.html#advanced-options)). Detailed parameter definitions are available under the [Functions](https://www.choirclustering.com/reference/index.html) tab.

 CHOIR is based on the premise that if clusters contain biologically different cell types or states, a machine learning classifier that considers features present in cells from each cluster should be able to distinguish the clusters with a higher level of accuracy than machine learning classifiers trained on randomly permuted cluster labels. The use of permutation testing approaches allows CHOIR to introduce statistical significance thresholds into the clustering process.

@@ -79,7 +79,9 @@ The two steps can be run together using the function `CHOIR()` or separately usi

 The `CHOIR()` function will run all of the steps of the CHOIR algorithm in sequence. CHOIR is highly parallelized, so efficiency greatly improves as `n_cores` is increased.

-The default significance level used by CHOIR is $\alpha = 0.05$ with Bonferroni multiple comparison correction. Other correction methods may be less conservative, as CHOIR applies filters that reduce the total number of tests performed (see [Advanced Options](https://www.choirclustering.com/articles/CHOIR.html#advanced-options)).
+The default significance level used by CHOIR is $\alpha = 0.05$ with Bonferroni multiple comparison correction. Other correction methods may be less conservative, as CHOIR applies filters that reduce the total number of tests performed (see [Advanced Options](https://www.choirclustering.com/articles/CHOIR.html#advanced-options)).
+
+We recommend using the default value of $\alpha = 0.05$ with Bonferroni multiple comparison correction. For a more conservative approach, the `alpha` value could be decreased to 0.01 or 0.001.

 ```{r, eval = FALSE}
 seurat_object <- CHOIR(seurat_object,
@@ -109,7 +111,7 @@ After constructing the hierarchical clustering tree, CHOIR iterates through each

 In parallel, CHOIR shuffles the cluster labels and repeats the same process. Both comparisons are repeated using bootstrapped samples (default = 100 iterations), resulting in a permutation test that compares the true prediction accuracy for the clusters to the prediction accuracy for a chance division of the cells into two random groups.

-This permutation test yields a p-value that determines whether these clusters are slated to merge or remain separate. The significance threshold used can be adjusted using the `alpha` parameter.
+This permutation test yields a p-value that determines whether these clusters are slated to merge or remain separate. The significance threshold used can be adjusted using the `alpha` parameter. We recommend using the default value of $\alpha = 0.05$ with Bonferroni multiple comparison correction. For a more conservative approach, the `alpha` value could be decreased to 0.01 or 0.001.

 ```{r, message = TRUE, warning = FALSE, results = "hide"}
 seurat_object <- pruneTree(seurat_object,
@@ -159,29 +161,37 @@ The default dimensionality reduction method for Seurat objects is 'PCA', except

 If you would like to use SCTransform normalization rather than log normalization, please provide raw counts and set the parameter `normalization_method` to 'SCTransform'. Note that SCTransform has not been thoroughly tested with CHOIR.

+Labels for the final clusters identified by CHOIR can be found in the `meta.data` slot of the Seurat object. Other CHOIR outputs are stored under the `misc` slot of the Seurat object.
+
 ### SingleCellExperiment

 For SingleCellExperiment objects, only the `use_assay` parameter is needed. If not provided, it is set to 'logcounts'.

-The default dimensionality reduction method for Seurat objects is 'PCA', except in the case of ATAC-seq data, where it is 'LSI'.
+The default dimensionality reduction method for SingleCellExperiment objects is 'PCA', except in the case of ATAC-seq data, where it is 'LSI'.
+
+Labels for the final clusters identified by CHOIR can be found in the `colData` slot of the SingleCellExperiment object. Other CHOIR outputs are stored under `metadata`.

 ### ArchR

 For ArchR objects, if no input is provided for parameter `ArchR_matrix`, the "TileMatrix" is used. If no input for parameter `ArchR_depthcol` is provided, "nFrags" is used.

 The default dimensionality reduction method for ArchR objects is 'IterativeLSI'.

+Labels for the final clusters identified by CHOIR can be found in the `cellColData` slot of the ArchR object. Other CHOIR outputs are stored under `projectMetadata`.
+
 ## CHOIR parameters

 ### Batch correction

-For datasets with multiple batches, it is recommended to apply Harmony batch correction through CHOIR by setting the parameter `batch_correction_method` to 'Harmony'. This not only generates Harmony-corrected dimnesionality reductions, but ensures that random forest classifer comparisons are batch-aware.
+For datasets with multiple batches, it is recommended to apply Harmony batch correction through CHOIR by setting the parameter `batch_correction_method` to 'Harmony'. This not only generates Harmony-corrected dimensionality reductions, but ensures that random forest classifer comparisons are batch-aware.

 Use caution in applying this method if your groups of interest (e.g., disease vs. control) are batch-confounded AND you expect cell types unique to each of these groups.

 ### Significance level & multiple comparison correction

-The default significance level used by CHOIR is $\alpha = 0.05$ with Bonferroni multiple comparison correction. Other correction methods may be less conservative, as `CHOIR` applies filters that reduce the total number of tests performed (see below).
+The default significance level used by CHOIR is $\alpha = 0.05$ with Bonferroni multiple comparison correction. Other correction methods may be less conservative, as `CHOIR` applies filters that reduce the total number of tests performed (see below).
+
+We recommend using the default value of $\alpha = 0.05$ with Bonferroni multiple comparison correction. For a more conservative approach, the `alpha` value could be decreased to 0.01 or 0.001.

 ### Filters

@@ -194,9 +204,9 @@ CHOIR uses various filters to reduce the number of necessary permutation test co

 ### Downsampling

-CHOIR uses downsampling to increase efficiency for larger datasets. Datasets above 5000 cells are automatically downsampled according to their size. Downsampling occurs at each random forest classifer comparison, using the default parameter setting `downsampling_rate = "auto"`.
+CHOIR uses downsampling to increase efficiency for larger datasets. Using the default parameter setting of `downsampling_rate = "auto"`, downsampling occurs at each random forest classifer comparison. The downsampling rate is determined based on the overall dataset size. To disable downsampling, set `downsampling_rate = 1`.

-Additional downsampling can be imposed using parameter `sample_max`, indicating the maximum number of cells used per cluster to train/test each random forest classifier. The default value does not cap the number of cells used.
+Additional downsampling can be imposed using parameter `sample_max`, indicating the maximum number of cells used per cluster to train/test each random forest classifier. By default, this is not used.

 ## Providing pre-generated clusters

@@ -205,7 +215,7 @@ For users who already have a set of clusters generated by a different tool, and
 To `pruneTree()`, provide:

 * `object` The input object under which the results will be stored.
-* `cluster_tree` A dataframe containing the cluster IDs of each cell across the levels of a hierarchical clustering tree. This can be generated from a single level of clusters using function `createHierachy()` (IN DEVELOPMENT).
+* `cluster_tree` A dataframe containing the cluster IDs of each cell across the levels of a hierarchical clustering tree.
 * `input_matrix` A matrix containing the feature x cell data on which to train the random forest classifiers.
 * `nn_matrix` A matrix containing the nearest neighbor adjacency of the cells.
 * Either reduction (a matrix of dimensionality reduction cell embeddings) if using approximate distances OR `dist_matrix` (a distance matrix of cell to cell distances)
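
Note on the `alpha` guidance added in this commit: the effect of lowering `alpha` under Bonferroni correction can be sketched in base R. This is an illustration only, not CHOIR's internal code. The p-values are made up, and `significant_at()` is a hypothetical helper.

```r
# Hypothetical p-values from three cluster-pair permutation tests (illustrative only).
p_values <- c(0.001, 0.012, 0.300)

# Bonferroni correction: multiply each p-value by the number of tests, capped at 1.
p_adjusted <- p.adjust(p_values, method = "bonferroni")

# Hypothetical helper: TRUE where the adjusted p-value falls below the chosen alpha.
significant_at <- function(alpha) p_adjusted < alpha

significant_at(0.05)  # default threshold
significant_at(0.01)  # more conservative threshold
```

With these made-up values, two comparisons pass at 0.05 but only one passes at 0.01, which is the sense in which a smaller `alpha` is more conservative.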
