Please follow the [vignette](https://www.choirclustering.com/articles/CHOIR.html). Alternatively, to build the vignette locally, install the package with `build_vignettes = TRUE`, as follows:
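The sketch below assumes installation from the package's GitHub repository via `devtools`; the repository path is an assumption and may need adjusting.

```{r, eval = FALSE}
# Install CHOIR with vignettes built locally
# ("corceslab/CHOIR" is an assumed repository path; adjust if it differs)
devtools::install_github("corceslab/CHOIR", build_vignettes = TRUE)

# Browse the installed vignettes
browseVignettes("CHOIR")
```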
This vignette provides a basic example of how to run CHOIR, a clustering algorithm for single-cell sequencing data. CHOIR is applicable to single-cell sequencing data of any modality, including RNA, ATAC, and proteomics. It is also applicable to multi-modal data (see [Advanced Options](https://www.choirclustering.com/articles/CHOIR.html#advanced-options)). Detailed parameter definitions are available under the [Functions](https://www.choirclustering.com/reference/index.html) tab.
CHOIR is based on the premise that if clusters contain biologically different cell types or states, a machine learning classifier that considers features present in cells from each cluster should be able to distinguish the clusters with a higher level of accuracy than machine learning classifiers trained on randomly permuted cluster labels. The use of permutation testing approaches allows CHOIR to introduce statistical significance thresholds into the clustering process.
The two steps can be run together using the function `CHOIR()` or separately using the functions `buildTree()` and `pruneTree()`.
The `CHOIR()` function will run all of the steps of the CHOIR algorithm in sequence. CHOIR is highly parallelized, so efficiency greatly improves as `n_cores` is increased.
The default significance level used by CHOIR is $\alpha = 0.05$ with Bonferroni multiple comparison correction. Other correction methods may be less conservative, as CHOIR applies filters that reduce the total number of tests performed (see [Advanced Options](https://www.choirclustering.com/articles/CHOIR.html#advanced-options)).
We recommend using this default. For a more conservative approach, the `alpha` value can be decreased to 0.01 or 0.001.
```{r, eval = FALSE}
seurat_object <- CHOIR(seurat_object,
                       n_cores = 2) # illustrative value; adjust to available compute resources
```
After constructing the hierarchical clustering tree, CHOIR iterates through each pair of clusters in the tree, training a random forest classifier to distinguish the cells of one cluster from those of the other and recording its prediction accuracy.
In parallel, CHOIR shuffles the cluster labels and repeats the same process. Both comparisons are repeated using bootstrapped samples (default = 100 iterations), resulting in a permutation test that compares the true prediction accuracy for the clusters to the prediction accuracy for a chance division of the cells into two random groups.
This permutation test yields a p-value that determines whether these clusters are slated to merge or remain separate. The significance threshold used can be adjusted using the `alpha` parameter. We recommend using the default value of $\alpha = 0.05$ with Bonferroni multiple comparison correction. For a more conservative approach, the `alpha` value could be decreased to 0.01 or 0.001.
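As a rough illustration of this idea only (this is not CHOIR's implementation), the sketch below compares out-of-bag classification error under the true cluster labels with the error under permuted labels across bootstrapped samples; `features` and `cluster_labels` are placeholder objects, and `ranger` is just one possible random forest implementation.

```{r, eval = FALSE}
# Rough illustration of the permutation-test idea (not CHOIR's implementation).
# `features` (cells x features matrix) and `cluster_labels` are placeholders.
library(ranger)  # one possible random forest implementation

bootstrap_oob_error <- function(features, labels, n_iterations = 100) {
  sapply(seq_len(n_iterations), function(i) {
    idx <- sample(nrow(features), replace = TRUE)  # bootstrapped sample of cells
    fit <- ranger(x = features[idx, , drop = FALSE],
                  y = factor(labels[idx]))
    fit$prediction.error                           # out-of-bag error (1 - accuracy)
  })
}

true_error     <- bootstrap_oob_error(features, cluster_labels)
permuted_error <- bootstrap_oob_error(features, sample(cluster_labels))

# Rough p-value estimate: how often the accuracy under permuted labels
# matches or exceeds the accuracy under the true labels
mean(permuted_error <= true_error)
```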
The default dimensionality reduction method for Seurat objects is 'PCA', except in the case of ATAC-seq data, where it is 'LSI'.
If you would like to use SCTransform normalization rather than log normalization, please provide raw counts and set the parameter `normalization_method` to 'SCTransform'. Note that SCTransform has not been thoroughly tested with CHOIR.
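For example, a sketch of such a call (the `n_cores` value is illustrative):

```{r, eval = FALSE}
# Run CHOIR on raw counts with SCTransform normalization
seurat_object <- CHOIR(seurat_object,
                       normalization_method = "SCTransform",
                       n_cores = 2)
```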
Labels for the final clusters identified by CHOIR can be found in the `meta.data` slot of the Seurat object. Other CHOIR outputs are stored under the `misc` slot of the Seurat object.
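For example, a quick way to inspect the results; the exact column name under `meta.data` depends on the parameters used, so `CHOIR_clusters_0.05` below is an assumption.

```{r, eval = FALSE}
# Final cluster labels (column name assumed; confirm with colnames(seurat_object@meta.data))
table(seurat_object@meta.data$CHOIR_clusters_0.05)

# Other CHOIR outputs stored under the misc slot
names(seurat_object@misc)
```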
### SingleCellExperiment
For SingleCellExperiment objects, only the `use_assay` parameter is needed. If not provided, it is set to 'logcounts'.
The default dimensionality reduction method for SingleCellExperiment objects is 'PCA', except in the case of ATAC-seq data, where it is 'LSI'.
Labels for the final clusters identified by CHOIR can be found in the `colData` slot of the SingleCellExperiment object. Other CHOIR outputs are stored under `metadata`.
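For example (the object name `sce_object` and the cluster label column name are placeholders; confirm the column name with `colnames(colData(sce_object))`):

```{r, eval = FALSE}
library(SingleCellExperiment)

# Final cluster labels in colData (column name assumed)
table(colData(sce_object)$CHOIR_clusters_0.05)

# Other CHOIR outputs stored in the object metadata
names(metadata(sce_object))
```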
### ArchR
For ArchR objects, if no input is provided for parameter `ArchR_matrix`, the "TileMatrix" is used. If no input for parameter `ArchR_depthcol` is provided, "nFrags" is used.
The default dimensionality reduction method for ArchR objects is 'IterativeLSI'.
Labels for the final clusters identified by CHOIR can be found in the `cellColData` slot of the ArchR object. Other CHOIR outputs are stored under `projectMetadata`.
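For example, a sketch of running CHOIR on an ArchR project with these defaults stated explicitly, followed by inspecting the outputs (the object name and the cluster label column name are placeholders):

```{r, eval = FALSE}
# Run CHOIR on an ArchR project, spelling out the ArchR-specific defaults
archr_project <- CHOIR(archr_project,
                       ArchR_matrix = "TileMatrix",
                       ArchR_depthcol = "nFrags",
                       n_cores = 2)

# Final cluster labels in cellColData (column name assumed)
table(archr_project$CHOIR_clusters_0.05)

# Other CHOIR outputs stored in the project metadata
names(archr_project@projectMetadata)
```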
## CHOIR parameters
### Batch correction
For datasets with multiple batches, it is recommended to apply Harmony batch correction through CHOIR by setting the parameter `batch_correction_method` to 'Harmony'. This not only generates Harmony-corrected dimensionality reductions, but also ensures that random forest classifier comparisons are batch-aware.
Use caution in applying this method if your groups of interest (e.g., disease vs. control) are batch-confounded AND you expect cell types unique to each of these groups.
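For example, a sketch of enabling Harmony correction within the CHOIR call; the `batch_labels` argument name is an assumption (check the [Functions](https://www.choirclustering.com/reference/index.html) reference), and `"sample_id"` is a placeholder for the metadata column identifying batches.

```{r, eval = FALSE}
# Run CHOIR with Harmony batch correction
# (batch_labels argument name is assumed; "sample_id" is a placeholder
#  for the metadata column that identifies batches)
seurat_object <- CHOIR(seurat_object,
                       batch_correction_method = "Harmony",
                       batch_labels = "sample_id",
                       n_cores = 2)
```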
### Significance level
The default significance level used by CHOIR is $\alpha = 0.05$ with Bonferroni multiple comparison correction. Other correction methods may be less conservative, as `CHOIR` applies filters that reduce the total number of tests performed (see below).
We recommend using the default value of $\alpha = 0.05$ with Bonferroni multiple comparison correction. For a more conservative approach, the `alpha` value could be decreased to 0.01 or 0.001.
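For example, a sketch of setting a stricter threshold (the `n_cores` value is illustrative):

```{r, eval = FALSE}
# Use a more conservative significance threshold
seurat_object <- CHOIR(seurat_object,
                       alpha = 0.01,
                       n_cores = 2)
```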
### Filters
CHOIR uses various filters to reduce the number of necessary permutation test comparisons.
### Downsampling
CHOIR uses downsampling to increase efficiency for larger datasets. With the default parameter setting `downsampling_rate = "auto"`, downsampling occurs at each random forest classifier comparison, and the downsampling rate is determined by the overall dataset size. To disable downsampling, set `downsampling_rate = 1`.
Additional downsampling can be imposed using parameter `sample_max`, indicating the maximum number of cells used per cluster to train/test each random forest classifier. By default, this is not used.
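For example, a sketch of adjusting both settings (the specific values are illustrative):

```{r, eval = FALSE}
# Disable automatic downsampling and cap the cells used per cluster
seurat_object <- CHOIR(seurat_object,
                       downsampling_rate = 1,  # 1 disables automatic downsampling
                       sample_max = 5000,      # illustrative cap on cells per cluster per classifier
                       n_cores = 2)
```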
## Providing pre-generated clusters
For users who already have a set of clusters generated by a different tool and would like to apply CHOIR's permutation-test-based pruning step, the function `pruneTree()` can be run directly.
To `pruneTree()`, provide the following inputs (a minimal example call is sketched after this list):
* `object` The input object under which the results will be stored.
* `cluster_tree` A dataframe containing the cluster IDs of each cell across the levels of a hierarchical clustering tree.
* `input_matrix` A matrix containing the feature x cell data on which to train the random forest classifiers.
* `nn_matrix` A matrix containing the nearest neighbor adjacency of the cells.
* Either `reduction` (a matrix of dimensionality reduction cell embeddings) if using approximate distances, OR `dist_matrix` (a distance matrix of cell-to-cell distances).
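A minimal sketch of such a call, assuming the required inputs have already been computed; all input object names below are placeholders, and the exact argument order is not guaranteed (see the function reference).

```{r, eval = FALSE}
# Prune a pre-generated hierarchical clustering tree
# (all input objects are placeholders computed beforehand)
seurat_object <- pruneTree(seurat_object,
                           cluster_tree = my_cluster_tree,  # cluster IDs per cell per tree level
                           input_matrix = my_input_matrix,  # feature x cell matrix for the classifiers
                           nn_matrix = my_nn_matrix,        # nearest neighbor adjacency matrix
                           reduction = my_reduction)        # dimensionality reduction cell embeddings
```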