diff --git a/README.md b/README.md
index e03f390..c660bbf 100644
--- a/README.md
+++ b/README.md
@@ -27,17 +27,18 @@ https://drive.weixin.qq.com/s?k=AJEAIQdfAAoUxhXE7r
 The test single-cell transcriptomics data file should be pre-processed by first revising gene symbols according to [NCBI Gene database](https://www.ncbi.nlm.nih.gov/gene) updated on Jan. 10, 2020, wherein unmatched genes and duplicated genes will be removed. Then the data should be normalized with the `sc.pp.normalize_total` and `sc.pp.log1p` method in `scanpy` (Python package), detailed in `preprocess.py`.
 
 You can download this repo and run the demo task on your computing machine within about 4 hours.
-
+It expects the gene2vec embedding `gene2vec_16906.npy` in a `data` folder parallel to the `scBERT` repository (e.g., `../data/gene2vec_16906.npy` if your current working directory is ``scBERT`).
+
 - Fine-tune using pre-trained models
 ```
-python -m torch.distributed.launch --data_path "fine-tune_data_path" --model_path "pretrained_model_path" finetune.py
+python -m torch.distributed.launch finetune.py --data_path "fine-tune_data_path" --model_path "pretrained_model_path"
 #The cell type information is stored in 'label' and 'label_dict' files.
 ```
 
 - Predict using fine-tuned models
 ```
-python --data_path "test_data_path" --model_path "finetuned_model_path" predict.py
+python predict.py --data_path "test_data_path" --model_path "finetuned_model_path"
 #The cell type information will be loaded frome 'label' and 'label_dict' files.
 ```
 
@@ -46,7 +47,7 @@ python --data_path "test_data_path" --model_path "finetuned_model_path" predict.
 The detection of novel cell type can be done by thresholding the predicted probabilities.
 (Default threshold=0.5)
 ```
-python --data_path "test_data_path" --model_path "finetuned_model_path" --novel_type True --unassign_thres "custom_threshold" predict.py
+python predict.py --data_path "test_data_path" --model_path "finetuned_model_path" --novel_type True --unassign_thres "custom_threshold
 ```
 
 - Expected output
@@ -91,4 +92,4 @@ The copyright holder for this project is Tencent AI Lab. All rights reserved.
 
 # Citation
 
-Yang, F., Wang, W., Wang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat Mach Intell (2022). https://doi.org/10.1038/s42256-022-00534-z
\ No newline at end of file
+Yang, F., Wang, W., Wang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat Mach Intell (2022). https://doi.org/10.1038/s42256-022-00534-z
diff --git a/requirements.txt b/requirements.txt
index cc38e7d..a9db0c3 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -4,5 +4,7 @@ transformers==4.6.1
 scanpy==1.7.2
 scikit-learn==0.24.2
 scipy==1.5.4
-numpy==1.19.2
+numpy==1.20
 pandas==1.1.5
+einops==0.6.0
+matplotlib<3.7
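
Note on the README text touched by this diff: it says novel cell types are detected by thresholding the predicted probabilities, with a default threshold of 0.5. The sketch below illustrates that idea in plain Python. It is not taken from `predict.py`. The function name `assign_labels`, the `"Unassigned"` label, and the rule that a cell is flagged when its highest class probability falls below the threshold are all assumptions made for illustration.

```python
from typing import List, Sequence

# Hypothetical placeholder label for cells that match no known type.
UNASSIGNED = "Unassigned"


def assign_labels(
    probs: Sequence[Sequence[float]],
    label_names: Sequence[str],
    unassign_thres: float = 0.5,
) -> List[str]:
    """Pick the most probable label per cell, or UNASSIGNED if it is not confident.

    probs holds one row of class probabilities per cell; label_names maps a
    class index to its cell type name.
    """
    labels = []
    for row in probs:
        # Index of the highest-probability class for this cell.
        best = max(range(len(row)), key=lambda i: row[i])
        if row[best] < unassign_thres:
            # Assumed rule: a low top probability suggests a novel cell type.
            labels.append(UNASSIGNED)
        else:
            labels.append(label_names[best])
    return labels


example_probs = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # top probability is below 0.5
    [0.10, 0.20, 0.70],  # confident prediction
]
example_labels = assign_labels(example_probs, ["B cell", "T cell", "NK cell"])
print(example_labels)
```

Raising `unassign_thres` marks more cells as unassigned; lowering it forces more cells into a known type.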