This repository is for ER+ breast cancer scRNA-seq data processing and figure generation.

hyunsoo77/BC_tamoxifen_response

BC_tamoxifen_response

This repository processes ER+ breast cancer scRNA-seq data, starting from 10x Genomics scRNA-seq FASTQ files, and generates the figures.

Installing

git clone https://github.com/hyunsoo77/BC_tamoxifen_response.git

scRNA-seq data processing

Step 1: Align the sequences in the scRNA-seq FASTQ files to the GRCh38 reference transcriptome with 10x Genomics cellranger count. This yields one filtered_feature_bc_matrix.h5 file for each of the two samples.
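
As a rough illustration of Step 1, the sketch below prints one cellranger count command per sample without running it. The reference and FASTQ paths, and the assumption that each sample's FASTQ files sit in their own subdirectory, are placeholders rather than details from this repository. Recent Cell Ranger releases may require additional options, so check cellranger count --help for your version.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one `cellranger count` command per sample.
# Both paths are placeholders (assumptions), not taken from this repository.
TRANSCRIPTOME="${TRANSCRIPTOME:-/path/to/GRCh38_reference}"  # 10x GRCh38 reference directory
FASTQ_ROOT="${FASTQ_ROOT:-/path/to/fastq}"                    # one subdirectory per sample (assumed)

cellranger_cmd() {
    local sample="$1"
    # --id names the run directory; the matrix is written to
    # <id>/outs/filtered_feature_bc_matrix.h5
    printf 'cellranger count --id=%s --transcriptome=%s --fastqs=%s --sample=%s\n' \
        "$sample" "$TRANSCRIPTOME" "$FASTQ_ROOT/$sample" "$sample"
}

for sample in Tumor5 Tumor5_TAM; do
    cellranger_cmd "$sample"
done
```

Using the sample name as --id means each run directory already has the `<sample>/outs/filtered_feature_bc_matrix.h5` layout that Step 2 expects.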

Step 2: Create the following directory structure by copying or linking the files.

../count_er+bc-pairs
├── Tumor5
│   └── outs
│       └── filtered_feature_bc_matrix.h5
└── Tumor5_TAM
    └── outs
        └── filtered_feature_bc_matrix.h5
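
One possible way to build this layout with symbolic links is sketched below. CELLRANGER_ROOT is an assumed location holding one Cell Ranger run directory per sample, and link_counts is a hypothetical helper, not part of this repository.

```shell
#!/usr/bin/env bash
# Sketch: link each sample's matrix into the layout shown above.
# CELLRANGER_ROOT is an assumed location; adjust it to your own run directories.
CELLRANGER_ROOT="${CELLRANGER_ROOT:-/path/to/cellranger_runs}"
COUNT_DIR="${COUNT_DIR:-../count_er+bc-pairs}"

link_counts() {
    local src_root dest_root="$2" sample src
    # Resolve to an absolute path so the links stay valid from any directory
    src_root="$(cd "$1" && pwd)" || return 1
    shift 2
    for sample in "$@"; do
        src="$src_root/$sample/outs/filtered_feature_bc_matrix.h5"
        if [ ! -f "$src" ]; then
            echo "missing: $src" >&2
            return 1
        fi
        mkdir -p "$dest_root/$sample/outs" || return 1
        # -f replaces an existing link so the script can be re-run safely
        ln -sf "$src" "$dest_root/$sample/outs/filtered_feature_bc_matrix.h5" || return 1
    done
}

if [ -d "$CELLRANGER_ROOT" ]; then
    link_counts "$CELLRANGER_ROOT" "$COUNT_DIR" Tumor5 Tumor5_TAM
else
    echo "Set CELLRANGER_ROOT to the directory containing the Cell Ranger runs." >&2
fi
```

Links avoid duplicating the matrices. Replace ln -sf with cp if the files need to be self-contained.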

Step 3: Make a Seurat object for each sample with the following command:

./make_sc-rna-seq_seurat_obj.R --dir_count ../count_er+bc-pairs --dir_output ./output_er+bc-pairs --dir_seurat_obj ./output_er+bc-pairs/rds_er+bc-pairs --type_qc arguments --min_ncount_rna 5000 --min_nfeature_rna 2000 --th_percent.mt 25 --max_dimstouse 30 --seurat_resolution 0.8 --method_to_update_cell_types epithelial_cell_types --method_to_identify_subtypes none --type_infercnv_argset vignettes --infercnv_pos_notpos er+bc-pairs Tumor5

The example above covers only Tumor5; to make the Seurat object for Tumor5_TAM, change the last argument. The output directory ./output_er+bc-pairs then contains:

output_er+bc-pairs/
├── infercnv
│   ├── er+bc-pairs_Tumor5_cnv_postdoublet
│   └── er+bc-pairs_Tumor5_TAM_cnv_postdoublet
├── log
├── rds_er+bc-pairs
│   ├── er+bc-pairs_Tumor5_sc-rna-seq_sample_seurat_obj.rds
│   ├── er+bc-pairs_Tumor5_TAM_sc-rna-seq_sample_seurat_obj.rds
│   └── wilcox_degs
├── tsv
│   ├── infercnv_input_barcode_group_er+bc-pairs_Tumor5.tsv
│   └── infercnv_input_barcode_group_er+bc-pairs_Tumor5_TAM.tsv
└── xlsx
    ├── er+bc-pairs_Tumor5_sc-rna-seq_pipeline_summary.xlsx
    └── er+bc-pairs_Tumor5_TAM_sc-rna-seq_pipeline_summary.xlsx
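
Because only the final argument differs between samples, the two Step 3 runs can be wrapped in a loop. The sketch below reuses the arguments of the command above unchanged. The DRY_RUN switch and the run_step3 helper are illustrative additions, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Sketch: run Step 3 once per sample; only the last argument changes.
# DRY_RUN=1 (default) prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run_step3() {
    local sample="$1"
    # Replace the function's positional parameters with the full command line
    set -- ./make_sc-rna-seq_seurat_obj.R \
        --dir_count ../count_er+bc-pairs \
        --dir_output ./output_er+bc-pairs \
        --dir_seurat_obj ./output_er+bc-pairs/rds_er+bc-pairs \
        --type_qc arguments --min_ncount_rna 5000 --min_nfeature_rna 2000 \
        --th_percent.mt 25 --max_dimstouse 30 --seurat_resolution 0.8 \
        --method_to_update_cell_types epithelial_cell_types \
        --method_to_identify_subtypes none \
        --type_infercnv_argset vignettes --infercnv_pos_notpos \
        er+bc-pairs "$sample"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

for sample in Tumor5 Tumor5_TAM; do
    run_step3 "$sample" || break   # stop at the first failing sample
done
```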

Step 4: Merge the per-sample Seurat objects into a single merged Seurat object with the following command:

./make_sc-rna-seq_merged_seurat_obj.R --dir_output ./output_er+bc-pairs --dir_seurat_obj ./output_er+bc-pairs/rds_er+bc-pairs --k.anchor 5 --max_dimstouse 30 --seurat_resolution 0.8 --cancer_type_for_parsing_rds_filename er+bc-pairs --type_parsing_rds_filename_for_donor 2nd_item_after_parsing_with_underbar --harmony_theta 0  er+bc-pairs

The output file is written to ./output_er+bc-pairs/rds_er+bc-pairs, the directory given by the --dir_seurat_obj argument.

output_er+bc-pairs/
│   ...
├── rds_er+bc-pairs
│   ├── er+bc-pairs_Tumor5_sc-rna-seq_sample_seurat_obj.rds
│   ├── er+bc-pairs_Tumor5_TAM_sc-rna-seq_sample_seurat_obj.rds
│   ├── er+bc-pairs_sc-rna-seq_merged_seurat_obj.rds
│   └── wilcox_degs
...
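
Before moving on to the notebooks, it can help to confirm that the three .rds files listed above exist and are non-empty. The check_outputs helper below is a hypothetical convenience, not something the pipeline provides.

```shell
#!/usr/bin/env bash
# Sketch: report which expected .rds files are present and non-empty.
check_outputs() {
    local rds_dir="$1" status=0 f
    shift
    for f in "$@"; do
        if [ -s "$rds_dir/$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            status=1
        fi
    done
    return "$status"
}

check_outputs ./output_er+bc-pairs/rds_er+bc-pairs \
    er+bc-pairs_Tumor5_sc-rna-seq_sample_seurat_obj.rds \
    er+bc-pairs_Tumor5_TAM_sc-rna-seq_sample_seurat_obj.rds \
    er+bc-pairs_sc-rna-seq_merged_seurat_obj.rds \
    || echo "Some expected outputs are missing." >&2
```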

Jupyter notebook

The figures were generated with Jupyter notebooks. To install Jupyter Notebook or JupyterLab, see jupyter.org. In each notebook, set dir_rna and/or dir_atac to the location of the merged Seurat object or the final ArchRProject object you generated. The outputs include PDF files, which are written to the "pdf" directory.

./
├── figure1_01_umap.ipynb
├── figure1_02_barplot.ipynb
├── figure2_01_umap.ipynb
├── figure2_02_dge.ipynb
├── figure3_01_umap.ipynb
├── figure3_02_boxplot.ipynb
├── figure3_03_dge_pairs.ipynb
├── figure4_01_umap.ipynb
├── figure4_02_dge.ipynb
├── figure5_01_umap.ipynb
├── figure5_02_barplot.ipynb
├── figure5_03_drug_effect.ipynb
├── figure_s1_01_dge.ipynb
├── figure_s2_01_barplot.ipynb
├── figure_s2_02_dge.ipynb
├── log
├── pdf
│   ├── ...
│   ├── barplot_er+bc-pairs_cluster_type_prop_rna.pdf
│   ├── ...
│   ├── heatmap_er+bc-pairs_control_vs_tamoxifen_Tumor_cells_zscore.pdf
│   ├── ...
│   ├── umap_er+bc-pairs_cluster_labels_rna.pdf
│   ├── umap_er+bc-pairs_cluster_types_rna.pdf
│   ├── umap_er+bc-pairs_log2fc_t47d_down_genes_rna.pdf
│   └── umap_er+bc-pairs_samples_rna.pdf
├── r
├── reference
├── txt
│   └── sessionInfo.txt
└── xlsx
    ├── ...
    └── er+bc-pairs_control_vs_tamoxifen_Tumor_cells.xlsx
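
The notebooks can also be run non-interactively. The sketch below only prints a jupyter nbconvert command for each figure notebook in the current directory. It assumes that jupyter and the kernel the notebooks use are installed, and that dir_rna and/or dir_atac have already been edited in each notebook.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print a `jupyter nbconvert` command for each figure notebook.
nbconvert_cmd() {
    # --execute runs all cells; --inplace writes the results back to the same file
    printf 'jupyter nbconvert --to notebook --execute --inplace %s\n' "$1"
}

for nb in figure*.ipynb; do
    [ -e "$nb" ] || continue   # skip the literal pattern when nothing matches
    nbconvert_cmd "$nb"
done
```

Piping the printed lines to sh would execute them in filename order.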

Let's check the cell numbers for each cell type.

The scRNA-seq pipeline is under active development. Other single-cell data analysis projects will use the current version with different parameters, or upgraded versions of these pipelines.
