This repository contains the raw data and the analysis notebook for the manuscript (in progress):
"Single cell mRNA signals reveal a distinct developmental state of KMT2A-rearranged infant B-lymphoblastic leukaemia"
The "data" folder contains the following:
single_cell_data (10X single cell leukemia data):
- raw_counts = output of cellranger, i.e. the contents of filtered_gene_bc_matrices (ALL_AML_1) or filtered_feature_bc_matrix (ALL_AML_2 and ALL_AML_3)
- SoupX_processed_counts = "decontaminated" raw data, i.e. after running SoupX
- P1/P3/P5_iALL_clusters.tsv = output files from the souporcell software for multiplexed samples
- final_h5ad = one h5ad object per sample, with raw counts in the adata.X slot and the sample ID and cancer/normal cell classification in adata.obs.patient_cancer
bulk_data (bulk data run through Salmon):
- StJude = matrix of counts with gene length in the second column (stJudes_bulk_txiConvGlen.csv), together with metadata
- TARGET = matrix of counts (TARGET_raw_countsALLwGlen.csv) together with metadata
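
A gene-length column alongside raw counts is typically what length normalisation needs. The sketch below is only an illustration on toy data: it assumes the first CSV column holds gene identifiers and the second holds gene length, and it uses TPM as an example normalisation. The column names, sample names and the choice of TPM are assumptions, not a description of what the analysis notebook does.

```python
import io
import pandas as pd

# Toy stand-in for a bulk counts matrix; gene, column and sample names are hypothetical.
toy_csv = io.StringIO(
    "gene,geneLength,sample_A,sample_B\n"
    "GENE1,1000,100,0\n"
    "GENE2,2000,200,50\n"
    "GENE3,500,50,50\n"
)
bulk = pd.read_csv(toy_csv, index_col=0)

def split_lengths_and_counts(table):
    """Separate the gene-length column (assumed first after the index) from the counts."""
    lengths = table.iloc[:, 0]
    counts = table.iloc[:, 1:]
    return lengths, counts

def counts_to_tpm(counts, lengths):
    """Length-normalise counts per gene, then rescale each sample to sum to one million."""
    rate = counts.div(lengths, axis=0)
    return rate.div(rate.sum(axis=0), axis=1) * 1e6

lengths, counts = split_lengths_and_counts(bulk)
tpm = counts_to_tpm(counts, lengths)
print(tpm.round(1))
```

To try this on the real files, replace `toy_csv` with the path to the CSV and check the column layout first.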
The "analysis_results" folder contains the following:
deconvolution = results of bulk RNA-seq deconvolution (cellular signal analysis); includes StJudes*, TARGET* and InHouse* files (lineage switch case, as in Figure 3).
logreg = results of logistic regression (single-cell cancer-to-normal comparison); includes the logistic regression model trained on the fetal bone marrow reference, and InfALL_allSAMPLES_LogregFBM.csv (probability of similarity to each normal cell type for each leukemia single cell).
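
A table of per-cell similarity probabilities can be summarised by taking the highest-scoring normal cell type for each cell. The sketch below uses toy data and assumes rows are cells and columns are reference cell types. The actual layout of InfALL_allSAMPLES_LogregFBM.csv may differ, and the cell and cell-type names here are hypothetical.

```python
import io
import pandas as pd

# Toy stand-in for a probability table; cell IDs and cell-type names are hypothetical.
toy_csv = io.StringIO(
    "cell,HSC,pro_B,monocyte\n"
    "cell_1,0.10,0.85,0.05\n"
    "cell_2,0.70,0.20,0.10\n"
    "cell_3,0.05,0.15,0.80\n"
)
probs = pd.read_csv(toy_csv, index_col=0)

# For each cell, record the best-matching reference cell type and its probability.
summary = pd.DataFrame({
    "best_match": probs.idxmax(axis=1),
    "probability": probs.max(axis=1),
})
print(summary)
```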
The Jupyter notebook shows the preprocessing steps (demultiplexing, normalising) and filtering (mitochondrial content < 20%), as well as the code to reproduce Figure 1 (deconvolution results) and Figure 2 (UMAPs, logistic regression results and silhouette plot).
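
The mitochondrial filter can be illustrated on a small cells-by-genes count matrix. This is a minimal sketch, not the notebook's code: it assumes mitochondrial genes are identified by the human gene-symbol prefix "MT-", and the function name and toy data are hypothetical.

```python
import numpy as np

def mito_fraction(counts, gene_names, prefix="MT-"):
    """Fraction of each cell's counts that comes from genes whose symbol starts with `prefix`."""
    gene_names = np.asarray(gene_names, dtype=str)
    is_mito = np.char.startswith(gene_names, prefix)
    total = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Cells with zero total counts get a fraction of 0 instead of a division error.
    return np.divide(mito, total, out=np.zeros(total.shape, dtype=float), where=total > 0)

# Toy matrix: rows are cells, columns are genes.
genes = ["ACTB", "MT-CO1", "MT-ND1"]
counts = np.array([
    [90, 5, 5],    # 10% mitochondrial
    [50, 30, 20],  # 50% mitochondrial
    [10, 0, 0],    # 0% mitochondrial
])

frac = mito_fraction(counts, genes)
keep = frac < 0.20  # threshold stated in the text above
filtered = counts[keep]
print(frac, keep)
```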