Error: X needs to be 2-dimensional, not 1-dimensional for bam file with more than 3000 cells #106
Description
Running_scTE_pipeline_output_error.txt
Hello @jphe @l1y1y @oaxiom @carmarpe,
I was wondering if any of you could help me.
I am running the scTE pipeline and succeeded once in obtaining the Python output object, which I was able to convert to a Seurat object. That first run counted genes and TE family IDs only, of which there are approximately 1167. Now I am re-running the scTE pipeline to count all individual TE fragments across all chromosomes, of which there are more than 3 million. I built my own custom-made index for the human hg38 genome, but now I get the following error.
File "/data/users/ohoare/Analysis_space/Human_RT_SMARCB1_deficient/scRNA_seq_TE_Analysis/CondaEnvscRNA_seq/lib/python3.9/site-packages/anndata-0.10.7-py3.9.egg/anndata/_core/anndata.py", line 107, in _check_2d_shape
raise ValueError(
ValueError: X needs to be 2-dimensional, not 1-dimensional
I have attached the log file with the error. I have tried this on one single-cell experiment with more than 3000 cells and another with 5000 cells, so the BAM file is not too small. When I tried using --hdf5 False, it just gave me an empty .csv file.
Here are the steps I used. I ran Cell Ranger version 7.2 with modified parameters and a custom-made human reference genome (TEs and genes), and I allowed multi-mapping. I then took the output .bam file from Cell Ranger and filtered out the reads without cell barcodes. Here is the code I used.
# Find multi-mapped reads that map to more than one locus using the NH tag (which gives the number of loci the read can map to) and a MAPQ score not equal to 255.
samtools view -h possorted_genome_bam.bam | grep -E "^\@|NH:i:2" | awk 'BEGIN{FS="\t"} $5!=255' > multi_mapped_reads.sam
# Generate a multi-mapped BAM file
samtools view -S -b multi_mapped_reads.sam -o multi_mapped_reads.bam
# Filter the reads with no barcodes
samtools view possorted_genome_bam.bam -h | awk '/^@/ || /CB:/' | samtools view -h -b > possorted_genome_bam.clean.bam
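A note on the first command above: the pattern `NH:i:2` only matches NH values that begin with 2 (2, 20, 21, ...), so reads with an NH of 3 to 19 are not selected. If the intent is every read with NH greater than 1, one possible alternative is sketched below. This is a minimal sketch and not the command that produced the error; `filter_multimapped` is a hypothetical helper name, and it assumes the aligner writes NH as an optional `NH:i:` field.

```shell
# Hypothetical helper (not part of scTE or samtools).
# Reads SAM text on stdin and writes SAM text on stdout.
filter_multimapped() {
  awk 'BEGIN { FS = "\t" }
       /^@/ { print; next }                 # keep header lines unchanged
       {
         nh = 0
         for (i = 12; i <= NF; i++)         # optional SAM tags start at column 12
           if ($i ~ /^NH:i:/) nh = substr($i, 6) + 0
         if (nh > 1 && $5 != 255) print     # NH above 1 and MAPQ not equal to 255
       }'
}

# Possible usage, assuming samtools is on PATH:
# samtools view -h possorted_genome_bam.bam | filter_multimapped | samtools view -b -o multi_mapped_reads.bam -
```

Comparing NH numerically avoids depending on how the tag value happens to be spelled, and piping straight back into samtools skips the intermediate .sam file.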
Next I built a custom-made index to generate the scTE output files, with TEs and genes counted and multi-mapping allowed.
# How to run the single cell transposable elements pipeline to identify genes and TEs
**# Step 1: Build the custom-made reference index with human genes and TEs by running the shell command below, after the scTE pipeline has been installed correctly with samtools etc.**
`scTE_build -te gtf_filtered_RMSK_modified.bed -gene gtf_filtered_HAVANA_ENSEMBL.gtf -o custome_all_TEs`
**# Step 2: Run the scTE pipeline with the shell command below to generate the output.**
`scTE -i /data/users/ohoare/Analysis_space/Human_RT_SMARCB1_deficient/Peripheral_nerve/Output/InnovRT-001_1/outs/possorted_genome_bam.clean.bam -o /data/users/ohoare/Analysis_space/Human_RT_SMARCB1_deficient/Peripheral_nerve/Output/InnovRT-001_1/outs/InnovRT_001_1_scTE_output -x /data/users/ohoare/Analysis_space/Human_RT_SMARCB1_deficient/Peripheral_nerve/Input_Data/custome_all_TEs.exclusive.idx --hdf5 True -CB CB -UMI UB`
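Because scTE is run here with `-CB CB -UMI UB` while the cleaning step filters on the CB tag only, one sanity check on the cleaned BAM is to count how many reads carry both tags and how many distinct barcodes they cover. The following is a minimal sketch of that check; `count_tagged_barcodes` is a hypothetical helper name, and whether missing UB tags have anything to do with the ValueError is an open assumption, not something the log confirms.

```shell
# Hypothetical helper (not part of scTE or samtools).
# Reads SAM text on stdin; prints "<reads with CB and UB><TAB><distinct CB values>".
count_tagged_barcodes() {
  awk 'BEGIN { FS = "\t" }
       /^@/ { next }                        # skip header lines
       {
         cb = ""; has_umi = 0
         for (i = 12; i <= NF; i++) {       # optional SAM tags start at column 12
           if ($i ~ /^CB:Z:/) cb = substr($i, 6)
           if ($i ~ /^UB:Z:/) has_umi = 1
         }
         if (cb != "" && has_umi) { reads++; seen[cb] = 1 }
       }
       END {
         n = 0
         for (b in seen) n++
         printf "%d\t%d\n", reads, n
       }'
}

# Possible usage, assuming samtools is on PATH:
# samtools view possorted_genome_bam.clean.bam | count_tagged_barcodes
```

If both numbers are large, the BAM tags are unlikely to be the problem and attention can shift to the index.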
Could the error "X needs to be 2-dimensional, not 1-dimensional" be coming from my custom-made index, even though it worked well at the family-ID level? I used exactly the same method the second time, just with a much bigger index.
I am using Python 3.9, as shown in the log file I have attached. I think I have used the correct parameters.
Do you have any suggestions, or do you see something I missed? Please feel free to ask questions if anything is unclear.
Kind regards
Owen