Error: X needs to be 2-dimensional, not 1-dimensional for bam file with more than 3000 cells #106

@OwenHoare1989

Running_scTE_pipeline_output_error.txt

Hello @jphe @l1y1y @oaxiom @carmarpe,

I was wondering if any of you could help me.

I am running the scTE pipeline and, on my first run, successfully obtained the Python output object, which I was then able to convert to a Seurat object. That first run counted genes and TE family IDs only, of which there are approximately 1167. I am now re-running the scTE pipeline to count all TE fragments across all chromosomes, of which there are more than 3 million. I built my own custom index for the human hg38 genome, but now I get the following error.

File "/data/users/ohoare/Analysis_space/Human_RT_SMARCB1_deficient/scRNA_seq_TE_Analysis/CondaEnvscRNA_seq/lib/python3.9/site-packages/anndata-0.10.7-py3.9.egg/anndata/_core/anndata.py", line 107, in _check_2d_shape
raise ValueError(
ValueError: X needs to be 2-dimensional, not 1-dimensional

I have attached the log file with the error. I have tried this on one single-cell experiment with more than 3000 cells and on another with 5000 cells, so the BAM file is not too small. When I tried `--hdf5 False`, it just gave me an empty .csv file.

Here are the steps I used. I ran Cell Ranger version 7.2 with modified parameters and a custom human reference genome (TEs and genes), and I allowed multi-mapping. I then took the output .bam file from Cell Ranger and filtered out reads without cell barcodes. Here is the code I used.

# Find multi-mapped reads that map to more than one locus using the NH tag (which gives the number of loci the read can map to) and a MAPQ score not equal to 255.
samtools view -h possorted_genome_bam.bam | grep -E "^\@|NH:i:2" | awk 'BEGIN{FS="\t"} $5!=255' > multi_mapped_reads.sam

# Generate a multi-mapped BAM file
samtools view -S -b multi_mapped_reads.sam -o multi_mapped_reads.bam
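
One detail about the filter above: `grep -E "^\@|NH:i:2"` only keeps lines containing the literal text `NH:i:2`, so reads with NH values of 3, 4, and so on are not selected. A minimal sketch of a numeric check on the NH tag is shown below. The function name `filter_multimapped` is hypothetical and is not part of the commands that were actually run; it assumes SAM text on standard input.

```shell
# Sketch only: keep SAM header lines plus alignments with NH > 1 and MAPQ != 255.
# filter_multimapped is a hypothetical helper name, not a samtools or scTE command.
filter_multimapped() {
  awk 'BEGIN { FS = "\t" }
       /^@/ { print; next }                  # pass header lines through unchanged
       {
         nh = 0
         for (i = 12; i <= NF; i++) {        # optional tags start at column 12
           if ($i ~ /^NH:i:/) nh = substr($i, 6) + 0
         }
         if (nh > 1 && $5 != 255) print      # multi-mapped and not MAPQ 255
       }'
}

# Possible usage (assumes samtools is on PATH):
# samtools view -h possorted_genome_bam.bam | filter_multimapped | samtools view -b -o multi_mapped_reads.bam -
```

Parsing the tag as a number avoids depending on how the NH value happens to be spelled in the text.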

# Filter the reads with no barcodes
samtools view possorted_genome_bam.bam -h | awk '/^@/ || /CB:/' | samtools view -h -b > possorted_genome_bam.clean.bam
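
The awk pattern `/CB:/` matches the text `CB:` anywhere in a line, including inside a base-quality string, so it can keep reads that have no barcode tag. A stricter variant, plus a quick count of how many records carry the tags later passed to scTE (`-CB CB -UMI UB`), could look like the sketch below. The function names `keep_barcoded` and `count_by_tags` are hypothetical; both assume SAM text on standard input.

```shell
# Sketch only: hypothetical helpers that read SAM text from stdin.

# Keep header lines and alignments that carry a real CB:Z: tag field.
keep_barcoded() {
  awk '/^@/ || /\tCB:Z:/'
}

# Report how many alignments carry CB, and how many carry both CB and UB.
count_by_tags() {
  awk '!/^@/ {
         total++
         cb = ($0 ~ /\tCB:Z:/)
         ub = ($0 ~ /\tUB:Z:/)
         if (cb) ncb++
         if (cb && ub) nboth++
       }
       END { printf "total=%d with_CB=%d with_CB_and_UB=%d\n", total, ncb, nboth }'
}

# Possible usage (assumes samtools is on PATH):
# samtools view -h possorted_genome_bam.bam | keep_barcoded | samtools view -b -o possorted_genome_bam.clean.bam -
# samtools view possorted_genome_bam.clean.bam | count_by_tags
```

If the counts come back much lower than expected, that would point at the filtering step rather than at the index.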

Next, I built a custom index to generate the scTE output files, with TEs and genes counted and multi-mapping allowed.

# How to run the single cell transposable elements pipeline to identify genes and TEs

**# Step 1: Build the custom reference index with human genes and TEs by running the shell script below, after the scTE pipeline has been installed correctly with samtools etc.**

`scTE_build -te gtf_filtered_RMSK_modified.bed -gene gtf_filtered_HAVANA_ENSEMBL.gtf -o custome_all_TEs

# Step 2: Run the scTE pipeline with the below shell script to generate output.
scTE -i /data/users/ohoare/Analysis_space/Human_RT_SMARCB1_deficient/Peripheral_nerve/Output/InnovRT-001_1/outs/possorted_genome_bam.clean.bam -o /data/users/ohoare/Analysis_space/Human_RT_SMARCB1_deficient/Peripheral_nerve/Output/InnovRT-001_1/outs/InnovRT_001_1_scTE_output -x /data/users/ohoare/Analysis_space/Human_RT_SMARCB1_deficient/Peripheral_nerve/Input_Data/custome_all_TEs.exclusive.idx --hdf5 True -CB CB -UMI UB
`

Could the error "X needs to be 2-dimensional, not 1-dimensional" be coming from my custom index, even though it worked well when I ran it at the family ID level only? I used exactly the same method the second time, just with a much bigger index.

I am using Python 3.9, as shown in the error log I have attached. I think I have used the correct parameters.

Do you have any suggestions, or do you see something I missed? Please feel free to ask any other questions if something is not clear.

Kind regards
Owen
