mop_preprocess problem #59

Open
ddioken opened this issue Mar 7, 2024 · 5 comments
@ddioken

ddioken commented Mar 7, 2024

Hi,

I need help with two problems I'm having with your tool on my M1 Mac. Here's what's going on:

Problem with Minimap2: I intended to use Minimap2. When I started the tool with minimap2 as the mapper, it gave me this error message:

[25/98f82d] Submitted process > preprocess_simple:FASTQC:fastQC (starvation_cDNA.fastq)
ERROR ~ Error executing process > 'preprocess_simple:MINIMAP2:map (e2_45min_cDNA)'

Caused by:
Process preprocess_simple:MINIMAP2:map (e2_45min_cDNA) terminated with an error exit status (1)

Command executed:

minimap2 -t 1 -a -uf -ax splice -k14 Homo_sapiens.GRCh37.cdna.fa e2_45min_cDNA.fastq | samtools view -@ 1 -F4 -hSb - > e2_45min_cDNA.bam

Command exit status:
1

Command output:
(empty)

Command error:
[main_samview] fail to read the header from "-".

It didn't tell me much, just that it couldn't read something it needed.
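
A note on reading this error: `samtools view` only reports that nothing usable arrived on its standard input, so the message says little about why the upstream command produced no output. One way to find out is to run the two halves of the pipe one after the other and keep the first stage's exit status and stderr. The sketch below shows the idea with two hypothetical stand-in functions, `producer` and `consumer`; it does not call minimap2 or samtools, and it is not taken from the pipeline's code.

```shell
#!/bin/sh
# Sketch only: producer and consumer are hypothetical stand-ins for the two
# halves of the `minimap2 ... | samtools view ...` pipe. No real tool is run.

producer() {
    # Simulates an upstream stage that fails before writing any output.
    echo "producer: simulated failure" >&2
    return 1
}

consumer() {
    # Simulates a downstream stage that needs at least one line of input.
    if ! IFS= read -r first_line; then
        echo "consumer: no header on stdin" >&2
        return 1
    fi
    cat > /dev/null
}

out_file=$(mktemp)
err_file=$(mktemp)

# Stage 1 on its own: its exit status and stderr are no longer hidden
# behind the downstream stage.
producer > "$out_file" 2> "$err_file"
producer_status=$?
producer_msg=$(cat "$err_file")

# Stage 2 reads the saved output instead of a pipe.
consumer < "$out_file" 2> /dev/null
consumer_status=$?

echo "producer exit: $producer_status ($producer_msg)"
echo "consumer exit: $consumer_status"
rm -f "$out_file" "$err_file"
```

With the real tools, the same pattern would mean writing the aligner output to a SAM file first and converting it in a second step, inside the same container the pipeline uses.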

So I stopped and tried running it with the BWA aligner instead.

Issue with BWA and NanoPlot: I switched to BWA because Minimap2 wasn't working. With BWA, I got all my output files (BAM, BAM.BAI, FastQC, counts, CRAM, etc.), but the run did not finish. When I look in Docker, the NanoPlot step, which runs after all the other files have been created, reports that it cannot finish because of this error:

2024-03-07 08:20:28 Matplotlib created a temporary config/cache directory at /tmp/matplotlib-qj3mdxlk because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2024-03-07 08:20:30 [E::idx_find_and_load] Could not retrieve index file for 'e2_45min_cDNA_s.bam'.

But I checked, and the index file is there, in the input folder itself.
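
For context, tools that read BAM files generally look for the index next to the BAM in the directory where the command actually runs. In a Nextflow task that is the task's work directory, which is not necessarily the folder the files were first written to. Whether that is the cause here is not established in this thread. The sketch below is a minimal check, assuming the usual side-by-side index names (`.bam.bai`, `.bai`, `.bam.csi`); `find_bam_index` is a hypothetical helper, not part of samtools or the pipeline.

```shell
#!/bin/sh
# Sketch only: find_bam_index is a hypothetical helper, not part of samtools
# or the pipeline. It checks the usual side-by-side index names for a BAM.

find_bam_index() {
    bam=$1
    for candidate in "$bam.bai" "${bam%.bam}.bai" "$bam.csi"; do
        if [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Demonstration with empty placeholder files in a temporary directory.
work_dir=$(mktemp -d)
touch "$work_dir/with_index.bam" "$work_dir/with_index.bam.bai"
touch "$work_dir/no_index.bam"

found=$(find_bam_index "$work_dir/with_index.bam") || found="none"
missing=$(find_bam_index "$work_dir/no_index.bam") || missing="none"

echo "with_index.bam -> ${found##*/}"
echo "no_index.bam   -> $missing"
rm -rf "$work_dir"
```

Running the same check from inside the failing task's work directory, rather than from the output folder, would show whether the index is visible to the NanoPlot step.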

I've already done the basecalling with another tool called Guppy and was just trying to use my fastq files with your tool.

Can you help me figure out what's wrong?
Thank you for making the tool.
It seems very useful. Hope I can run it!

@lucacozzuto
Member

Hi, can you send me the log of your first RUN please?

@lucacozzuto lucacozzuto self-assigned this Mar 12, 2024
@ddioken
Author

ddioken commented Mar 15, 2024

Hey! Thank you for the response! I attached the log.

I also tried to run minimap2 separately and it worked.

> (base) didemdkn@Didems-MBP mop_preprocess % minimap2 -t 1 -a -uf -ax splice -k14 /Users/didemdkn/Downloads/Homo_sapiens.GRCh37.cdna.fa /Users/didemdkn/Downloads/strvs45fastq3/str/combined_str.fastq > minimap2_output.sam

[M::mm_idx_gen::7.383*0.98] collected minimizers
[M::mm_idx_gen::12.194*0.97] sorted minimizers
[M::main::12.208*0.97] loaded/built the index for 180253 target sequence(s)
[M::mm_mapopt_update::12.520*0.97] mid_occ = 142
[M::mm_idx_stat] kmer size: 14; skip: 5; is_hpc: 0; #seq: 180253
[M::mm_idx_stat::12.679*0.97] distinct minimizers: 19969680 (30.84% are singletons); average occurrences: 4.822; average spacing: 2.982; total length: 287163541
[M::worker_pipeline::454.439*1.00] mapped 427041 sequences
[M::worker_pipeline::918.263*1.00] mapped 415536 sequences
[M::worker_pipeline::1260.503*1.00] mapped 307798 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -t 1 -a -uf -ax splice -k14 /Users/didemdkn/Downloads/Homo_sapiens.GRCh37.cdna.fa /Users/didemdkn/Downloads/strvs45fastq3/str/combined_str.fastq
[M::main] Real time: 1260.572 sec; CPU: 1257.124 sec; Peak RSS: 2.710 GB

I don't understand why it does not work inside the pipeline. I also checked my FASTQ files and they look all right:

> (base) didemdkn@Didems-MBP str % head combined_str.fastq
> @8d0e28c5-ec55-4df1-9121-56c64dace674 runid=95ca060123d06576dc7f2f21c526a5c30e57a51c sampleid=mcf7stv030821 read=14 ch=260 start_time=2021-08-03T11:41:48Z
> GCCAUGGCCAAGAGAGGGCCCACCAGAAACGCAGCAGCAAACGGGCCCUAGAUGGACUGGAGCAAGAAAAACGAACUCUUCAGCUCCUCUGAGGUGCCCUGCUGCACCCAGAGGUGAUGCAGGGCCGAGCCAGCAUUCCACCCCACCUUUUCCACCCCCAAUUACUCCCUGAAUCGCCGUACAAAUCAGCACCCACAUCCCCUCUUGACAAAUGAUUUCUGGAGAACAUGUUUCCUGACUUUCAGGGAAGGUGAAUGCGUGCUUCCCGUCCUCCCGCAGUCAGAAAGGAGACUCUGCCUCCCUCCCUUGAGUGCCACACCUACCGGGUGUCCCUUUGCCACCCUGCCUGGACAUCGCUGGAACCUGCACAUAUGCCAGGAUCAUGGGACCAGGCGAGAGGGCACCCUCCUCCUCCCAUGUGAUAAUAGGGUUCCAGGGCUGAUCAGAACCUGAUUGCAGAACUGCCGCUCUCGGUGAUGGGCAUACGUUAUCCUGAGACCUGUGGCAGACACGUCUUGUCUUCAUGAUUCUGUUAAGAGUGCAGUAUUAAGAGUCAUUGAGGAAAUUUGUCUCGUGAUUAACAUGAUUUCCUGGUUGUCUACACCAGGGUCGGCAGUGGCCCAGCCUUAAACUUUGUUCCUACUCCCACCCUCUCAGCGAACUGGGUCGGAUGAGGAGGGUUUGGCUACCUCCCCCUGCCCAUCCCUGAGCCAGGUACCACCAUUGUCAAGGAAACACUUUCAGAAAUCAGCUGGUUCCUCCAAAAU
> +
> ,*/,-?=:74?@>4?&LG<(,$-3=??7886';<82<4?<<:277:7%$(0),,0&3/&$&&)+?;7==5788985/&.056/3(45&%**&-$%%17=<91661',5837'$%))(%%&'*;@/+40897165)*10756>5+%+%,&('&&$()341&++63'%08,66=?+##')).8/7:,;:AC=;662154.%542,%);8@665'49;74-%2+(6A?9-)0/2(*+=A>>0,*(&'##%(),+2906*%))07%&&)350%(%$'/($&%9+:A<A?D;5<@<('%+126).),*./2'67'$*)-)).'66365.'44,0+%12.)33<;.2>/-6,-+,,%.,'%$'9<4.+,%57;3+&&8<:2-42/A>-)%*422881@A;1=?3=9;<;77+*32-,.',,(''&;=;042.#&,,2)&82>662:8=-(88670'-0(%(&'&&==>53588'2(%4*%8=6%)&&+,*)((/9:9.;>=:)4<:3()%$%;A>DC@9@4/@40%'-,'$$'2*'65607C?>>>;>*6-'..%54979.--;<-445>$/-<:2$#$"(.1B5(%$&%777,+921207**.$&%&&195513'%&%+24<4?5%++01;(+;993**<8833''$4248;),.0$&#10$$.0,0:*00$%-(0%0'=8C>>=A@%=<<C66)&'.3=;36@D>6)688=9=8864/+2)*'#,4.+-/&%9:05=;:<91877=E@>B:.4655,84;3&2&.=;6,,)
> 

log240314_2.txt
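
One detail visible in the excerpt above: the sequence line contains U rather than T, which is typical of direct RNA basecalls, although the sample names say cDNA. Whether this has anything to do with the failure is not established in this thread. The sketch below counts how many reads in a FASTQ file contain U, assuming standard four-line records; `count_u_reads` and the two example reads are made up for illustration.

```shell
#!/bin/sh
# Sketch only: count_u_reads is a hypothetical helper. It counts reads whose
# sequence line (every 4th line, starting at line 2) contains U or u.

count_u_reads() {
    awk 'NR % 4 == 2 && /[Uu]/ { n++ } END { print n + 0 }' "$1"
}

# Tiny made-up FASTQ: one RNA-style read (with U) and one DNA-style read.
fastq=$(mktemp)
cat > "$fastq" <<'EOF'
@read_rna
GCCAUGGCCAAG
+
IIIIIIIIIIII
@read_dna
GCCATGGCCAAG
+
IIIIIIIIIIII
EOF

u_reads=$(count_u_reads "$fastq")
echo "reads containing U: $u_reads"
rm -f "$fastq"
```

Only the sequence lines are tested, so a U in a quality string does not affect the count.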

@lucacozzuto
Member

Hi, I cannot read the command line from your log. Can you send it to me please? Are the minimap2 and samtools you used the ones in our Docker image? I have not checked the M2 processor yet, but hopefully I'll have one soon.

@ddioken
Author

ddioken commented Mar 18, 2024

(base) didemdkn@Didems-MBP mop_preprocess % nextflow run mop_preprocess.nf -with-docker -bg -profile m1mac --fast5 " " --fastq "/Users/didemdkn/Downloads/strvs45fastq3/**/*.fastq" --reference "/Users/didemdkn/Downloads/Homo_sapiens.GRCh37.cdna.fa" --annotation "/Users/didemdkn/Downloads/Homo_sapiens.GRCh37.gtf" --output "/Users/didemdkn/Downloads/strvs45fastqpreprocess" --ref_type transcriptome --mapping minimap2 --counting nanocount --saveSpace YES > log240314_2.txt

This is my command line. I checked it many times and it looks OK to me, but I couldn't find the problem :/

@lucacozzuto
Member

Hi. Just checking your minimap2 command line... I think you don't need the splicing parameters since it is cDNA, no? Try to choose the right parameters (they are stored in the *tools_opt.tsv file indicated in the params.config file).
