
fasterq-dump fails due to output file naming error #865


Closed
dmalzl opened this issue Oct 16, 2023 · 9 comments

Comments

@dmalzl

dmalzl commented Oct 16, 2023

I am currently trying to download a couple of raw sequencing data files using sra-tools prefetch and fasterq-dump. Prefetch works fine but I get a weird error when trying to convert the generated *.sra file to fastq with fasterq-dump. The data is paired-end and the actual path should be /scratch/daniel.malzl/work/aa/7ab6e5d29db7a0352a1f1cd4af2af3/SRX10737613_SRR14385311 but judging by the error message there seems to be some bug in the renaming code because it says the following:

        Error: fasterq-dump cannot create this file: '/scratch/daniel_1.malzl/work/aa/7ab6e5d29db7a0352a1f1cd4af2af3/SRX10737613_SRR14385311'

        Error: fasterq-dump cannot create this file: '/scratch/daniel_2.malzl/work/aa/7ab6e5d29db7a0352a1f1cd4af2af3/SRX10737613_SRR14385311'
spots read      : 174,563,529
reads read      : 349,127,058

=============================================================
An error occurred during processing.
A report was generated into the file '/users/daniel.malzl/ncbi_error_report.txt'.
If the problem persists, you may consider sending the file
to '[email protected]' for assistance.
=============================================================

fasterq-dump quit with error code 3

So it seems to insert the _1 and _2 read suffixes into the directory part of the path rather than into the file name, which makes the path invalid.

The version I am using is 3.0.8.

@dmalzl
Author

dmalzl commented Oct 16, 2023

the executed command was this

fasterq-dump \
    --split-files --include-technical \
    --threads 6 \
    --outfile SRX10737613_SRR14385311 \
     \
    SRR14385311

@wraetz
Contributor

wraetz commented Oct 16, 2023

It looks like the tool is confused about the output-file.
Try this command: 'fasterq-dump --split-files --include-technical SRR14385311'
The --threads 6 is not necessary, it is the default.
The --outfile is not necessary, the tool will create the output-filename from the accession. I think it is confused because you included the experiment in the output-file. It should not be confused about that. I will have to investigate why this happens. In the meantime, try the shortened command.

@dmalzl
Author

dmalzl commented Oct 16, 2023

Thanks for the swift response and the workaround. I'll try to modify the code of the pipeline I am using. However, to me it looks like the path gets split at the . character at some point in the process, the _1 or _2 suffix is inserted there, and the pieces are then concatenated again. So it might be the . that is confusing it, but I'll try and report back.
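
The suspected failure mode can be illustrated with a few lines of C. This is only a sketch, assuming the suffix is inserted before the last '.' found anywhere in the full path. The function name naive_insert_idx is hypothetical and is not taken from sra-tools.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the suspected behaviour, not sra-tools code:
   split the whole path at its last '.', then place "_<idx>" before the dot.
   Returns 0 on success, -1 if dst is too small. */
static int naive_insert_idx(char *dst, size_t dst_size,
                            const char *path, unsigned idx)
{
    assert(dst != NULL && path != NULL);
    /* Searches the full path, so a dot in a directory name also matches. */
    const char *dot = strrchr(path, '.');
    int n;
    if (dot != NULL) {
        n = snprintf(dst, dst_size, "%.*s_%u.%s",
                     (int)(dot - path), path, idx, dot + 1);
    } else {
        /* No dot anywhere: append the suffix and a default extension. */
        n = snprintf(dst, dst_size, "%s_%u.fastq", path, idx);
    }
    return (n >= 0 && (size_t)n < dst_size) ? 0 : -1;
}
```

Called with the path from the report and an index of 1, this sketch produces /scratch/daniel_1.malzl/work/aa/7ab6e5d29db7a0352a1f1cd4af2af3/SRX10737613_SRR14385311, which matches the first error message.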

@wraetz
Contributor

wraetz commented Oct 16, 2023

By the way, which version of fasterq-dump are you using?

@dmalzl
Author

dmalzl commented Oct 16, 2023

the version is 3.0.8

@dmalzl
Author

dmalzl commented Oct 17, 2023

Just to let you know: this does not occur in version 2.11.0.

@drpatelh

drpatelh commented Jan 5, 2024

Thanks for reporting @dmalzl ! And thanks for investigating @wraetz 🙏🏽

I have managed to reproduce the issue and the problem is indeed the fact that a . exists in the path where the output files will be written.

  1. Defined a Conda environment file called env.yml with the dependencies below (you can exclude pigz if you like):
name: sra-tools-3.0.8
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - conda-forge::pigz=2.6
  - bioconda::sra-tools=3.0.8
  2. Created the environment
conda env create -f env.yml
  3. ✅ Run with a path without a .
mkdir testwithoutdot
cd testwithoutdot

prefetch SRR12848126

fasterq-dump \
        --split-files --include-technical \
        --outfile SRX9315476_SRR12848126 \
        SRR12848126

2024-01-05T11:41:39 prefetch.3.0.8: Current preference is set to retrieve SRA Normalized Format files with full base quality scores.
2024-01-05T11:41:39 prefetch.3.0.8: 1) Downloading 'SRR12848126'...
2024-01-05T11:41:39 prefetch.3.0.8: SRA Normalized Format file is being retrieved, if this is different from your preference, it may be due to current file availability.
2024-01-05T11:41:39 prefetch.3.0.8:  Downloading via HTTPS...
2024-01-05T11:41:40 prefetch.3.0.8:  HTTPS download succeed
2024-01-05T11:41:40 prefetch.3.0.8:  'SRR12848126' is valid
2024-01-05T11:41:40 prefetch.3.0.8: 1) 'SRR12848126' was downloaded successfully
2024-01-05T11:41:41 prefetch.3.0.8: 'SRR12848126' has 1 unresolved dependency
2024-01-05T11:41:41 prefetch.3.0.8: 2) Downloading 'ncbi-acc:NC_000069.6?vdb-ctx=refseq'...
2024-01-05T11:41:41 prefetch.3.0.8:  Downloading via HTTPS...
2024-01-05T11:41:43 prefetch.3.0.8:  HTTPS download succeed
2024-01-05T11:41:43 prefetch.3.0.8: 2) 'ncbi-acc:NC_000069.6?vdb-ctx=refseq' was downloaded successfully
spots read      : 1,517
reads read      : 3,034
reads written   : 2,982
  4. ❌ Run with a path that contains a .
mkdir test.withdot
cd test.withdot

prefetch SRR12848126

fasterq-dump \
        --split-files --include-technical \
        --outfile SRX9315476_SRR12848126 \
        SRR12848126

2024-01-05T11:37:35 prefetch.3.0.8: Current preference is set to retrieve SRA Normalized Format files with full base quality scores.
2024-01-05T11:37:35 prefetch.3.0.8: 1) Downloading 'SRR12848126'...
2024-01-05T11:37:35 prefetch.3.0.8: SRA Normalized Format file is being retrieved, if this is different from your preference, it may be due to current file availability.
2024-01-05T11:37:35 prefetch.3.0.8:  Downloading via HTTPS...
2024-01-05T11:37:36 prefetch.3.0.8:  HTTPS download succeed
2024-01-05T11:37:36 prefetch.3.0.8:  'SRR12848126' is valid
2024-01-05T11:37:36 prefetch.3.0.8: 1) 'SRR12848126' was downloaded successfully
2024-01-05T11:37:37 prefetch.3.0.8: 'SRR12848126' has 1 unresolved dependency
2024-01-05T11:37:37 prefetch.3.0.8: 2) Downloading 'ncbi-acc:NC_000069.6?vdb-ctx=refseq'...
2024-01-05T11:37:37 prefetch.3.0.8:  Downloading via HTTPS...
2024-01-05T11:37:55 prefetch.3.0.8:  HTTPS download succeed
2024-01-05T11:37:55 prefetch.3.0.8: 2) 'ncbi-acc:NC_000069.6?vdb-ctx=refseq' was downloaded successfully

        Error: fasterq-dump cannot create this file: '/home/harshil/test_2.withdot/SRX9315476_SRR12848126'

        Error: fasterq-dump cannot create this file: '/home/harshil/test_1.withdot/SRX9315476_SRR12848126'
spots read      : 1,517
reads read      : 3,034

=============================================================
An error occurred during processing.
A report was generated into the file '/home/harshil/ncbi_error_report.txt'.
If the problem persists, you may consider sending the file
to '[email protected]' for assistance.
=============================================================

fasterq-dump quit with error code 3

@adamrtalbot

The problem is this function here, which splits on any period found and creates a new filename. It should split on the final period only, or even better use some form of path handling (not 100% familiar with code).

rc_t split_filename_insert_idx( SBuffer_t * dst, size_t dst_size,
                                const char * filename, uint32_t idx ) {
    rc_t rc;
    if ( idx > 0 ) {
        /* we have to split md -> cmn -> output_filename into name and extension
           then append '_%u' to the name, then re-append the extension */
        String S_in, S_name, S_ext;
        StringInitCString( &S_in, filename );
        rc = hlp_split_string_r( &S_in, &S_name, &S_ext, '.' ); /* helper.c */
        if ( 0 == rc ) {
            /* we found a dot to split the filename! */
            rc = make_and_print_to_SBuffer( dst, dst_size, "%S_%u.%S",
                                            &S_name, idx, &S_ext ); /* helper.c */
        } else {
            /* we did not find a dot to split the filename! */
            rc = make_and_print_to_SBuffer( dst, dst_size, "%s_%u.fastq",
                                            filename, idx ); /* helper.c */
        }
    } else {
        rc = make_and_print_to_SBuffer( dst, dst_size, "%s", filename ); /* helper.c */
    }
    if ( 0 != rc ) {
        release_SBuffer( dst );
    }
    return rc;
}
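
One way to confine the split to the file name, along the lines suggested above, is to look for the dot only after the last path separator. Below is a minimal standalone sketch, assuming POSIX-style '/' separators. The helper insert_idx_in_basename is hypothetical, uses plain C strings instead of the sra-tools String and SBuffer_t types, and is not necessarily how the project addressed the problem.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not sra-tools code: insert "_<idx>" before the
   extension of the final path component only.
   Returns 0 on success, -1 if dst is too small. */
static int insert_idx_in_basename(char *dst, size_t dst_size,
                                  const char *path, unsigned idx)
{
    assert(dst != NULL && path != NULL);
    /* Assumption: '/' is the path separator. */
    const char *slash = strrchr(path, '/');
    const char *base = (slash != NULL) ? slash + 1 : path;
    /* Only dots inside the file name count as an extension separator. */
    const char *dot = strrchr(base, '.');
    int n;
    if (idx == 0) {
        /* Index 0 means "no suffix", mirroring the quoted function. */
        n = snprintf(dst, dst_size, "%s", path);
    } else if (dot != NULL && dot != base) {
        /* name + "_<idx>" + ".ext"; a leading dot is not an extension. */
        n = snprintf(dst, dst_size, "%.*s_%u%s",
                     (int)(dot - path), path, idx, dot);
    } else {
        /* No extension in the file name: fall back to ".fastq". */
        n = snprintf(dst, dst_size, "%s_%u.fastq", path, idx);
    }
    return (n >= 0 && (size_t)n < dst_size) ? 0 : -1;
}
```

With the path from the reproduction and an index of 1, this sketch yields /home/harshil/test.withdot/SRX9315476_SRR12848126_1.fastq, leaving the directory name untouched. Treating a leading dot as part of the name rather than as an extension separator is a design choice of this sketch.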

@klymenko
Contributor

Please try release 3.2.1.

durbrow closed this as completed Apr 22, 2025