fasterq-dump overloads memory #903
I also tried running on a local Docker.
For completeness, I did eventually get it to complete with the 4 core and 8GB/core configuration. I expect this will be dependent on the size of the data.
@OOAAHH I was able to run your example without any issue. The SRA file is 14 GB, and unpacked it leads to a 26 GB FASTQ file. Are you sure you are not running out of disk quota?

Some things I see: your example does not provide a scratch space to store the temporary files, so they will be written to a temporary folder in the current directory.

It should further be noted that this particular data was uploaded as an aligned BAM. Dumping out a FASTQ file from a BAM-derived SRA file is mostly useless for scRNA-seq, because any cell barcodes and UMIs will only be in the tags and will not get properly dumped out. I don't know what you plan to do with the data, but for processing as scRNA-seq you are likely better off downloading the BAM (and .bai) directly from the ENA (see ERR4027871).
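As a point of reference, a minimal sketch of supplying an explicit scratch area for the temporary files and a separate output directory (the paths below are placeholders, not taken from this thread):

```bash
# Minimal sketch: keep fasterq-dump's temporary files on a large scratch
# filesystem (-t/--temp) and write the FASTQ output where there is enough
# quota (-O/--outdir). Paths are placeholders; ERR4027871 is the run
# mentioned above.
fasterq-dump ERR4027871 \
    --temp /path/to/scratch \
    --outdir /path/to/fastq \
    --split-files \
    --progress
```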
First of all, thank you for your prompt and detailed response. Your insights have been incredibly helpful and have shed light on several areas I had overlooked in my approach.
Glad to help. Fortunately, the .bai files shouldn't be essential - one can reindex.
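Presumably the reindexing would be done with samtools; a minimal sketch, assuming a coordinate-sorted BAM and a placeholder filename:

```bash
# Rebuild the .bai index for a coordinate-sorted BAM that was downloaded
# without one. The filename is a placeholder; requires samtools on the PATH.
samtools index ERR4027871.bam   # writes ERR4027871.bam.bai next to the BAM
```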
@permia please show the command you use and indicate at least one accession (SRR).
The command I used is correct. fasterq-dump does encounter issues when processing certain random SRA files. Providing examples is not meaningful. |
@permia having multiple examples of failures can be valuable to developers. This thread is about possible memory issues in recent versions of `fasterq-dump`. Note that @OOAAHH did not, in the end, have the same issue; it rather appeared to be about disk space and managing temporary scratch spaces, and it was ultimately resolved in an orthogonal way.
I have installed `sra-tools` v3.0.10 distributed from Bioconda for the linux-64 platform. Running `fasterq-dump` occupies far more RAM than the flags would imply (default 100 MB/core) or than I have ever encountered before using identical commands. In previous versions, I always used 8 cores + 1 GB/core, with `-t` pointing to a local scratch disk and VDB configured with plenty of room for the `ncbi/sra` cache.

Using the above for any SRRs from PRJNA544617 ends with LSF killing my jobs for exceeding memory. I have retried with other configurations, all eventually killed for overallocating memory. I am currently running again with 4 cores + 8 GB/core (32 GB total).

This makes me suspect there is something off in this version, possibly with:

- temporary files being written to `/tmp/` instead of the designated `-t` path
- the `--mem` argument (or not reading the default)

Please let me know if I can provide any additional information.
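For concreteness, an invocation along the lines described above (8 threads, temp files on local scratch, an explicit sort-memory limit) might look roughly like the following; the accession and paths are placeholders rather than the actual command from this report:

```bash
# Illustrative sketch only; the accession and paths are placeholders.
SCRATCH=/local/scratch/$USER   # node-local scratch disk (assumed layout)

fasterq-dump SRRXXXXXXX \
    --threads 8 \
    --mem 1000MB \
    --temp "$SCRATCH" \
    --outdir fastq/ \
    --split-files \
    --progress
```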