Test 4,546 Salmonella genomes? #6
Hi Giulio,
Unfortunately, the current pipeline is not well optimized for intermediate disk usage, but there are some easy fixes. High disk usage is expected, since we dump the intermediate color matrix to disk uncompressed (it is not even gzipped). The other issue is that the current version on GitHub only supports up to 128 colors (I realize this constraint is not documented anywhere). I am currently working on fixing both issues. I have an experimental implementation that supports a larger number of colors; I will test whether it works on this dataset and then update the repo with the fixes.
Thanks,
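A rough, hypothetical back-of-the-envelope sketch (not the tool's code) of why both issues bite on this dataset. It assumes the intermediate color matrix is stored densely with one bit per genome (color) per row, and models the 128-color limit as a fixed-width field (`MAX_COLORS`); the toy matrix and any row counts are illustrative only, not measurements.

```python
import gzip
import math

# Hypothetical cap modelling the 128-color limit described in the thread
# (assumption: colors live in a fixed 128-bit field).
MAX_COLORS = 128


def bytes_per_row(n_colors: int) -> int:
    """Bytes for one row of a dense bit matrix: one bit per color."""
    return math.ceil(n_colors / 8)


def dense_matrix_bytes(n_rows: int, n_colors: int) -> int:
    """Uncompressed size of a dense color matrix with n_rows rows."""
    return n_rows * bytes_per_row(n_colors)


def fits_color_limit(n_colors: int, limit: int = MAX_COLORS) -> bool:
    """True if the number of colors fits the assumed fixed-width field."""
    return n_colors <= limit


def toy_sparse_matrix(n_rows: int, n_colors: int) -> bytes:
    """Hypothetical toy matrix where each row has exactly one color bit set."""
    row_len = bytes_per_row(n_colors)
    out = bytearray(n_rows * row_len)
    for i in range(n_rows):
        color = i % n_colors
        out[i * row_len + color // 8] |= 1 << (color % 8)
    return bytes(out)


if __name__ == "__main__":
    n_colors = 4546  # genomes in the Salmonella dataset
    print(bytes_per_row(n_colors))      # 569
    print(fits_color_limit(n_colors))   # False
    raw = toy_sparse_matrix(1000, n_colors)
    # Sparse rows are mostly zero bytes, so even plain gzip shrinks them a lot.
    print(len(raw), len(gzip.compress(raw)))
```

With 4,546 colors each row costs about 569 bytes before compression, so the on-disk size grows linearly with both the number of rows and the number of genomes; compressing the intermediate file (even with gzip) is one cheap mitigation when rows are sparse.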
Hi @amatur,
Yes, I think this is a severe limitation, because it would prevent its use even on small files.
Oh, that's why! Let me know.
Best,
Hi @amatur and @yoann-dufresne,
Dear all,
I'm trying to build your compressed representation (for k=31) on a rather small pangenome, which contains 4,546 Salmonella genomes and can be downloaded from https://zenodo.org/records/1323684.
Can you please try to build your archive on the same data?
Specifically, the pipeline ran for ~5h before aborting with "no space left on device", which is very strange because I have over 1.5 TB available. I've also noticed that the pipeline writes some very large intermediate files, e.g. 186 GB. Can you confirm this?
Are there any parameters I need to set? (I've set -k 31 and -j 8.)
Thanks!
Best,
-Giulio