Hi @LisaHollstein
This is still a private repository, but I have been using it to prototype some ideas on how to integrate the data from raspir, growth_rates, krakenuniq and metaphlan into haybaler.
To do the data integration we first need to improve the speed of raspir, which is very slow at present. I'll handle that update. Otherwise the pipeline will be very slow on big datasets, and it needs to be fast. This is why we currently generate the haybaler output first, and raspir, growth rates etc. later.
The files from raspir and reporting (the step before haybaler) are very similarly named (see README.md in this repo). Maybe it would be easiest to integrate at this point.
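As a rough sketch of what integrating at this point could look like, the snippet below pairs files from the two steps by their shared sample name. The suffixes `.report.csv` and `.raspir.csv` and both function names are placeholders invented for illustration; the real naming scheme is the one described in README.md.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical suffixes, for illustration only.
# The real file names are listed in the repository's README.md.
SUFFIXES = {
    "reporting": ".report.csv",
    "raspir": ".raspir.csv",
}


def pair_by_sample(paths):
    """Group file paths by sample name, keyed by the step that produced them."""
    samples = defaultdict(dict)
    for path in paths:
        name = PurePath(path).name
        for tool, suffix in SUFFIXES.items():
            if name.endswith(suffix):
                # Everything before the suffix is treated as the sample name.
                sample = name[: -len(suffix)]
                samples[sample][tool] = path
                break
    return dict(samples)


def complete_pairs(samples):
    """Keep only samples for which every step has produced a file."""
    return {s: files for s, files in samples.items() if set(files) == set(SUFFIXES)}
```

Samples that are missing a raspir file are dropped by `complete_pairs`, so they could be reported or re-run before anything is passed on to haybaler.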
Can you have a look at this too? It's what Burkhard Tuemmler wants you to do before April, so it would be nice to do some further integration, either within haybaler or outside it.
I have been looking at trying Nextflow for this, so that we don't have to manually manage all the input and output files, but I have only been working on it today. It could - theoretically - be a lot simpler, but with a steeper learning curve at the start.
Nextflow can start scripts like Haybaler too, so it may just replace the "runbatch_x" scripts rather than the pandas stuff.
cheers
Colin