integration #2

Open
colindaven opened this issue Jan 25, 2022 · 0 comments

Hi @LisaHollstein

This is still a private repository, but I was using it to prototype some ideas on how to integrate the data from raspir, growth_rates, krakenuniq and metaphlan into haybaler.

To do the data integration we need to improve the speed of raspir, which is very slow at present. I'll update and handle this. Otherwise the pipeline will be very slow on big datasets, and it needs to be fast. This is why we currently generate haybaler first, and raspir, growth_rates etc. later.

The files from raspir and reporting (the step before haybaler) are very similarly named (see README.md in this repo). Maybe it would be easiest to integrate at this point.
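
A minimal sketch of what pairing the two sets of files by sample name could look like. The suffixes `.report.csv` and `.raspir.csv` and the function names are assumptions for illustration only; the real naming scheme is the one described in README.md.

```python
from pathlib import PurePath

# Hypothetical suffixes, for illustration only; see README.md for the real names.
REPORT_SUFFIX = ".report.csv"
RASPIR_SUFFIX = ".raspir.csv"


def sample_key(filename, suffix):
    """Return the sample name if the file name ends with suffix, else None."""
    name = PurePath(filename).name
    if name.endswith(suffix):
        return name[: -len(suffix)]
    return None


def index_by_sample(files, suffix):
    """Map sample name -> file path for all files carrying the given suffix."""
    index = {}
    for f in files:
        key = sample_key(f, suffix)
        if key is not None:
            index[key] = f
    return index


def pair_by_sample(report_files, raspir_files):
    """Pair reporting and raspir files that share the same sample name."""
    reports = index_by_sample(report_files, REPORT_SUFFIX)
    raspirs = index_by_sample(raspir_files, RASPIR_SUFFIX)
    # Keep only samples that have both a reporting file and a raspir file.
    shared = sorted(reports.keys() & raspirs.keys())
    return {key: (reports[key], raspirs[key]) for key in shared}
```

Samples that only have one of the two files are dropped here; whether they should instead be kept with missing values is an open design choice.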

Can you have a look at this too? It's what Burkhard Tuemmler wants you to do before April, so it would be nice to do some further integration, either within haybaler or outside of it.

I was looking at trying Nextflow for this, so we don't have to manually manage all the input and output files, but I have only been working on it today. It could, theoretically, be a lot simpler, but with a steeper learning curve at the start.

Nextflow can start scripts like Haybaler too, so it may just replace the "runbatch_x" scripts rather than the pandas stuff.

cheers
Colin
