Avoiding recalculation. #20

Open
cerebis opened this issue Mar 25, 2014 · 4 comments

Comments

@cerebis

cerebis commented Mar 25, 2014

It is likely I am approaching this in a dumb fashion, but is it possible to pin intermediate elements in a Nestly workflow higher in the tree and not calculate them at each tip?

I have intermediate results which are consumptive of both disk and cpu and at the tip there is duplication since these intermediates only depend on a subset of the parameter space. Do I simply partition this into separate nests?

@metasoarous
Member

Are you using scons with nestly? If so, then you certainly can (I do all the time). If not, I can't tell you, since I pretty much only use nestly in concert with scons.

If you're not already, it's probably worth trying out scons given you have costly computations, since scons keeps track of dependencies/changes and only runs things that actually need to be run.

@cerebis
Author

cerebis commented Mar 25, 2014

Ok, thanks for the response. No, I've not tried the scons integration. I'm unfamiliar with scons, but I have used make and dependencies in C projects in the distant past.

Since scons is a build system, I expected that it would avoid repeated calculations between runs over the same nest, but can it also identify redundant calculations within the same run?

@metasoarous
Member

If you've used make, scons should be a walk in the park :-) Still, I do wish we had a somewhat better tutorial/example for how to use these things together for someone not familiar with scons.

Fortunately though, 98% of what you need to know to get started using scons for this kind of thing is the Command function. It takes three arguments:

  • target - the filename(s) the command writes, passed as a string (or list of strings) so that scons tracks them; pass [] if you don't want to write results to disk
  • source - each source should be a string or the return value of another call to Command; pass multiple sources as a list; if you don't have any dependencies, pass []
  • action - either a string (which gets interpolated into a shell command) or an action function
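
As a rough illustration of those three arguments, here is a minimal SConstruct fragment. The file names (input.txt, sorted.txt, counts.txt) and the count_lines function are made up for the example, and it assumes the file is run by scons, where Command is available as a global. It is a sketch, not something taken from this project.

```python
# SConstruct fragment (run with `scons`); all file names are hypothetical.

# String action: SCons expands $SOURCE and $TARGET into the shell command.
sorted_file = Command('sorted.txt', 'input.txt', 'sort $SOURCE > $TARGET')

# Function action: SCons calls it with lists of target and source nodes.
def count_lines(target, source, env):
    with open(str(source[0])) as fin, open(str(target[0]), 'w') as fout:
        fout.write('%d\n' % sum(1 for _ in fin))
    return None  # None (or 0) tells SCons the action succeeded

# Passing the return value of the first Command as the source makes
# counts.txt depend on sorted.txt, so it is rebuilt only when that changes.
counts = Command('counts.txt', sorted_file, count_lines)
```

Running scons a second time without touching input.txt should report that the targets are up to date rather than re-running either step.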

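Scons won't exactly find redundant calculations, but you get the desired behaviour based on where you add your targets: a target is created once per combination of the levels added before it, so an intermediate added before the deeper levels is not repeated at each tip. E.g.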

n.add('level1', ['a', 'b', 'c'])

@n.add_target()
def some_intermediary(outdir, c):
    # note: I will get run once for each of a, b, c
    # (target, source and action are placeholders; Command takes them in this order)
    return Command(target, source, action)

n.add('level2', range(3))

@n.add_target()
def some_terminal(outdir, c):
    # note: I will get run once for each of (a, 0), (a, 1),..., (c, 2)
    return Command(...)
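
To make the counting concrete, here is a small plain-Python sketch. It does not use nestly or scons at all; it only mimics, under that assumption, how many times a target added at each point would be invoked. The variable names are invented for the illustration.

```python
from itertools import product

level1 = ['a', 'b', 'c']
level2 = list(range(3))

# A target added after only level1 is invoked once per level1 value.
intermediate_runs = [{'level1': x} for x in level1]

# A target added after level2 is invoked once per (level1, level2) pair.
terminal_runs = [{'level1': x, 'level2': y} for x, y in product(level1, level2)]

print(len(intermediate_runs))  # 3
print(len(terminal_runs))      # 9
```

So an expensive intermediate that only depends on level1 belongs before the n.add('level2', ...) call, where it is built three times instead of nine.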

Does that help?

Note that there are all sorts of other fun things possible here:

  • If, for example, you wanted to take each one of the some_terminal results and combine them at a higher level in the tree, you could use the add_aggregate and pop functionality (see docs)
  • Scons lets you pass -j <some-number> for free, which will automatically parallelize computations up to <some-number> jobs at a time.
  • ^ This is nice if you have a bunch of cores, but if you want to submit jobs to a cluster (think -j 100) using slurm, then you can use the bioscons SlurmEnvironment

@cerebis
Author

cerebis commented Mar 25, 2014

Very cool and thanks for the great assistance. I'll dig into this tomorrow morning. :-)

I used an ugly hack of calling qsub with "-W block=true" to imitate local processes. heh.
