Avoiding recalculation. #20

cerebis · 2014-03-25T03:38:50Z

It is likely I am approaching this in a dumb fashion, but is it possible to pin intermediate elements in a Nestly workflow higher in the tree and not calculate them at each tip?

I have intermediate results which are consumptive of both disk and cpu and at the tip there is duplication since these intermediates only depend on a subset of the parameter space. Do I simply partition this into separate nests?

metasoarous · 2014-03-25T04:02:11Z

Are you using scons with nestly? If so, then you certainly can (I do all the time). If not, I can't tell you, since I pretty much only use nestly in concert with scons.

If you're not already, it's probably worth trying out scons given you have costly computations, since scons keeps track of dependencies/changes and only runs things that actually need to be run.
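As a rough illustration of the "only runs what needs to be run" idea, here is a toy plain-Python sketch. It is not how scons is implemented internally; `ToyBuilder` and its methods are made-up names, and the only assumption is that a step can be skipped when its input content has not changed since the last run.

```python
import hashlib


class ToyBuilder:
    """Toy stand-in for a build tool: reruns a step only if its input changed."""

    def __init__(self):
        self.signatures = {}  # target name -> signature of the input last used
        self.runs = 0         # how many times an action was actually executed

    def build(self, target, source_bytes, action):
        """Run action(source_bytes) unless target is already up to date."""
        sig = hashlib.sha256(source_bytes).hexdigest()
        if self.signatures.get(target) == sig:
            return False  # same input as last time, skip the expensive step
        action(source_bytes)
        self.signatures[target] = sig
        self.runs += 1
        return True


builder = ToyBuilder()
results = []
expensive = lambda data: results.append(len(data))

first = builder.build("out.txt", b"some input", expensive)     # runs
second = builder.build("out.txt", b"some input", expensive)    # skipped
third = builder.build("out.txt", b"changed input", expensive)  # runs again
```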

cerebis · 2014-03-25T04:15:15Z

OK, thanks for the response. No, I've not tried the scons integration. I'm unfamiliar with scons, but I used make and its dependency tracking in C projects long ago.

Since it's a build system, I expected it to avoid repeating calculations between runs over the same nest, but can it also identify redundant calculations within the same run?

metasoarous · 2014-03-25T04:47:49Z

If you've used make, scons should be a walk in the park :-) Still, I do wish we had a somewhat better tutorial/example for how to use these things together for someone not familiar with scons.

Fortunately though, 98% of what you need to know to get started using scons for this kind of thing is the Command function. It takes three arguments:

target - the file(s) the command writes, as a string (or list of strings), so that scons can track them; pass [] if you don't want to write results to disk
source - each source should be a string or the return value of another call to Command; pass multiple sources as a list, or [] if there are no dependencies
action - either a string (which gets interpolated into a shell command) or an action function
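To make that concrete, here is a minimal sketch of two chained Command calls as they might appear in an SConstruct (run with `scons`, which supplies Command without an import). The file names and shell commands are made up for illustration.

```python
# SConstruct fragment: Command is provided by scons when it reads this file.
# input.txt, counts.txt and summary.txt are hypothetical file names.

# Build counts.txt from input.txt; $SOURCE and $TARGET are filled in by scons.
counts = Command('counts.txt', 'input.txt', 'wc -l < $SOURCE > $TARGET')

# Use the return value of the first Command as the source of the second,
# so scons knows summary.txt depends on counts.txt.
summary = Command('summary.txt', counts, 'sort $SOURCE > $TARGET')
```

Because the second call takes `counts` as its source, scons only needs to rebuild summary.txt when counts.txt (or the command itself) changes.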

Scons won't exactly find redundant calculations, but you get the desired behaviour based on where you add your targets. E.g.

# In an SConstruct; n is a nestly.scons.SConsWrap wrapping a Nest
n.add('level1', ['a', 'b', 'c'])

@n.add_target()
def some_intermediary(outdir, c):
    # note: I will get run once for each of a, b, c
    # placeholders: target and source file names first, then the action
    return Command(target, source, action)

n.add('level2', range(3))

@n.add_target()
def some_terminal(outdir, c):
    # note: I will get run once for each of (a, 0), (a, 1),..., (c, 2)
    return Command(...)
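To see the counting more concretely, here is a rough plain-Python sketch with no nestly or scons involved. `build_plan` is a made-up helper that only mimics how many parameter combinations a target is built for, depending on the level it is added after.

```python
from itertools import product


def build_plan(levels):
    """Map each level name to the combinations a target added there is built for.

    levels is an ordered list of (name, values) pairs, outermost level first.
    """
    plan = {}
    for depth in range(1, len(levels) + 1):
        active = levels[:depth]  # levels that exist when the target is added
        name = active[-1][0]
        plan[name] = list(product(*(values for _, values in active)))
    return plan


levels = [("level1", ["a", "b", "c"]), ("level2", range(3))]
plan = build_plan(levels)

print(len(plan["level1"]))  # a target added after level1 is built 3 times
print(len(plan["level2"]))  # a target added after level2 is built 9 times
```

So an expensive intermediate that depends only on level1 belongs right after level1 is added, not at the tips.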

Does that help?

Note that there are all sorts of other fun things possible here:

If, for example, you wanted to take each one of the some_terminal results and combine them at a higher level in the tree, you could use the add_aggregate and pop functionality (see docs).
Scons lets you pass -j <some-number> for free, which runs up to <some-number> jobs in parallel.
^ This is nice if you have a bunch of cores, but if you want to submit jobs to a cluster using slurm (think -j 100), you can use the bioscons SlurmEnvironment.

cerebis · 2014-03-25T05:05:46Z

Very cool and thanks for the great assistance. I'll dig into this tomorrow morning. :-)

I used an ugly hack of calling qsub with "-W block=true" to imitate local processes. heh.
