-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Plans on full support of nested calculations? #41
Comments
Currently the semantics of Graph are that each function is called (at most) I guess it's possible to consider adding subgraphs that 'map' over a set of On Thu, Jun 12, 2014 at 3:57 AM, Gunnar Völkel [email protected]
|
I am also looking into (what I think is) the same thing. From the mailing list:
|
I have implemented this type of calculation using the current graph library, but am evaluating alternative ways of implementing it. I will go over:
Any feedback/suggestions would be greatly appreciated. This type of discussion seems better suited to the message board, but I like the UI of Github Issues better, so I am posting it here. If you would prefer though, I am happy to move the discussion over. SubgraphsSubgraphs are nested calculations that will be run multiple times. They get some of their values from the parent graphs and others each time they are evaluated. My Use CaseI am creating a general purpose metaheuristic search framework. Primarily, it is intended for research into genetic programming. The whole point of the framework is to be able to be able to modularly assemble a search technique (that includes the algorithm, the problem, and many configuration parameters) in order to run it and look at the results. The results are a sequence of (def Traits
"Traits are any information we know about an individual or a generation. For example
this could be how well it does on a certain test case or the best individual in the current generation."
{ s/Any (s/maybe s/Num)})
(def Individual
"One individual in the population"
{:genome s/Any
:id s/Str
:traits Traits
:parent-ids #{s/Str}})
(def Generation
"Holds the whole state for a current generation of individuals."
{:index s/Int
:traits Traits
:individuals #{Individual}}) The whole process is run by a graph that has a To compute each GoalsWhile getting nested calculations to work is a necessity, I have some other goals that shape how I evaluate possible solutions
Solutions... continued below |
Solution 1: Functions at nodes (currently implemented)Our current approach is based on using functions as many of the graph values. So they are For example, we often want to stop searching after a certain fixed number of generations. So the Examples psuedocode for how thise works graph: :generations (fnk [done? step initial] (take-until done? (recur step initial))
:done? (fnk [max-generations] (fn [gen] (> max-geneartions (:index gen))
:initial (fnk [->genome population-size]) (repeatedly population-size ->genome))
:step (fnk [select tweak population-size] (fn [generation] (->make-new-generation generation (-> generations select tweak)) Goals
👍 We can test each node easily, we just call it once with the top level config parameters to produce the function then again. (is (not ((max-generations->done? {:max-generations 10}) (c/complete {:index :1} Generation))))
(is ((max-generations->done? {:max-generations 1}) (c/complete {:index :1} Generation)))
👌 We can wrap each node to check if it returns a function. If it does, we can wrap the function it returns in the required code. This is more complex than wrapping each node directly, but it does work.
👌 All the parallelism has to take place on a case by case basis inside the functions (example using pmap).
👍 To combine graphs, we can just use the (g/graph
:population-size (fnk [] 1000)
:initial (g/instance initial/->genome-> [population-size] {:n population-size})
:evaluate evaluate/genome->traits->
step/graph
:generations base/generations))
👎 Lazy graph computation doesn't help us here, because the computation actually happens when calling the function. What I mean is that, in each generation, all the individuals and traits for those individuals are calculated fully. But we might not need to calculate a certain trait for an individual, if it doesn't get selected for another reason.
👎 It isn't possible to have {:best-individual ...
:best-traits ...
:average-traits ...
} With this strategy, all of those would have to be calculated by one function that produced the total
👌 We should be able to produce graphs of what functions require one another, but we don't get much of a sense exactly how each inner computation works.
👌 It is just a normal graph, so compiling it is no problem and works fine with any method. However, things like lazy graphs aren't helpful in this case, because it isn't the evaluation of nodes that takes time, but the execution of the function they return. |
Thanks for the detailed write-up. A possible alternative within the current framework would be to use two graphs:
The two graphs could share input parameters, etc. |
Yeah I have been thinking about that approach a lot in the past couple of days. Nested fnks as leavesI was actually thinking of nesting the graphs in another like this: (g/graph
:gen->gen-graph
(g/graph
:->ind-graph
(g/graph
:parents (fnk [] (fnk [->parent] (fnk [[:tweak n-parents]] (repeatedly n-parents ->parent))))
:tweak (fnk [all-tweaks] (fnk [] (fnk [] (rand-nth all-tweaks))))
:->testcase-graph
(g/graph
:calc-y (fnk [evaluate] (fnk [] (fnk [genome] (fnk [x] (evaluate genome x)))))
:abs-difference (fnk [] (fnk [] (fnk [] (fnk [calc-y actual-y] (Math/abs (- calc-y actual-y))))))
:abs-difference-sq (fnk [] (fnk [] (fnk [] (fnk [abs-difference] (* abs-difference abs-difference))))))
:->testcase (fnk [] (fnk [] (fnk [->testcase-graph] (fn [x y] (g/run ->testcase-graph {:x x :actual-y y})))))
:individual
{:id (fnk [] (fnk [] (fnk [] (uuid))))
:parent-ids (fnk [] (fnk [] (fnk [parents] (map :id parents))))
:genome (fnk [] (fnk [] (fnk [tweak parents] (apply tweak parents))))
:traits {:program-size (fnk [] (fnk [] (fnk [genome] (count genome))))
:testcase-differences (fnk [testcases] (fnk [] (fnk [->testcase] (map (comp :abs-difference ->testcase) (seq testcases)))))}})
:->ind (fnk [] (fnk [->ind-graph] (fn [] (->> {}
(g/run ->ind-graph {})
:individual))))
:->parent (fnk [] (fnk [[:prev-gen individuals]] (partial rand-nth individuals)))
:generation
{:index (fnk [] (fnk [[:prev-gen index]] (inc index)))
:uuid (fnk [] (fnk [] (uuid)))
:individuals (fnk [population-size] (fnk [->ind] (repeatedly population-size ->ind)))})
:gen->gen (fnk [gen->gen-graph] (fn [gen] (->> {:prev-gen :gen}
(g/run gen->gen-graph)
:generation)))
:generations (fnk [gen->gen initial] (iterate gen->gen initial))
:testcases (fnk [test-fn testcase-xs] (zipmap testcase-xs (map (test-fn testcase-xs))))) That way any nested graph can pull whatever it needs from parent graphs. This doesn't require any changes to the plumbing libraries, since each time it compiles a graph it should unwrap one of the One advantage of this approach is that it makes clear where each value is unique to. For example the This however becomes pretty darn verbose. |
Return subgraphs from leavesWe could also do it by not nesting each fnk but instead nesting each subgraph itself in a fnk. (g/graph
:testcases (fnk [test-fn testcase-xs] (zipmap testcase-xs (map (test-fn testcase-xs))))
:gen->gen-graph
(fnk [population-size all-tweaks actual-value evaluate testcases evaluate]
(g/graph
:->ind-graph
(fnk [all-tweaks actual-value evaluate individuals testcases evaluate]
(g/graph
:parents (fnk [->parent [:tweak n-parents]] (repeatedly n-parents ->parent))
:tweak (fnk [all-tweaks] (fnk [] (fnk [] (rand-nth all-tweaks))))
:->parent (fnk [[:prev-gen individuals]] (partial rand-nth individuals))
:->testcase-graph
(fnk [evaluate [:individual genome]]
(g/graph
:calc-y (fnk [x] (evaluate genome x))
:abs-difference (fnk [calc-y actual-y] (Math/abs (- calc-y actual-y)))
:abs-difference-sq (fnk [abs-difference] (* abs-difference abs-difference))))
:->testcase (fnk [->testcase-graph] (fn [x y] (g/run ->testcase-graph {:x x :actual-y y})))
:individual
{:id (fnk [] (uuid))
:parent-ids (fnk [parents] (map :id parents))
:genome (fnk [tweak parents] (apply tweak parents))
:traits {:program-size (fnk [genome] (count genome))
:testcase-differences (fnk [->testcase] (map (comp :abs-difference ->testcase) (seq testcases)))}}))
:->ind (fnk []
(fn []
(->> {}
(g/run ->ind-graph)
:individual)))
:generation
{:index (fnk [[:prev-gen index]] (inc index))
:uuid (fnk [] (uuid))
:individuals (fnk [population-size ->ind] (repeatedly population-size ->ind))}))
:gen->gen (fnk [gen->gen-graph]
(fn [gen]
(->> {:prev-gen :gen}
(g/run gen->gen-graph)
:generation)))
:generations (fnk [gen->gen initial] (iterate gen->gen initial))) While this is a bit less verbose, it also requires specifying all the required arguments from a higher level as a inputs to each graph, which isn't DRY. What I mean is that if I had a top level key, like It is also not easily composable, because it isn't a large map anymore, but a bunch of nested fnks. So if I wanted to add a new computation inside the |
Partially evaluated fnksIt would be ideal to be able to have non nested fnks and still be able to keep the things as one large map, like this: (g/graph
:testcases (fnk [test-fn testcase-xs] (zipmap testcase-xs (map (test-fn testcase-xs))))
:gen->gen-graph
(g/graph
:->ind-graph
(g/graph
:parents (fnk [->parent [[:tweak n-parents]] (repeatedly n-parents ->parent)])
:tweak (fnk [all-tweaks] (rand-nth all-tweaks))
:->testcase-graph
(g/graph
:calc-y (fnk [evaluate genome x] (evaluate genome x))
:abs-difference (fnk [calc-y actual-y] (Math/abs (- calc-y actual-y)))
:abs-difference-sq (fnk [abs-difference] (* abs-difference abs-difference)))
:->testcase (fnk [->testcase-graph] (fn [x y] (g/run ->testcase-graph {:x x :actual-y y})))
:individual
{:id (fnk [] (uuid))
:parent-ids (fnk [parents] (map :id parents))
:genome (fnk [tweak parents] (apply tweak parents))
:traits {:program-size (fnk [genome] (count genome))
:testcase-differences (fnk [testcases ->testcase] (map (comp :abs-difference ->testcase) (seq testcases)))}})
:->ind (fnk [->ind-graph] (fn [] (->> {}
(g/run ->ind-graph)
:individual)))
:->parent (fnk [[:prev-gen individuals]] (partial rand-nth individuals))
:generation
{:index (fnk [[:prev-gen index]] (inc index))
:uuid (fnk [] (uuid))
:individuals (fnk [population-size ->ind] (repeatedly population-size ->ind))})
:gen->gen (fnk [gen->gen-graph] (fn [gen] (->> {:prev-gen :gen}
(g/run gen->gen-graph)
:generation)))
:generations (fnk [gen->gen initial] (iterate gen->gen initial))) In order to support running this, I think only two things would need to be changed:
With these two changes in place, the top level We lose some safety however, with these relaxations. We can't verify as much at graph compile time. |
Thanks for the details. I'm not sure if this is the full graph, but my gut feeling from looking at it is that you don't really need the outer layer to be a graph. And if you just make that part an ordinary function, then I think all the complexity goes away? Or if not, I still think you'd be better off having them be two separate graphs. First you compile the generation one, and pass it as a parameter to the driver. (Graph isn't meant to model all your logic and computations, just those where the added complexity pays off.) My feeling is that Graph has a lot of intrinsic "magic" in how data moves around, and so I've tried to eliminate any unnecessary magic and make things as safe as possible while preserving that core. That's one reason, for instance, fnks have to explicitly name the parameters then want and can't just say |
@w01fe Yeah I agree that is is too magic/complex to be in core. Maybe the problem is that I am depending on Graph for more than computation. It also serves as a way to compose parts of a solutions easily. That probably isn't what you built graph for. For me details check out how I turn a bunch of graph pieces into a search. I have been thinking about my last example a bunch in the last few days. Trying to figure out how to write Basically be able to compile #41 (comment) into #41 (comment) if we just give it one more piece of information. This would be somehow providing in each a subgraph the inputs that are given to it when it is run. For example for For example this:
would be precompiled into this:
Do you know what I mean? I might write a separate library for that. Maybe the current way of just using functions is simpler, but I am curious how this way plays out in full. The advantages I see of this approach is that it keeps all the current safety of graphs and will even let you know if you forget to define a key in subgraphs when you compile the top graph. I think I would do this by starting at the leaf sub graphs and checking for each leaf if there are any keys it requires that aren't outputted by that graph or by the We could then wrap all the inner nodes easily (for profiling etc) by using the un-precompiled graph. |
Cool. I kind-of follow, but haven't taken the time to fully understand the example and motivations behind the pieces. If you end up making something, I'd definitely be interested to take a look -- thanks! |
I have scenario where I have a root graph for the main calculation and a subgraph that needs to be applied to a collection of input maps. Currently, this is implemented via two different graphs together with a function for applying the subgraph to each input map.
The mentioned collection of input maps will not fit into memory in general.
But that way I need to manage dependencies on properties of the subgraph in the root graph myself. It would be nice if that could be done with prismatic.graph.
Are there any plans to wnable something like that? Or is this already possible with the current implementation using some implementation pattern?
The text was updated successfully, but these errors were encountered: