-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fud2] update path finding to support multi state input and output #2134
Conversation
Awesome! I think this is headed in the right direction as a first step. We obviously are already discussing more sophisticated approaches here in #2113, but I guess I am still optimistic that your approach here can work as a stepping stone?
Sorry to be somewhat slow on this, but I'm having trouble crystallizing the exact cases where this is true. Would it be possible to jot down a couple of examples (even if they are contrived)? That might help me understand and forget less quickly this time. 🤪
Great idea to defer this to a future change. |
oops, probably should have given an example. One example where behavior disagrees is on a state with a single op from itself to itself. The request asks to go from the state to itself through the given op. The old algorithm would handle this perfectly fine. The new algorithm as implemented would say no path found because it cannot have two files of the same (initial) state. (Sidenote: the specific cased used, self loops with one input and output, can be hacked in as a special case to fix this, but I wouldn't call having a bunch of special cases like this flexible or "good") Another example is one state with two paths to another state. One path is 2 ops long and another path is 1 op long. Assume a request from the first state to the latter and no --through argument. The old algorithm would always pick the 1 op long path as it is shorter. The new algorithm however may pick the 2 op long path as it searches through ops in an arbitrary order. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Thanks for your patience here—this took me a bit longer to review than I expected. But there is a lot of good stuff in here and I had a good time reading it over!
To summarize my view on this PR, I think it contains 3 main things:
- The general scaffolding to support multi-input/-output ops everywhere (i.e., the driver data structure is now a hypergraph).
- The new plan-search algorithm using exhaustive enumeration.
- A new approach to filename generation that is suited to our now more-complex plans.
Here is my current status on each element after spending some time with this PR:
- I understand the scaffolding stuff completely, and I left detailed comments that are mostly low-level code/docs stuff. Looking really good in general!!
- I do not really understand the new search algorithm yet, even though it's actually not very much code. The combination of recursion and mutation was too much for me, at least in this review session and with this level of documentation. This is some pretty knotty stuff that I may need your help to fully grok.
- I mostly understand the name-generation approach, but not 100%. The issue is that this seemingly-simple mechanism has suddenly gotten kinda fundamentally complicated in a way that may need further care.
I definitely don't want to hold up item 1, which is pretty close to be good to go, while I continue to wrestle with item 2 (and to some degree item 3). So I have an idea about how to proceed strategically:
- First, what if we remove the new search algorithm (item 2) from this PR and return to the old DFS thing? We would just slip a couple of
assert!
s or whatever into the DFS-based code so we panic whenever we encounter any op with more than 1 input or output. But at least the functionality would remain identical and we could get the rest of this PR merged.- There is probably no need to similarly separate out the name-generation stuff into a different PR, but we could do it if we really want.
- Second, we do what we've discussed a bit on Zulip/in person and separate out the search algorithm into a separate module with a minimal interface so it is easily replaceable.
- Third, and finally, we reintroduce your enumerative search algorithm alongside the current DFS-based search. This will make it possible to do a thing you mentioned at one point: cross-validate the two implementations against each other, either based on a compile-time option or even a very funky CLI flag.
This strategy would not only let me more properly review the exhaustive search algorithm without holding up the "item 1" refactoring; it would also mean that we would just matter less in general if search algorithms are kinda hard to understand while they are works in progress. What do you think; do you think breaking things down in this way would be feasible?
Also, to call out one high-level suggestion from the comments: would it simplify things if we assumed there was only 1 stdin and 1 stdout file per Request
? This seems certain to be true from a UI perspective, and it could alleviate some knottiness at various points.
fud2/fud-core/src/exec/driver.rs
Outdated
/// Find a chain of Operations from the `start` state to the `end`, which may be a state or the | ||
/// final operation in the chain. | ||
fn find_path_segment( | ||
/// Return parents ops of a given state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment (and possibly name) could be a little clearer: find the set of ops that have this state as an output.
Thanks for the thorough review! The steps you describe make sense. Currently working on the "scaffolding" part of the PR. On your suggestion: I don't think having only multiples files as input from |
The new commits do the following:
Question: As mentioned in the final bullet, I've currently settled on replacing the |
Awesome; I will attempt to take a look as soon as I can!
That seems like a great tactic to me!! Yay! I see no problems with leaving a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wahoo! This looks fantastic! I only found a few super minor things to comment on. Feel free to hit the big green button whenever you want, after either incorporating or ignoring my comments. 😃
Next, I suppose we can do a separate PR with the enumerative search thingy?
fud2/fud-core/tests/tests.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this empty file should be deleted?
This PR begins progress towards addressing #1958. It updates the algorithm used by
fud2
for finding paths from input files to output files to take in operations which have many inputs and outputs.The behavior concurs with the current
fud2
in some but not all cases and does not maintainfud2
's always finding the shortest path of operations.Specifying operations themselves and emitting the Ninja file still assume single state inputs and outputs. This will be changed in further PRs.