Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add state_by API #1467

Closed
rohitkulshreshtha opened this issue Sep 24, 2024 · 0 comments · Fixed by #1469
Closed

Add state_by API #1467

rohitkulshreshtha opened this issue Sep 24, 2024 · 0 comments · Fixed by #1469
Assignees

Comments

@rohitkulshreshtha
Copy link
Contributor

rohitkulshreshtha commented Sep 24, 2024

Motivation/User Story

In the gossiping Anna example, the incoming gossip request contains two parts (messageID, writes) . The writes must be mapped out of the request and merged into the key-value store. If this is a new write for the node, an Ack(messageId) must be returned to the sender. If the node has seen the write earlier, a Nack(messageId) must be returned to the sender.

Proposal

Pass-through Contextual Information

The state API should support some form of pass-through information different from the inner type of state. This makes it easier/more efficient to process the output from the state API.

Options

  1. A: Explicit pass-through data— Instead of Input: T, we accept (P, T), where P is the pass-through type, and it is passed through without modification by state. The output of state is also (P, T).

Example: In the motivating user story, the input/output for state will be (messageId, writes).

  1. Pros - Check cons for (B)

  2. Cons - Kinda big change in the function signature

  3. B: Optional Mapping Function - state accepts Input: U and mapping function Fn(U) -> T where T is the inner type of the state . The input is passed through to output untouched, i.e. Output: U .

Semantics
`Stream<U>` → `state_by(F)` → `Stream<U>`

`U: Any`

`F: Fn(U) -> T`

`T: Lattice`

—

`state_i ∈ T`

`t_i = F(u_i)`

if `t_i` ⊈ `state_i` then emit `u_i`

`state_{i+1}` = `state_i ∪ t_i`

Example: In the motivating user story, the input/output from state can be (messageId, writes) and the mapping function is |(_message_id, writes)| writes .

  1. Pros — Smaller impact on the function signature. If the mapping function is not provided, the existing code works as-is.
  2. Cons - Does this solution preclude the hydroflow compiler from having insights into the mapping function? Could the mapping function’s computation be broken down further and expressed as hydroflow logic, allowing the compiler to make optimization decisions? In solution (A), P can result from a hydroflow pipeline.

Completeness of Results (in terms of action taken by the operator)**

Currently, the output pass-through stream only contains the items for which merge() returns true. This complicates sending Nack - the requests must be difference'ed or antijoin'ed with the state output. Instead, if the state returned output Result<T> with Ok, meaning the state changed, and Err, if the state didn’t change, then Acks and Nacks can be dispatched more easily.

Conclusion

  • Pass-through contextual information: Option B is to be implemented as state_by(func: Fn(U) -> T)
  • Completeness of Result: Not implemented for now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant