
feat(internal): add sync tee util #7560

Merged: 4 commits, Apr 30, 2023

Conversation

@mhofman (Member) commented Apr 30, 2023

refs: #6943

Description

This adds a util to the internal package to perform synchronous tee-ing of streams. It really should be an @endo/stream tool, but I need it now.

Security Considerations

None

Scaling Considerations

Unbounded memory usage is avoided by consuming the source synchronously, in lockstep with the readers.
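To illustrate the lockstep idea, here is a minimal sketch (not the PR's implementation; the name lockstepTee is made up for this example): the source is pulled exactly once per round, and only after every reader has asked for its next value, so no branch can run ahead and no values need to be buffered.

```javascript
// Minimal sketch of lockstep tee-ing: one pull per round, shared by all
// readers, triggered only once every reader has requested a value.
const lockstepTee = (source, n) => {
  let waiting = []; // resolvers for readers blocked on the current round
  const readers = Array.from({ length: n }, () => {
    const reader = {
      next: () =>
        new Promise(resolve => {
          waiting.push(resolve);
          if (waiting.length === n) {
            // Every reader is ready: pull once and fan the result out.
            const result = source.next();
            const resolvers = waiting;
            waiting = [];
            for (const r of resolvers) r(result);
          }
        }),
      [Symbol.asyncIterator]: () => reader,
    };
    return reader;
  });
  return readers;
};

// Usage: both branches observe the same items, consumed in lockstep.
const demo = async () => {
  async function* numbers() { yield 1; yield 2; yield 3; }
  const [a, b] = lockstepTee(numbers(), 2);
  const out1 = [];
  const out2 = [];
  await Promise.all([
    (async () => { for await (const v of a) out1.push(v); })(),
    (async () => { for await (const v of b) out2.push(v); })(),
  ]);
  console.log(out1, out2); // [ 1, 2, 3 ] [ 1, 2, 3 ]
};
demo();
```

Note this sketch has none of the PR's error handling: it does not propagate early termination or rejections back to the source, which is where most of the real implementation's complexity lives.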

Documentation Considerations

Should move to endo

Testing Considerations

Added unit tests

@mhofman mhofman requested review from warner and kriskowal April 30, 2023 00:46
@warner (Member) left a comment

I'd rely upon @kriskowal for a proper review, but the fact that this passes the unit tests below gives me confidence that it's working as intended.

]);

t.deepEqual(output1, sourceData.slice(0, 2));
t.deepEqual(output2, sourceData.slice(0, 2));
Member:

Should this also test that the original stream still has one item left? Or is that not a property we care about?

Member Author:

Modified the last test to be explicit about this.

* @param {AsyncIterator<T, void, void>} stream
* @param {number} quantity
*/
export const syncTee = (stream, quantity) => {
Member:

Given that this is very much about async iterators, maybe name it synchronizedTee or lockstepTee?

Also, maybe numReaders or readerCount instead of quantity, to make the docstring a bit more instructive.

const rejections = [];
/** @type {Array<(value: PromiseLike<IteratorResult<T>>) => void>} */
const resolvers = [];
let done = false;
Member:

By "done" we mean "one of the readers stopped reading early", right? Maybe use readerDone or readerStopped or something similar? Just trying to capture the fruits of my ignorance while I still have it :)

Member Author:

done is really the term used in IteratorResult objects, which have the shape { done: boolean, value: any }.
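The IteratorResult shape mhofman refers to can be observed directly on any generator:

```javascript
// Every call to .next() yields an object of shape { value, done };
// `done` flips to true once the iterator is exhausted.
function* three() {
  yield 1;
  yield 2;
  yield 3;
}
const it = three();
console.log(it.next()); // { value: 1, done: false }
console.log(it.next()); // { value: 2, done: false }
console.log(it.next()); // { value: 3, done: false }
console.log(it.next()); // { value: undefined, done: true }
```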

const error = assert.error(assert.details`Teed stream threw`);
assert.note(error, assert.details`Teed rejections: ${rejections}`);
result =
stream.throw?.(error) ||
Member:

I don't know the semantics of stream.throw, but could it throw an error? And would that cause the chain of promises to break without telling the remaining readers? And, could it return a non-promise? In which case the result.then below would fail.

I see that any rejected promise by stream.return will still proceed to the notify-readers at the end of this function, so I think we're defensive against that method.

Member Author:

I don't know the semantics of stream.throw, but could it throw an error?

And, could it return a non-promise?

.throw usually rejects, yes, which is why we capture the rejection. Unless it unleashes Zalgo, it should not throw synchronously. This is definitely expecting a well-behaved source.

without telling the remaining readers

We do feed the result (likely a rejection) to all the readers. There is a test for that.

stream.return ... I think we're defensive against that method.

We're not any more defensive than for throw, but we make sure that if return does exist and returns a promise, we end up rejecting as a generator source would have. These edge cases are not covered by tests, however.
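The "well behaved source" expectation can be checked against a plain async generator, which is the reference behavior being assumed: neither .return() nor .throw() throws synchronously; .return() resolves to a done result, and .throw() rejects the promise it returns.

```javascript
// How a plain async generator reacts to .return() and .throw() while
// suspended at a yield: both settle asynchronously.
const probe = async () => {
  async function* source() { yield 1; yield 2; }

  const a = source();
  await a.next();                // { value: 1, done: false }
  console.log(await a.return()); // { value: undefined, done: true }

  const b = source();
  await b.next();
  try {
    await b.throw(new Error('boom')); // does not throw synchronously
  } catch (err) {
    console.log(err.message);    // 'boom' (delivered as a rejection)
  }
};
probe();
```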

* @template T
* @param {AsyncIterable<T>} stream
* @param {Array<T>} output
* @param {number} [breakAfter]
Member:

This is like maxItems, right? It's counting units of whatever the iterator is iterating over.

@warner (Member) commented Apr 30, 2023

that test failure looked spurious (some web thing), so I kicked it

@mhofman (Member Author) commented Apr 30, 2023

I'd rely upon @kriskowal for a proper review

This is inspired by his work, and I shared pretty much this implementation with him earlier. I'd love a proper review, but this is blocking an upcoming PR.

@mhofman mhofman added the automerge:squash Automatically squash merge label Apr 30, 2023
@mergify mergify bot merged commit 0f800a6 into master Apr 30, 2023
@mergify mergify bot deleted the mhofman/sync-tee branch April 30, 2023 04:04
* @param {AsyncIterator<T, void, void>} sourceStream
* @param {number} readerCount
*/
export const synchronizedTee = (sourceStream, readerCount) => {
Member:

synchronized is an unnecessary qualifier. All streams are synchronized by default and can be made resilient to bursts by priming a buffer to each individual consumer. In the context of the @endo/stream library, makePipe returns a synchronized reader/writer which can be buffered by pre-acking a certain number of iterations, effectively allowing the producer to get exactly that far ahead of the consumer. Imagine:

AsyncIterator.prototype.tee(quantity: number): Array<AsyncIterator>;
AsyncIterator.prototype.buffer(length: number): AsyncIterator;

Member Author:

I think there can be non-synchronized teeing. In fact I have a similar tee helper in another branch where the source is consumed at the pace of the fastest branch (allowing for unbounded memory growth).

Buffering is not equivalent as by definition that only allows for a certain bounded amount of deviation.
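The non-synchronized variant mhofman describes can be sketched as follows (hypothetical code, not from this PR or the other branch; the name fastestBranchTee is made up): the fastest reader triggers pulls, every result is fanned out to per-reader buffers, and slower readers drain their buffers, which can grow without bound.

```javascript
// Hypothetical sketch of a non-synchronized tee: the source advances at
// the pace of the fastest reader; slower readers replay from a buffer.
const fastestBranchTee = (source, n) => {
  const buffers = Array.from({ length: n }, () => []);
  const pull = () => {
    // One pull feeds every buffer with the same result promise.
    const result = source.next();
    for (const buffer of buffers) buffer.push(result);
  };
  return buffers.map(buffer => {
    const reader = {
      next: () => {
        if (buffer.length === 0) pull(); // this reader is the fastest
        return buffer.shift();
      },
      [Symbol.asyncIterator]: () => reader,
    };
    return reader;
  });
};
```

The memory cost is the gap between the fastest and slowest branch, which is exactly the unbounded growth the synchronized version of the PR avoids.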

Member:

I’d name these to encourage the use of bounded memory and probably wouldn’t even suggest standardizing an unbounded variant with the exception of terminal collectors like Array.fromAsync or response.json().

);
} else if (done) {
result =
sourceStream.return?.() ||
Member:

Suggest ?? over || here.
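The difference between the two operators matters whenever the left side can be a falsy but meaningful value: || falls back on any falsy value (0, '', false, NaN), while ?? falls back only on null or undefined.

```javascript
// || vs ??: nullish coalescing only falls through on null/undefined.
const fallback = 'fallback';
console.log(0 || fallback);          // 'fallback' (0 is falsy)
console.log(0 ?? fallback);          // 0 (0 is not nullish)
console.log(false || fallback);      // 'fallback'
console.log(false ?? fallback);      // false
console.log(undefined ?? fallback);  // 'fallback'
```

In this spot the two behave the same in practice, since an optional call like sourceStream.return?.() yields either a (truthy) promise or undefined, but ?? documents the intent precisely: fall back only when the method is absent.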

Member:

Once you have an array of promises that you know have settled, some of which may be rejected, it might be sufficient to lean on Promise.all to consolidate them into a rejected promise with an AggregateError instead of rolling that by hand.

But, it would suffice here to leave a comment with the reason you did not do that, which appears to be to connect with SES causal tracing.

Member:

Tracking the taming of Promise.all; worth referencing endojs/endo#1568 too.

Member Author:

Promise.all short-circuits on rejection, which means aggregation is not possible. I do in fact have a PromiseAllOrRejections helper that does not short-circuit, but alas it's not standard (yet?).
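A helper in that spirit can be sketched on top of the standard Promise.allSettled, which waits for every promise regardless of rejections (this is a hypothetical sketch, not the actual PromiseAllOrRejections helper, and it loses the SES causal-tracing integration discussed above):

```javascript
// Wait for all promises to settle; fulfill with all values, or reject
// with an AggregateError carrying every rejection reason.
const allOrAggregateRejections = async promises => {
  const settled = await Promise.allSettled(promises);
  const reasons = settled
    .filter(s => s.status === 'rejected')
    .map(s => s.reason);
  if (reasons.length > 0) {
    throw new AggregateError(reasons, 'Some promises rejected');
  }
  return settled.map(s => s.value);
};
```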

}
},
() => {
doneResult = { done: true, value: undefined };
Member:

Consider hardening the result.

return pullNext();
};

const readers = Array.from({ length: readerCount }).map(() => {
Member:

TIL Array.from({length})! I've been reluctantly writing new Array(length).fill(undefined).
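The idiom works because Array.from accepts any array-like object with a length, and it produces a dense array; new Array(n), by contrast, is sparse, and .map() skips its holes, which is exactly why the .fill(undefined) workaround is needed:

```javascript
// Array.from({ length }) produces a dense array and takes an optional
// mapping function; new Array(n) is sparse until filled.
console.log(Array.from({ length: 3 }));              // [ undefined, undefined, undefined ]
console.log(Array.from({ length: 3 }, (_, i) => i)); // [ 0, 1, 2 ]
console.log(new Array(3).map((_, i) => i));          // [ <3 empty items> ] (holes are skipped)
console.log(new Array(3).fill(undefined).map((_, i) => i)); // [ 0, 1, 2 ]
```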

return reader;
});

void pullNext();
Member:

We need to be extra certain that no rejections fall through to this promise. Ideally we could fall through here to .catch(reportError) but that is not present in Node.js yet.
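The pattern being asked for can be sketched like this (names are hypothetical, not the PR's): attach a terminal .catch to the deliberately floating promise so no rejection reaches the unhandled-rejection machinery, with console.error standing in for the reportError that was not yet available in Node.js.

```javascript
// Guarding a deliberately floating promise with a last-resort catch.
const pullLoop = async () => {
  // ... drive the shared source here ...
};
void pullLoop().catch(error => {
  // Should be unreachable if pullLoop handles all of its own errors.
  console.error('unexpected rejection in pull loop', error);
});
```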
