Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iteration consuming the heap is maybe too surprising #660

Open
jedwards1211 opened this issue Nov 21, 2024 · 5 comments
Open

Iteration consuming the heap is maybe too surprising #660

jedwards1211 opened this issue Nov 21, 2024 · 5 comments

Comments

@jedwards1211
Copy link

It's good the changelog documents this behavior, but I had mistakenly assumed iterating Symbol.iterator would just read out the raw backing array in unsorted order.

My assumption had a significant consequence, I had a debug route for fetching the internal state of some algorithms out of my server, and I was including [...this.heap] in the response, and after awhile I realized that requesting this internal state was accidentally emptying the queues.

Making this library have an interface familiar to users of libraries in other languages is only convenient up to the point that it goes against the conventions JS users are used to. And I think iteration not consuming underlying data is one of those conventions in JS. I'm aware of async iteration that consumes underlying data (e.g. async iterating a ReadableStream) but not sync iteration. I don't think the Python heapq behavior should win over conventions.

@ignlg
Copy link
Owner

ignlg commented Nov 27, 2024

Thank you for your feedback, @jedwards1211.

You raise a valid point. I have never been entirely comfortable with it, as it can feel like a dark pattern.

This behavior occurs because it is not straightforward to consume a binary tree in the expected order without recalculating the new top after a pop operation. Recalculating the new top is the core of the behavior when popping the top value. For this reason, the iteration process functions similarly to using pop() in a for loop with an array.

At the moment, I don't have a solid solution. Some thoughts:

  • Improving the documentation alerting more about this behavior.
  • Forcefully cloning the heap before it is consumed is an even darker pattern, as the performance and memory impact are high.
  • Removing the iterator would be a breaking change for projects using it, so this is not an option.

Typically, I don't need to maintain the binary tree structure since values are consumed and, in the event of a NACK or error, are pushed back into the tree. I believe this is the right approach. However, when it’s essential to keep the structure intact, I recommend cloning, which is why the .clone() method exists.

Do you have any ideas for an efficient way to create a more intuitive behavior?

@jedwards1211
Copy link
Author

jedwards1211 commented Nov 28, 2024

I guess my situation is kind of unique because I just needed to send a json array of queue entries over the wire (in an RPC method for debugging the internal state of the operations). I naively assumed the default iterator would just iterate over the internal array as is, without fully sorting it.

If I were designing a heap library from scratch my preference would probably be no default iterator method, and instead .unsortedIterator() and .consumingIterator() methods to make the behavior crystal clear.

Another option, which would be nice from an API perspective but more tricky to implement, would be to have [Symbol.iterator](options?: { consume?: boolean }) that always iterates in order, whether consuming or not. An optional argument doesn't conflict with the JS iterator protocol afaik. Sorting the heap in place without popping adds a bit of complexity though, since it's slightly different from the usual heap pop algorithm (if you're open to this I'm happy to code it up).

I can understand the goal of imitating familiar APIs from other languages though... at the time I was miffed that it clashed with JS expectations.

@jedwards1211
Copy link
Author

As far as what might have made me more likely to notice the documented behavior when skimming, maybe a warnings section and a table of contents at the top of the readme, I could PR that

@ignlg
Copy link
Owner

ignlg commented Dec 28, 2024

Thank you for the great suggestion to add a warnings section and a table of contents to improve the visibility of important behaviors. I appreciate your detailed feedback.

Could you submit a pull request (PR) to implement these documentation improvements? Your experience would ensure the warnings are clear and easy for other developers to find.

I'm also interested in your suggestion for the optional consume parameter for iteration. I can review a PR for that as well if you wish to pursue it.

@jedwards1211
Copy link
Author

@ignlg sure! I'm about to go on a trip so I could probably submit the documentation PR tomorrow but I'll have to wait until after the 8th to work on iterating in order without consuming. Feel free to bug me if I forget :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants