Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref: RangeMap class #83259

Merged
merged 13 commits into from
Jan 22, 2025
Merged

ref: RangeMap class #83259

merged 13 commits into from
Jan 22, 2025

Conversation

gggritso
Copy link
Member

@gggritso gggritso commented Jan 10, 2025

Extracting a RangeMap class. I'm omitting the actual refactor to use RangeMap out of this PR for tidiness.

The mechanism of matching values against numeric ranges is really common at Sentry, I want to use this in more places.

My immediate need is duration formatting. We have code all over the place that takes a count of milliseconds and then decides whether it should be converted to seconds, minutes, years, etc. by checking this exactly kind of range matching logic. GranularityLadder is similar, it checks the current duration and matches it to a string value by range.

@github-actions github-actions bot added the Scope: Frontend Automatically applied to PRs that change frontend components label Jan 10, 2025
@gggritso gggritso requested a review from a team January 10, 2025 20:50
Copy link

codecov bot commented Jan 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #83259      +/-   ##
==========================================
- Coverage   87.63%   87.56%   -0.07%     
==========================================
  Files        9490     9470      -20     
  Lines      541208   536079    -5129     
  Branches    21232    21150      -82     
==========================================
- Hits       474272   469417    -4855     
+ Misses      66588    66305     -283     
- Partials      348      357       +9     

Copy link

codecov bot commented Jan 10, 2025

Bundle Report

Bundle size has no change ✅

@gggritso gggritso marked this pull request as ready for review January 13, 2025 14:09
@gggritso gggritso requested a review from a team as a code owner January 13, 2025 14:09
@@ -0,0 +1,63 @@
import orderBy from 'lodash/orderBy';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will say that I am generally not a fan of such APIs and would rather not have them exist.

The main reasoning is that it is essentially an alternative for array.sort().find() or array.find(), which is simple enough that it should be inlined. If the goal here is to dry up the code, then I would argue that the cognitive task of having to learn and understand what a ladder class and rung does negates that intent as you are just substituting the time to write the code for the time it will take you to re-read and understand it again in the future. Til that that ladder steps are called rungs 😅

Besides that, a common question that folks might have with methods like .rung is understanding if the bounds are inclusive? Given the definition of [0,10], [10,20], would a query with the value of 10 return 0-10 or 10-20 interval? This isn't very clear with such an API and would require someone to read the actual implementation and understand it.

I would add that since this is not a common data structure, it means that people are also going to be prone to simply forgetting that it exists, which means we'll just end up with a lot of instances of array.sort().find() anyways.

One alternative that could be more intuitive here is to use an interval tree, which is more common and would also allow you to define and query sub intervals like [0,10], [3,6], while another might be to define this as a findInterval helper function (this would probably be my preference), which would be more flexible and ergonomic to use as opposed to having to construct a class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 @JonasBa that's fair enough but, the helpful part of this class is the construction, not the array.find() aspect. A Ladder ensures by definition that:

  1. The intervals are non-overlapping. Only 1 interval will match an input value
  2. The intervals are contiguous. There are no gaps between the top and bottom value, and there is no maximum value. Lookups for any number above 0 are guaranteed a match

The idea of this class is actually more aligned with Rust's or Ruby's pattern matching than an interval tree!

  match x {
      1..=5 => println!("one through five"),
      _ => println!("something else"),
  }

This is pretty different from interval trees, IMO. I wonder if giving it a less abstract name like RangeMatch or IntervalMatcher would be helpful here?

As an aside, maybe it's a "me" problem, but remembering whether matches are inclusive is a problem for me with every matching API. If there was a findInterval function I'd still have to look up whether the segment matches are inclusive, and on which boundary!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intervals are non-overlapping. Only 1 interval will match an input value
The intervals are contiguous. There are no gaps between the top and bottom value, and there is no maximum value. Lookups for any number above 0 are guaranteed a match

What if someone were to have a use case for negative values or a max? At what point is this overconstrained?

The idea of this class is actually more aligned with Rust's or Ruby's pattern matching than an interval tree!

Yes, but there is a big difference in readability when you have an expression which is inlined vs hidden behind a method implementation. In the case of ruby (not sure about rust), the range expression defines the start, end and the inclusivity (... vs ..). This isn't something we can replicate without native pattern matching so it means it will always be hidden behind the API we expose.

As an aside, maybe it's a "me" problem, but remembering whether matches are inclusive is a problem for me with every matching API. If there was a findInterval function I'd still have to look up whether the segment matches are inclusive, and on which boundary!

It's not, both solutions have the same issue :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I gotta think about this some more! Moving this back to draft while I mull this over. Thanks for the feedback!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gggritso I think I will change my mind here about this not needing to exist (my interval tree interpretation was also wrong)

I think what you described is valuable, and you seem to have a need for this, and I dont think I want to block that. My main concern however with this is that if you target a generic implementation that you expect to be reused, which seems like you are, you will need to find a more familiar naming for it. This is obviously not a requirement, as eod it is just naming, but I think it will ultimately help you reach better adoption and prevent people from reinventing this in the future.

I will leave a couple comments around the implementation, but eod, I think you should be good to merge if that is acceptable to you :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still thinking about it! To your point, this isn't very discoverable or understandable. For me, it comes down to 1 question. Is repetition of this kind of code acceptable:

function getValueCategory(value: number): string {
  if (value >= 1000) {
    return 'super-high';
  } else if (value > 500) {
    return 'pretty-high';
  } else if (
...etc

There are places in the code where those if/else numerical comparisons are pretty big. For example, getDuration, or categorizeDuration but there are more.

The question to me, is there value in encapsulating that kind of behaviour? If we're okay with having those long if/else chains then this PR is definitely not needed. Long if/else chains are long and it's easy to make a mistake but it's also very clear what they're doing. I was pretty confident that I want to encapsulate that, but now I'm less sure.

The second thing is, can a good API be designed to encapsulate it? Other languages have a built-in way to do this, JS does not. I did an okay job here, but not a great job. The naming needs a lot of work (I think "ladder" and "rung" is a little abstract), and I think that the [minimum: number, value: number] is a bit confusing because it's not clear at a glance where the steps start/end. I think a better API can improve that!

I'm open to input on both counts. I think you're coming around to a need for this kind of class, but I'm open to just not having it. You also have suggestions on implementation/naming and I'm of course open to that!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JonasBa After thonking [sic] about this for a while, I made some big changes.

  • class name is RangeMap, I think this is a cleaner and pretty explicit way to explain what's going on. What do you think?
  • RangeMap specifies all ranges explicitly with a minimum and a maximum. There are no longer any restrictions on starting from 0, or any default match against infinity
  • the map can be sparse, with missing values. get might return undefined

This seems much simpler, and all the other suggestions in the PR fell in naturally with that approach.

Thoughts?

@gggritso gggritso marked this pull request as draft January 15, 2025 15:38
@gggritso gggritso changed the title ref: Ladder class ref: RangeMap class Jan 20, 2025
@gggritso gggritso marked this pull request as ready for review January 20, 2025 16:42
Comment on lines +6 to +8
{min: 0, max: 10, value: 'first'},
{min: 10, max: 20, value: 'second'},
{min: 20, max: 50, value: 'third'},
Copy link
Member

@JonasBa JonasBa Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like this to indicate that values are inclusive?

Suggested change
{min: 0, max: 10, value: 'first'},
{min: 10, max: 20, value: 'second'},
{min: 20, max: 50, value: 'third'},
{min: 0, max: 10, value: 'first'},
{min: 11, max: 20, value: 'second'},
{min: 21, max: 50, value: 'third'},

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little wary of that because it only works for integers, and the class doesn't do anything to enforce integer-only use. Should it? It seems like there's a use-case for matching ranges to floats though I don't see it in the code right now.

Another option is maybe to make the boundary matching explicit, i.e., allow configuring which boundary is inclusive, and have it be min by default

Copy link
Member

@JonasBa JonasBa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love it when we disagree, because we always end up with something better in the end. Nice work @gggritso!

@gggritso
Copy link
Member Author

@JonasBa thanks for your feedback as always! 🤝

@gggritso gggritso merged commit 5a30dd4 into master Jan 22, 2025
43 checks passed
@gggritso gggritso deleted the ref/ggg/ladder-class branch January 22, 2025 16:05
andrewshie-sentry pushed a commit that referenced this pull request Jan 22, 2025
Extracting a `RangeMap` class. I'm omitting the actual refactor to _use_
`RangeMap` out of this PR for tidiness.

The mechanism of matching values against numeric ranges is really common
at Sentry, I want to use this in more places.

My immediate need is duration formatting. We have code all over the
place that takes a count of milliseconds and then decides whether it
should be converted to seconds, minutes, years, etc. by checking this
exactly kind of range matching logic. `GranularityLadder` is similar, it
checks the current duration and matches it to a string value by range.
@github-actions github-actions bot locked and limited conversation to collaborators Feb 7, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Frontend Automatically applied to PRs that change frontend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants