-
Notifications
You must be signed in to change notification settings - Fork 501
refactor(benchmark): use ValueTotalSize for both valueContains args #7458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cf15ee5 to
bed890f
Compare
kwxm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest using the ValueTotalSize wrapper here. I think that'll straighten out the curved edge of the benchmarking data and will make it easier to see what's going on. If you're trying to check whether something wih a nodes is contained in something with b nodes then you can return immediately if a>b, which is everything on one side of the straight line given by a=b. Also the complexity bounds depend on the total number of nodes.
It might also be worth trying a size measure that adds the number of nodes in the outer map to the total of the numbers of nodes in the inner map since that kind of measures the total number of nodes in the value (see my remark about pretending it's one big tree). I think that in many cases (depending on the benchmark inputs) the number of outer nodes will be a lot smaller than the total number of inner nodes, so maybe adding on the number of outer nodes won't make too much difference.
Change valueContains benchmark to use ValueTotalSize wrapper for both container and contained arguments, instead of ValueLogOuterSizeAddLogMaxInnerSize for the container. This addresses Kenneth's observation that using total size for both will straighten out the curved edge of the benchmark data and make the early-exit boundary at n₁ = n₂ visible as a straight line.
Regenerate benchmark data and cost model parameters for valueContains using the new ValueTotalSize wrapper for both arguments. The new model uses multiplied_sizes with updated coefficients.
Update expected budget values for valueContains and array builtins conformance tests to match the new cost model parameters.
280bc38 to
94bc1fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
I think the model should probably be
linear_in_x_and_yinstead ofmultiplied_sizes; try fitting models of both types to some benchmark results and see what happens. -
We probably don't need so many benchmark inputs (especially for smaller inputs: I think there are a lot of results squashed together near the origin and it's hard to see what's going on there), and maybe they could be a bit more evenly spaced: see this comment.. Also, having fewer inputs would speed up the turnaround on costing experiments. But now that I've said that, you can have lots of values with very different structures that have the same size, so I'm not so sure.
|
|
Closing this PR in favor of #7476 |



Summary
Change
valueContainsbenchmark to useValueTotalSizewrapper for both container and contained arguments, instead ofValueLogOuterSizeAddLogMaxInnerSizefor the container.This addresses Kenneth's observation that using total size for both arguments will:
Changes
Benchmark Wrapper Update
(ValueLogOuterSizeAddLogMaxInnerSize, ValueTotalSize)to(ValueTotalSize, ValueTotalSize)Builtins.hsto matchCost Model Regeneration
benching-conway.csvmultiplied_sizeswith updated coefficientsConformance Tests
valueContainstestsTechnical Context
The
valueContainsbuiltin usesMap.isSubmapOfBywith a splitLookup-based divide-and-conquer algorithm. By measuring both arguments usingValueTotalSize(total entry count), the cost model better reflects the actual complexity bounds where early exit occurs when contained size exceeds container size.Test plan
--ghc-options=-Werror