-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Description
Is your feature request related to a problem or challenge?
Histograms in SQL can be computed fairly easily using case and order by.
An example of this kind of query is:
select
case
when salary between 75000 and 90000 then '75000-90000'
when salary between 90000 and 120000 then '90000-120000'
else '120000+'
end as salary_band,
count(*)
from employee_salary
group by 1
The default evaluation logic for case (which is equivalent to a chain of if/else statements) does not leverage the fact that the ranges being tested here are static and disjunct. In the example above it would be more optimal to compute the boundaries as an array [75000, 90000, 120000] and then try to find the appropriate bucket index by scanning that array.
It would be useful to have a dedicated case evaluation code path for this type of 'binning function' case expression that handles them more optimally. This is similar in idea to the work that's being done in #18183 for equality checks.
Describe the solution you'd like
In the case physical expression implementation, analyse the when conditions. If they are all range tests for the same input value using literals for the range boundaries and the order of the when expressions permits it, extract the range boundaries to an array and test the input value against the array to determine the appropriate then expression index.
If the then expressions are all literals, then there's also no need to actually evaluate those. Instead a lookup table can be used to map each row to the appropriate output value.
Describe alternatives you've considered
No response
Additional context
No response