@agossard
Contributor

Fixes #25451
Fixes #21755
Fixes #9014

A first cut here. Interested in any feedback.

Note: for multiple quantiles, we always sort up front (for now). Quickselect is still used in the case where we are only looking for one quantile, we can get a contiguous slice, and the data is not already sorted. We could consider implementing a multi-quick sort if we think it is worth it.
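
For illustration, a minimal std-only sketch of the two strategies described in the note: sort once when several quantiles are requested, and use a selection algorithm when only one is needed. The function names and the "lower" index rule are assumptions made for this sketch, not the PR's actual code, and every `q` is assumed to already lie in [0, 1].

```rust
/// Index of the "lower" quantile in a sorted slice of length `n` (requires n > 0).
/// Hypothetical helper: it only mimics a "lower" interpolation rule.
fn lower_index(n: usize, q: f64) -> usize {
    ((n - 1) as f64 * q).floor() as usize
}

/// Several quantiles: copy the data, sort it once, then index for each quantile.
fn quantiles_sorted(data: &[f64], qs: &[f64]) -> Vec<Option<f64>> {
    if data.is_empty() {
        return vec![None; qs.len()];
    }
    let mut sorted = data.to_vec();
    sorted.sort_unstable_by(|a, b| a.total_cmp(b));
    qs.iter()
        .map(|&q| Some(sorted[lower_index(sorted.len(), q)]))
        .collect()
}

/// One quantile: a selection algorithm places only the k-th element,
/// which avoids fully sorting the copy.
fn quantile_select(data: &[f64], q: f64) -> Option<f64> {
    if data.is_empty() {
        return None;
    }
    let mut buf = data.to_vec();
    let k = lower_index(buf.len(), q);
    let (_, kth, _) = buf.select_nth_unstable_by(k, |a, b| a.total_cmp(b));
    Some(*kth)
}
```

Both paths should agree on the same input; the selection path just does less work when a single order statistic is needed.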

I don't love having .quantile and .quantiles in so many places in the internal code, but I wasn't sure how hard it would be to avoid that.

Note: I have not yet refactored .describe() to actually use this, but can do so. I did make qcut use this pathway.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Nov 27, 2025
@nikaltipar
Contributor

Thanks for adding the change. This might need tests for some extra use cases.

For instance, a test case with an array input containing a few valid quantiles followed by an invalid one, etc.
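
A sketch of the kind of check such a test would exercise, assuming the whole call should fail as soon as any quantile is outside [0, 1]. `validate_quantiles` and its `String` error are hypothetical stand-ins, not the PR's or Polars' actual validation code or error type.

```rust
/// Hypothetical validator: reject the whole input if any quantile is
/// outside [0, 1]. NaN is rejected too, since it is not contained in the range.
fn validate_quantiles(qs: &[f64]) -> Result<(), String> {
    for (i, &q) in qs.iter().enumerate() {
        if !(0.0..=1.0).contains(&q) {
            return Err(format!(
                "quantile at index {} is outside [0, 1]: {}",
                i, q
            ));
        }
    }
    Ok(())
}
```

The interesting case is a list whose leading entries are valid and whose last entry is not, since an implementation that only checks the first element would miss it.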

.map(|v: Option<f64>| v.map(|f| f as i64))
.collect::<Int64Chunked>()
.into_series();
// Cast the int64 series to the time type
Member

This should not use a cast, but rather do into_time on the Int64Chunked.

.map(|v: Option<f64>| v.map(|f| f as i64))
.collect::<Int64Chunked>()
.into_series();
// Cast the int64 series to the duration type
Member

This should not use a cast, but rather do into_duration on the Int64Chunked.

.map(|v: Option<f64>| v.map(|f| f as i64))
.collect::<Int64Chunked>()
.into_series();
// Cast the int64 series to the datetime type
Member

This should not use a cast, but rather do into_datetime on the Int64Chunked.

.map(|v: Option<f64>| v.map(|f| (f * (US_IN_DAY as f64)) as i64))
.collect::<Int64Chunked>()
.into_series();
// Cast the int64 series to the datetime type
Member

This should not use a cast, but rather do into_date on the Int64Chunked.
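
To illustrate the idea behind these requests (converting the physical integer array directly into its logical type instead of going through a generic cast), here is a toy sketch. `PhysicalI64`, `LogicalTime`, and this `into_time` are hypothetical stand-ins written for the example; they are not the Polars types or methods.

```rust
/// Toy stand-in for a physical column of nullable 64-bit integers.
#[derive(Debug, Clone, PartialEq)]
struct PhysicalI64(Vec<Option<i64>>);

/// Toy stand-in for a logical "time" column backed by the same physical values.
#[derive(Debug, Clone, PartialEq)]
struct LogicalTime(PhysicalI64);

impl PhysicalI64 {
    /// Relabel the physical values as a logical time column.
    /// The data is moved, not converted, so no cast machinery is involved.
    fn into_time(self) -> LogicalTime {
        LogicalTime(self)
    }
}

/// Mirrors the shape of the snippets under review: truncate the f64 quantile
/// results to i64, keep nulls, then wrap the result in the logical type.
fn quantiles_as_time(raw: Vec<Option<f64>>) -> LogicalTime {
    let physical = PhysicalI64(
        raw.into_iter()
            .map(|v| v.map(|f| f as i64))
            .collect(),
    );
    physical.into_time()
}
```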

for _q in quantiles {
out.push(None);
}
Ok(out)
Member

vec![None; quantiles.len()]
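
A small std-only sketch showing that the suggested expression builds the same vector as the loop it replaces. The function names are made up for the comparison, and the element type `Option<f64>` is inferred from the surrounding snippets.

```rust
/// The loop form from the diff: push one `None` per requested quantile.
fn all_null_loop(quantiles: &[f64]) -> Vec<Option<f64>> {
    let mut out = Vec::with_capacity(quantiles.len());
    for _q in quantiles {
        out.push(None);
    }
    out
}

/// The suggested form: `vec![value; n]` repeats a cloneable value `n` times.
fn all_null_macro(quantiles: &[f64]) -> Vec<Option<f64>> {
    vec![None; quantiles.len()]
}
```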

}

fn quantiles_reduce(&self, quantiles: &[f64], method: QuantileMethod) -> PolarsResult<Scalar> {
let v = self.quantiles(quantiles, method)?; // Vec<Option<f64>>
Member

Please don't annotate types in comments like this.


fn quantiles_reduce(&self, quantiles: &[f64], method: QuantileMethod) -> PolarsResult<Scalar> {
let v = self.quantiles(quantiles, method)?; // Vec<Option<f64>>
// build a Float64 series from the optional results, preserving nulls
Member

We're not building a Float64 series here.

@codecov

codecov bot commented Dec 2, 2025

Codecov Report

❌ Patch coverage is 67.68293% with 106 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.81%. Comparing base (45819db) to head (f752ee9).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...polars-core/src/series/implementations/duration.rs | 30.00% | 14 Missing ⚠️ |
| crates/polars-expr/src/expressions/aggregation.rs | 65.85% | 14 Missing ⚠️ |
| ...tes/polars-core/src/series/implementations/date.rs | 38.09% | 13 Missing ⚠️ |
| ...polars-core/src/series/implementations/datetime.rs | 38.09% | 13 Missing ⚠️ |
| .../polars-core/src/series/implementations/decimal.rs | 40.90% | 13 Missing ⚠️ |
| ...tes/polars-core/src/series/implementations/time.rs | 31.57% | 13 Missing ⚠️ |
| ...polars-core/src/chunked_array/ops/aggregate/mod.rs | 70.00% | 12 Missing ⚠️ |
| crates/polars-core/src/chunked_array/ops/mod.rs | 0.00% | 5 Missing ⚠️ |
| ...s-core/src/chunked_array/ops/aggregate/quantile.rs | 96.66% | 3 Missing ⚠️ |
| crates/polars-core/src/series/series_trait.rs | 81.25% | 3 Missing ⚠️ |

... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #25516      +/-   ##
==========================================
+ Coverage   79.61%   79.81%   +0.20%     
==========================================
  Files        1729     1729              
  Lines      239727   239952     +225     
  Branches     3038     3038              
==========================================
+ Hits       190857   191521     +664     
+ Misses      48087    47648     -439     
  Partials      783      783              


/// Get the quantile of the Series as a new Series of length 1.
/// Default implementation delegates to `quantiles_reduce` with a single element
/// and unwraps the resulting `List` scalar to a plain scalar where possible.
fn quantile_reduce(&self, quantile: f64, method: QuantileMethod) -> PolarsResult<Scalar> {
Member

Something is fundamentally wrong with the datatypes. You changed everything to go through quantiles_reduce and made the output dtype of quantiles_reduce depend dynamically on the length of the input.

quantiles_reduce should always return a List, even if the input has length 1. Please undo the changes that make everything go through quantiles_reduce.

Contributor Author

OK. I just want to make sure I fully understand the issue and what you would like done. Putting aside the specific internal function implementations for a second, the desired behavior (please confirm) is that we have one user-exposed function (quantile) with two overloaded type behaviors:

  1. f64 input quantile -> f64 output
  2. list of f64 inputs -> list of f64 outputs

Obviously, from a math/process standpoint, (1) is just a special case of (2), but from a types standpoint they are different.

Internally, we need to go through a lot of steps/functions to go all the way in and all the way out. My original implementation handled those two type cases in side-by-side functions (essentially) all the way down to the bottom and back out again, including a new “quantiles_reduce” function sitting next to “quantile_reduce” (each of which needs an implementation for every numerical type). This seemed bad to me, so I collapsed the two reduce functions into a single one that could handle both cases.

Are you saying you don’t want that and we should go back to having two reduce functions, one for f64 -> f64 and one for list -> list? Or are you saying everything can go through the new quantiles_reduce, but make that function always do list -> list and handle the scalar/list type conversion in the caller?

Member

@orlp Dec 4, 2025

> Are you saying you don’t want that and we should go back to having two reduce functions, one for f64 -> f64 and one for list -> list?

Yes. They are different operations.

> Or are you saying everything can go through the new quantiles_reduce, but make that function always do list -> list and handle the scalar/list type conversion in the caller?

This also would've been fine, but my preference goes to the above. What isn't fine is that the output datatype depends on the length of the input.
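
A toy sketch of the preferred design, assuming two separate reduce functions whose output type is fixed by the function rather than by the input length. `ToyScalar`, `lower_quantile`, and the simplified signatures are hypothetical and only stand in for the real `Scalar`-returning methods; the input slice is assumed to be sorted and every `q` is assumed to lie in [0, 1].

```rust
/// Toy stand-in for a scalar that is either a float or a list of floats.
#[derive(Debug, Clone, PartialEq)]
enum ToyScalar {
    Float64(Option<f64>),
    List(Vec<Option<f64>>),
}

/// Hypothetical helper: "lower" quantile of an already sorted slice.
fn lower_quantile(sorted: &[f64], q: f64) -> Option<f64> {
    if sorted.is_empty() {
        return None;
    }
    let idx = ((sorted.len() - 1) as f64 * q).floor() as usize;
    Some(sorted[idx])
}

/// Single quantile in, float scalar out. Never returns a list.
fn quantile_reduce(sorted: &[f64], q: f64) -> ToyScalar {
    ToyScalar::Float64(lower_quantile(sorted, q))
}

/// List of quantiles in, list scalar out, even when the list has length 1.
fn quantiles_reduce(sorted: &[f64], qs: &[f64]) -> ToyScalar {
    ToyScalar::List(qs.iter().map(|&q| lower_quantile(sorted, q)).collect())
}
```

With this split, a one-element list still produces a `List` with one entry, so a caller can tell the output type from the function it called without inspecting the input.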

4 participants