
[NU-1921] Add standard deviation and variance aggregations #7307

Open: wants to merge 17 commits into base: staging


@paw787878 (Contributor) commented Dec 10, 2024

Describe your changes

Checklist before merge

  • Related issue ID is placed at the beginning of PR title in [brackets] (can be GH issue or Nu Jira issue)
  • Code is cleaned of temporary changes and commented-out lines
  • Parts of the code that are not easy to understand are documented in the code
  • Changes are covered by automated tests
  • Showcase in dev-application.conf added to demonstrate the feature
  • Documentation added or updated
  • Added entry in Changelog.md describing the change from the perspective of a public distribution user
  • Added MigrationGuide.md entry in the appropriate subcategory if introducing a breaking change
  • Verify that the PR will be squashed during merge

github-actions bot added the docs label Dec 10, 2024
@paw787878 changed the title from "Add standard deviation and variance aggregations" to "[NU-1921] Add standard deviation and variance aggregations" Dec 10, 2024
@paw787878 marked this pull request as ready for review December 10, 2024 10:55
github-actions bot (Contributor) commented Dec 11, 2024

created: #7322
⚠️ Be careful! Snapshot changes are not necessarily the cause of the error. Check the logs.

@paw787878 force-pushed the add-standard-deviation-and-variance-aggregations branch from 0e84fe5 to acf33c6 on December 12, 2024 09:27
@jedrz (Contributor) commented Dec 12, 2024

Please add changelog entry.

docs/Changelog.md (outdated, resolved)
@TypeInfo(classOf[LargeFloatSumState.TypeInfoFactory])
// it would be natural to use type Number instead of this class
// it is done this way so that it is serialized properly
case class LargeFloatSumState(
@jedrz (Contributor) commented Dec 13, 2024


To me it would be clearer to represent this class as three cases, which avoids null handling:

  • empty sum
  • double sum
  • big decimal sum

The problem is that I don't know whether Flink can serialize such a class hierarchy efficiently.
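A minimal sketch of the three-case representation suggested above, assuming that a java.math.BigDecimal input switches the sum to exact BigDecimal arithmetic while any other Number is summed as a Double. That promotion rule is an assumption, since the full LargeFloatSumState is not shown here, and SumState, EmptySum, DoubleSum, BigDecimalSum, add and result are hypothetical names. It only illustrates the null-free modelling and says nothing about how efficiently Flink would serialize the hierarchy.

```scala
// Hypothetical three-case sum state, as suggested in the review.
// All names are illustrative; they are not taken from the PR.
sealed trait SumState

object SumState {
  case object EmptySum extends SumState
  final case class DoubleSum(sum: Double) extends SumState
  final case class BigDecimalSum(sum: BigDecimal) extends SumState

  // Assumption: a java.math.BigDecimal input switches the sum to exact
  // BigDecimal arithmetic; any other Number is summed as a Double.
  def add(state: SumState, value: Number): SumState = (state, value) match {
    case (EmptySum, bd: java.math.BigDecimal)         => BigDecimalSum(BigDecimal(bd))
    case (EmptySum, n)                                => DoubleSum(n.doubleValue())
    case (DoubleSum(s), bd: java.math.BigDecimal)     => BigDecimalSum(BigDecimal(s) + BigDecimal(bd))
    case (DoubleSum(s), n)                            => DoubleSum(s + n.doubleValue())
    case (BigDecimalSum(s), bd: java.math.BigDecimal) => BigDecimalSum(s + BigDecimal(bd))
    case (BigDecimalSum(s), n)                        => BigDecimalSum(s + BigDecimal(n.doubleValue()))
  }

  // The empty sum is an explicit case, so callers get None instead of null.
  def result(state: SumState): Option[Number] = state match {
    case EmptySum         => None
    case DoubleSum(s)     => Some(java.lang.Double.valueOf(s))
    case BigDecimalSum(s) => Some(s.bigDecimal)
  }
}
```

With this shape every branch is checked by the compiler's exhaustiveness analysis on the sealed trait, which is the main readability gain over a single class with nullable fields.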

@paw787878 (Contributor Author) replied:

> Please add changelog entry.

Done.


private def isForStandardDeviationInsteadOfBeingForVariance(): Boolean = {
standardDeviationVarianceType match {
case SampleStandardDeviation => true
Contributor review comment:

We could remove the ifs above and do something like:

case PopulationVariance => populationVariance
case SampleVariance => sampleVariance(populationVariance)
case PopulationStandardDeviation => MathUtils.largeFloatSqrt(populationVariance)
case SampleStandardDeviation => MathUtils.largeFloatSqrt(sampleVariance(populationVariance))

What do you think?
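A self-contained sketch of the single-match idea, assuming a Welford-style running state (count, mean, M2) and plain Double arithmetic in place of the PR's large-float types and MathUtils.largeFloatSqrt. Only the four type names come from the comment; VarianceSketch, VarianceState, Empty and result are hypothetical. Each standard deviation is the square root of the variance of the same kind, and the sample variance applies Bessel's correction, σ² · n / (n − 1).

```scala
// Hypothetical sketch; only the four type names come from the review comment.
object VarianceSketch {
  sealed trait StandardDeviationVarianceType
  case object PopulationVariance extends StandardDeviationVarianceType
  case object SampleVariance extends StandardDeviationVarianceType
  case object PopulationStandardDeviation extends StandardDeviationVarianceType
  case object SampleStandardDeviation extends StandardDeviationVarianceType

  // Running state for Welford's online algorithm: m2 is the sum of squared
  // deviations from the current mean.
  final case class VarianceState(count: Long, mean: Double, m2: Double) {
    def add(x: Double): VarianceState = {
      val newCount = count + 1
      val delta = x - mean
      val newMean = mean + delta / newCount
      VarianceState(newCount, newMean, m2 + delta * (x - newMean))
    }
  }

  val Empty: VarianceState = VarianceState(0L, 0.0, 0.0)

  // One match instead of separate ifs: each type maps to its own formula.
  def result(state: VarianceState, tpe: StandardDeviationVarianceType): Option[Double] =
    if (state.count == 0) None
    else {
      val n = state.count.toDouble
      val populationVariance = state.m2 / n
      // Bessel's correction; yields NaN when n == 1 because 0.0 / 0.0 is NaN.
      def sampleVariance(popVar: Double): Double = popVar * n / (n - 1)
      Some(tpe match {
        case PopulationVariance          => populationVariance
        case SampleVariance              => sampleVariance(populationVariance)
        case PopulationStandardDeviation => math.sqrt(populationVariance)
        case SampleStandardDeviation     => math.sqrt(sampleVariance(populationVariance))
      })
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 this gives a population variance of 4 and a population standard deviation of 2. Welford's update is used here because it avoids the cancellation error of the naive sum-of-squares formula; the PR itself may compute the variance differently.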
