[backport to 2.x] Add Support for Handling Missing Data in Anomaly Detection (#1274) #1281

Merged
merged 1 commit into from
Aug 19, 2024

kaituo
Collaborator

@kaituo kaituo commented Aug 19, 2024

Description

backport #1274 to 2.x

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…h-project#1274)

* Add Support for Handling Missing Data in Anomaly Detection

This PR introduces enhanced handling of missing data, giving customers the flexibility to choose how to address gaps in their data. The options are: ignore missing data (the default behavior), fill with customer-specified fixed values, fill with zeros, or fill with the previous value. These options can improve recall in anomaly detection scenarios. For example, customers facing the problem described in this forum discussion, https://forum.opensearch.org/t/do-missing-buckets-ruin-anomaly-detection/16535, can now fill missing values with zeros to maintain detection accuracy.
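A minimal Java sketch of the four options, assuming a missing feature value arrives as NaN and that "ignore" means the interval is skipped rather than scored. The enum FillMethod and the class MissingDataFiller are hypothetical names for illustration, not the plugin's configuration API:

```java
/** Hypothetical names for illustration; the plugin's own option names may differ. */
enum FillMethod { IGNORE, ZERO, FIXED_VALUES, PREVIOUS }

final class MissingDataFiller {
    private MissingDataFiller() {}

    /**
     * Returns a filled copy of current (NaN marks a missing feature), or null when the
     * interval should not be scored: IGNORE with gaps, or PREVIOUS with no earlier point.
     */
    static double[] fill(double[] current, double[] previous, double[] fixedValues, FillMethod method) {
        boolean anyMissing = false;
        for (double v : current) {
            if (Double.isNaN(v)) {
                anyMissing = true;
                break;
            }
        }
        if (!anyMissing) {
            return current.clone(); // nothing to impute
        }
        if (method == FillMethod.IGNORE || (method == FillMethod.PREVIOUS && previous == null)) {
            return null;
        }
        double[] filled = current.clone();
        for (int i = 0; i < filled.length; i++) {
            if (!Double.isNaN(filled[i])) {
                continue;
            }
            switch (method) {
                case ZERO:
                    filled[i] = 0.0;
                    break;
                case FIXED_VALUES:
                    filled[i] = fixedValues[i]; // customer-specified value per feature
                    break;
                case PREVIOUS:
                    filled[i] = previous[i]; // carry the last observed value forward
                    break;
                default:
                    throw new IllegalStateException("unexpected method " + method);
            }
        }
        return filled;
    }
}
```

With IGNORE the caller simply skips the interval, mirroring the default behavior; PREVIOUS also returns null when there is no earlier point to copy from.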

Key Changes:
1. Enhanced Missing Data Handling:

Switched to ThresholdedRandomCutForest.process(double[] inputPoint, long timestamp, int[] missingValues) to support missing data in both real-time and historical analyses. Preview mode is unchanged for efficiency and continues to use the existing linear imputation. (See classes: ADColdStart, ModelColdStart, ModelManager, ADBatchTaskRunner).
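The third argument tells the forest which features are missing. A minimal sketch of building it, assuming missing features are marked with NaN in the raw point and that missingValues holds their zero-based positions (our reading of the parameter name; MissingValueIndex is a hypothetical helper):

```java
import java.util.stream.IntStream;

/**
 * Hypothetical helper: builds the index array for the three-argument
 * process(double[] inputPoint, long timestamp, int[] missingValues) call,
 * assuming a missing feature is marked with NaN in the raw point.
 */
final class MissingValueIndex {
    private MissingValueIndex() {}

    /** Returns the positions of NaN entries in ascending order (empty if none). */
    static int[] of(double[] inputPoint) {
        return IntStream.range(0, inputPoint.length)
            .filter(i -> Double.isNaN(inputPoint[i]))
            .toArray();
    }
}
```

The resulting array would then be passed alongside the point, e.g. process(point, timestamp, MissingValueIndex.of(point)); how the forest treats the placeholder values at those positions is not shown here.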

2. Refactoring Imputation & Processing:

Refactored the imputation process, failure handling, statistics collection, and result saving in Inferencer.

3. Improved Imputed Value Reconstruction:

Reconstructed imputed values using the existing mean and standard deviation so that they are stored accurately in AnomalyResult. Added a featureImputed boolean tag to flag imputed values. (See class: AnomalyResult).
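If the model imputes in z-score space, mapping a value back to the original scale is x = z * std + mean. A hedged sketch of that reconstruction plus the per-feature flag; ReconstructedFeatures and reconstruct are hypothetical names, and the plugin's actual normalization may differ:

```java
/** Hypothetical container: feature values plus a per-feature "imputed" flag. */
final class ReconstructedFeatures {
    final double[] values;
    final boolean[] featureImputed;

    ReconstructedFeatures(double[] values, boolean[] featureImputed) {
        this.values = values;
        this.featureImputed = featureImputed;
    }

    /**
     * Copies the observed values and, for each index in missing, replaces the value
     * with the imputed one mapped back to the original scale: x = z * std + mean.
     */
    static ReconstructedFeatures reconstruct(double[] observed, double[] normalizedImputed,
                                             double[] mean, double[] std, int[] missing) {
        double[] values = observed.clone();
        boolean[] imputed = new boolean[observed.length];
        for (int i : missing) {
            values[i] = normalizedImputed[i] * std[i] + mean[i]; // undo z-score normalization
            imputed[i] = true; // mark this feature as imputed
        }
        return new ReconstructedFeatures(values, imputed);
    }
}
```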

4. Broadcast Support for HC Detectors:

Added a broadcast mechanism for HC detectors to identify entity models that haven’t received data in a given interval. This ensures models in memory process all relevant data before imputation begins. Single-stream detectors handle this within the existing transport messages. (See classes: ADHCImputeTransportAction, ADResultProcessor, ResultProcessor).
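A simplified, in-memory sketch of the fan-out/fan-in idea, assuming each node tracks the last data timestamp per entity model and that "no data in the interval" means that timestamp is earlier than the interval start. MissingEntityBroadcast is hypothetical and stands in for the transport round trip, not for ADHCImputeTransportAction itself:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical sketch of the fan-out/fan-in step; not the plugin's transport code. */
final class MissingEntityBroadcast {
    private MissingEntityBroadcast() {}

    /** On one node: entity models whose last data point is older than the interval start. */
    static List<String> entitiesWithoutData(Map<String, Long> lastSeenMillis, long intervalStartMillis) {
        return lastSeenMillis.entrySet().stream()
            .filter(e -> e.getValue() < intervalStartMillis)
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    /** On the coordinator: ask every node and collect each node's answer, keyed by node id. */
    static Map<String, List<String>> broadcast(Map<String, Map<String, Long>> modelsByNode,
                                               long intervalStartMillis) {
        Map<String, List<String>> byNode = new TreeMap<>();
        modelsByNode.forEach((node, models) ->
            byNode.put(node, entitiesWithoutData(models, intervalStartMillis)));
        return byNode;
    }
}
```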

5. Introduction of ActionListenerExecutor:

Added ActionListenerExecutor to wrap response and failure handlers in an ActionListener, executing them asynchronously using the provided ExecutorService. This allows us to handle responses in the AD thread pool.
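A minimal sketch of the wrapping pattern. To stay self-contained it defines a small Listener interface with the same two callbacks as OpenSearch's ActionListener (onResponse, onFailure); Listener and ListenerExecutor are hypothetical names, not the plugin's ActionListenerExecutor:

```java
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

/** Minimal stand-in for OpenSearch's ActionListener (same two callbacks). */
interface Listener<T> {
    void onResponse(T response);

    void onFailure(Exception e);
}

/** Hypothetical sketch in the spirit of ActionListenerExecutor; not the plugin's code. */
final class ListenerExecutor {
    private ListenerExecutor() {}

    static <T> Listener<T> wrap(Consumer<T> responseHandler,
                                Consumer<Exception> failureHandler,
                                ExecutorService executor) {
        return new Listener<T>() {
            @Override
            public void onResponse(T response) {
                // Hand off to the supplied pool instead of running on the caller's thread.
                executor.execute(() -> {
                    try {
                        responseHandler.accept(response);
                    } catch (Exception e) {
                        // Sketch's choice: a failing response handler is reported as a failure.
                        failureHandler.accept(e);
                    }
                });
            }

            @Override
            public void onFailure(Exception e) {
                executor.execute(() -> failureHandler.accept(e));
            }
        };
    }
}
```

Routing an exception thrown by the response handler to the failure handler is a choice made in this sketch, so the caller's failure path still runs instead of the exception escaping into the pool's worker thread.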

Testing:
Comprehensive testing was conducted, including both integration and unit tests. Of the 7135 lines added and 1683 lines removed, 4926 additions and 749 deletions are in tests, ensuring robust coverage.

Signed-off-by: Kaituo Li <[email protected]>

* rebase from main

Signed-off-by: Kaituo Li <[email protected]>

* add comment and remove redundant code

Signed-off-by: Kaituo Li <[email protected]>

---------

Signed-off-by: Kaituo Li <[email protected]>

codecov bot commented Aug 19, 2024

Codecov Report

Attention: Patch coverage is 76.46177% with 157 lines in your changes missing coverage. Please review.

Project coverage is 77.60%. Comparing base (138dbd0) to head (3d93b50).
Report is 14 commits behind head on 2.x.

File | Patch % | Lines
...in/java/org/opensearch/ad/model/AnomalyResult.java | 47.22% | 14 Missing and 5 partials ⚠️
...n/java/org/opensearch/timeseries/model/Config.java | 22.72% | 13 Missing and 4 partials ⚠️
...ensearch/timeseries/transport/ResultProcessor.java | 82.66% | 8 Missing and 5 partials ⚠️
...org/opensearch/ad/transport/ADHCImputeRequest.java | 45.45% | 12 Missing ⚠️
...g/opensearch/timeseries/ml/RealTimeInferencer.java | 83.56% | 8 Missing and 4 partials ⚠️
...pensearch/ad/transport/ADHCImputeNodeResponse.java | 26.66% | 11 Missing ⚠️
...search/ad/transport/ADHCImputeTransportAction.java | 79.41% | 4 Missing and 3 partials ⚠️
...opensearch/ad/transport/ADHCImputeNodeRequest.java | 40.00% | 6 Missing ⚠️
.../java/org/opensearch/ad/ml/ThresholdingResult.java | 66.66% | 2 Missing and 3 partials ⚠️
...ensearch/ad/transport/ADHCImputeNodesResponse.java | 28.57% | 5 Missing ⚠️
... and 20 more

@@             Coverage Diff              @@
##                2.x    #1281      +/-   ##
============================================
+ Coverage     71.40%   77.60%   +6.20%     
- Complexity     4878     5436     +558     
============================================
  Files           518      532      +14     
  Lines         22931    23252     +321     
  Branches       2260     2301      +41     
============================================
+ Hits          16373    18044    +1671     
+ Misses         5509     4167    -1342     
+ Partials       1049     1041       -8     
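The coverage percentages in the diff follow from the line counts if partially covered lines are counted as not covered (our reading of how Codecov computes the ratio, checked against the numbers above):

```latex
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}}
\qquad
\text{2.x: } \frac{16373}{16373 + 5509 + 1049} = \frac{16373}{22931} \approx 71.40\%
\qquad
\text{\#1281: } \frac{18044}{18044 + 4167 + 1041} = \frac{18044}{23252} \approx 77.60\%
```

The reported +6.20% is the difference 77.60% - 71.40%.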
Flag | Coverage Δ
plugin | 77.60% <76.46%> (+6.20%) ⬆️


File | Coverage Δ
.../java/org/opensearch/ad/AnomalyDetectorRunner.java | 80.95% <100.00%> (+37.92%) ⬆️
...ava/org/opensearch/ad/ml/ADRealTimeInferencer.java | 100.00% <100.00%> (ø)
.../org/opensearch/ad/model/ImputedFeatureResult.java | 100.00% <100.00%> (ø)
...pensearch/ad/ratelimit/ADCheckpointReadWorker.java | 100.00% <ø> (ø)
...rg/opensearch/ad/ratelimit/ADColdEntityWorker.java | 100.00% <ø> (ø)
...org/opensearch/ad/ratelimit/ADColdStartWorker.java | 100.00% <100.00%> (+64.28%) ⬆️
.../opensearch/ad/ratelimit/ADSaveResultStrategy.java | 96.55% <ø> (+27.58%) ⬆️
.../handler/AbstractAnomalyDetectorActionHandler.java | 97.82% <ø> (+77.82%) ⬆️
...tings/LegacyOpenDistroAnomalyDetectorSettings.java | 100.00% <100.00%> (ø)
.../java/org/opensearch/ad/task/ADBatchTaskCache.java | 96.29% <100.00%> (+0.06%) ⬆️
... and 71 more

... and 85 files with indirect coverage changes

@amitgalitz
Member

approved, just looks like whitesource security check had issue connecting to the repo

@kaituo
Collaborator Author

kaituo commented Aug 19, 2024

> approved, just looks like whitesource security check had issue connecting to the repo

yep. Thanks.

@kaituo kaituo merged commit b5e85e1 into opensearch-project:2.x Aug 19, 2024
26 of 27 checks passed