[BUG] A sufficiently small interval value on a histogram can crash the node #14558

icercel opened this issue Jun 26, 2024

Describe the bug

Given an index with a long-type field whose values span an extreme min and max, requesting a histogram aggregation on that field with a sufficiently small interval value crashes the node with an OutOfMemoryError.

Related component

Search:Aggregations

To Reproduce

  1. Use the default docker-compose provided on the OpenSearch site (it uses :latest, which at the time of writing is 2.15.0)
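     (That compose file also pins the JVM heap at 512m via OPENSEARCH_JAVA_OPTS, which is the heap size the crash below occurs under; see the Workarounds section.)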

  2. Add two documents:

curl -k -XPUT -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD" \
  'https://localhost:9200/sample-index/_doc/1' \
  -H 'Content-Type: application/json' \
  -d '{"some_value": 1}'

curl -k -XPUT -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD" \
  'https://localhost:9200/sample-index/_doc/2' \
  -H 'Content-Type: application/json' \
  -d '{"some_value": 1234567890}'
  3. Attempt a histogram with a sufficiently large interval:
curl -k -XGET -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD" \
  'https://localhost:9200/sample-index/_search' \
  -H 'Content-Type: application/json' \
  -d '{"size":0, "aggs": { "test": { "histogram": { "field": "some_value", "interval": 300000000 }}}}'
  4. OpenSearch correctly (I think) returns the buckets:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "test": {
      "buckets": [
        {
          "key": 0,
          "doc_count": 1
        },
        {
          "key": 300000000,
          "doc_count": 0
        },
        {
          "key": 600000000,
          "doc_count": 0
        },
        {
          "key": 900000000,
          "doc_count": 0
        },
        {
          "key": 1200000000,
          "doc_count": 1
        }
      ]
    }
  }
}
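The bucket count is what the interval implies: floor((1234567890 - 0) / 300000000) + 1 = 5 buckets, far below the search.max_buckets default of 65535.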
  5. Change the interval value to 1000:
curl -k -XGET -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD"  \
 'https://localhost:9200/sample-index/_search' \
 -H 'Content-Type: application/json' \
 -d '{"size":0, "aggs": { "test": { "histogram": { "field": "some_value", "interval": 1000 }}}}'
  6. OpenSearch correctly responds with:
{
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "",
    "phase": "fetch",
    "grouped": true,
    "failed_shards": [],
    "caused_by": {
      "type": "too_many_buckets_exception",
      "reason": "Trying to create too many buckets. Must be less than or equal to: [65535] but was [1234568]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
      "max_buckets": 65535
    }
  },
  "status": 503
}
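The count in the message checks out: floor(1234567890 / 1000) + 1 = 1234568 buckets, well over the search.max_buckets default of 65535, so the reduce phase fails the request cleanly.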
  7. Change the interval to 100:
curl -k -XGET -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD"  \
 'https://localhost:9200/sample-index/_search' \
 -H 'Content-Type: application/json' \
 -d '{"size":0, "aggs": { "test": { "histogram": { "field": "some_value", "interval": 100 }}}}'
  8. OpenSearch responds with something like curl: (56) OpenSSL SSL_read: error:0A000126:SSL routines::unexpected eof while reading, errno 0, because opensearch-node1 just died:
opensearch-node1         | [2024-06-26T12:12:51,906][INFO ][o.o.m.j.JvmGcMonitorService] [opensearch-node1] [gc][1318] overhead, spent [366ms] collecting in the last [1.1s]
opensearch-node1         | java.lang.OutOfMemoryError: Java heap space
opensearch-node1         | Dumping heap to data/java_pid1.hprof ...
opensearch-node1         | Unable to create data/java_pid1.hprof: File exists
opensearch-node1         | [2024-06-26T12:12:52,440][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [opensearch-node1] fatal error in thread [opensearch[opensearch-node1][search][T#24]], exiting
opensearch-node1         | java.lang.OutOfMemoryError: Java heap space
opensearch-node1         | 	at java.base/java.util.Arrays.copyOf(Arrays.java:3482) ~[?:?]
opensearch-node1         | 	at java.base/java.util.ArrayList.grow(ArrayList.java:237) ~[?:?]
opensearch-node1         | 	at java.base/java.util.ArrayList.grow(ArrayList.java:244) ~[?:?]
opensearch-node1         | 	at java.base/java.util.ArrayList.add(ArrayList.java:515) ~[?:?]
opensearch-node1         | 	at java.base/java.util.ArrayList$ListItr.add(ArrayList.java:1150) ~[?:?]
opensearch-node1         | 	at org.opensearch.search.aggregations.bucket.histogram.InternalHistogram.addEmptyBuckets(InternalHistogram.java:416) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.search.aggregations.bucket.histogram.InternalHistogram.reduce(InternalHistogram.java:436) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:290) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.search.aggregations.InternalAggregations.topLevelReduce(InternalAggregations.java:225) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.action.search.SearchPhaseController.reduceAggs(SearchPhaseController.java:557) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.action.search.SearchPhaseController.reducedQueryPhase(SearchPhaseController.java:528) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.action.search.QueryPhaseResultConsumer.reduce(QueryPhaseResultConsumer.java:153) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:136) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:122) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
opensearch-node1         | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
opensearch-node1         | 	at java.base/java.lang.Thread.runWith(Thread.java:1596) ~[?:?]
opensearch-node1         | 	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
opensearch-node1         | fatal error in thread [opensearch[opensearch-node1][search][T#24]], exiting
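The trace shows where the interval-1000 safeguard stops helping: the too_many_buckets check apparently only trips after InternalHistogram.addEmptyBuckets has materialized the empty buckets, and at interval 100 that means growing an ArrayList to floor(1234567890 / 100) + 1 = 12345679 bucket objects during the reduce, which exhausts the 512m heap before any limit can fail the request.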

Expected behavior

I would have expected (or at least liked, if possible) to get the same too_many_buckets_exception rather than a node crash.

Additional Details

Plugins
n/a

Screenshots
n/a

Host/Environment (please complete the following information):

  • OS: Ubuntu
  • Version 22.04.4

Additional context

  • the OpenSearch version is 2.15.0; no changes were made to the docker-compose.yml

Workarounds

  • adding "min_doc_count": 1 prevents the crash (and it returns 2 buckets, key: 0 and key: 1234567800); this expects that the clients will have to reconstruct the rest of the empty buckets themselves (not always possible for my particular case, sadly)
  • changing the heap from 512m to 1024m, for example, prevents the crash for "interval": 100", but it crashes for "interval": 10"
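Since neither workaround fully solves my case, a client-side guard can at least keep the fatal request from reaching the node: fetch the field's min and max with a stats aggregation first, and only issue the histogram when the implied bucket count stays under search.max_buckets. Below is a minimal sketch against the index above; jq and a non-empty field are assumed, and 65535 is the default limit from the error message earlier:

INTERVAL=100
# cheap pre-flight: a stats aggregation creates no histogram buckets
STATS=$(curl -sk -XGET -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD" \
  'https://localhost:9200/sample-index/_search' \
  -H 'Content-Type: application/json' \
  -d '{"size":0, "aggs": { "s": { "stats": { "field": "some_value" }}}}')

# stats returns min/max as doubles; floor them for integer shell arithmetic
MIN=$(echo "$STATS" | jq '.aggregations.s.min | floor')
MAX=$(echo "$STATS" | jq '.aggregations.s.max | floor')
BUCKETS=$(( (MAX - MIN) / INTERVAL + 1 ))

if [ "$BUCKETS" -gt 65535 ]; then
  echo "refusing histogram: would need $BUCKETS buckets" >&2
else
  curl -k -XGET -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD" \
    'https://localhost:9200/sample-index/_search' \
    -H 'Content-Type: application/json' \
    -d "{\"size\":0, \"aggs\": { \"test\": { \"histogram\": { \"field\": \"some_value\", \"interval\": $INTERVAL }}}}"
fi

If the key range of interest is known up front, the histogram's hard_bounds option should also cap how many buckets get created, though I have not verified that it avoids this particular crash.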