[BUG] A sufficiently small interval value on a histogram can crash the node #14558

icercel opened this issue Jun 26, 2024

Describe the bug

Given an index with a long-type field whose values span an extreme min and max, requesting a histogram aggregation on that field with a sufficiently small interval value crashes the node with an OutOfMemoryError.

Related component

Search:Aggregations

To Reproduce

  1. Use the default docker-compose provided on the OpenSearch site (it uses :latest, which at the time of writing is 2.15.0)
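     (That compose file also pins the JVM heap at 512m via OPENSEARCH_JAVA_OPTS, which is the heap size the crash below occurs under; see the Workarounds section.)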

  2. Add two documents:

curl -k -XPUT -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD" \
  'https://localhost:9200/sample-index/_doc/1' \
  -H 'Content-Type: application/json' \
  -d '{"some_value": 1}'

curl -k -XPUT -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD" \
  'https://localhost:9200/sample-index/_doc/2' \
  -H 'Content-Type: application/json' \
  -d '{"some_value": 1234567890}'
  3. Attempt a histogram with a sufficiently large interval:
curl -k -XGET -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD" \
  'https://localhost:9200/sample-index/_search' \
  -H 'Content-Type: application/json' \
  -d '{"size":0, "aggs": { "test": { "histogram": { "field": "some_value", "interval": 300000000 }}}}'
  4. OpenSearch correctly (I think) returns the buckets:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "test": {
      "buckets": [
        {
          "key": 0,
          "doc_count": 1
        },
        {
          "key": 300000000,
          "doc_count": 0
        },
        {
          "key": 600000000,
          "doc_count": 0
        },
        {
          "key": 900000000,
          "doc_count": 0
        },
        {
          "key": 1200000000,
          "doc_count": 1
        }
      ]
    }
  }
}
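The bucket count is what the interval implies: floor((1234567890 - 0) / 300000000) + 1 = 5 buckets, far below the search.max_buckets default of 65535.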
  5. Change the interval value to 1000:
curl -k -XGET -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD"  \
 'https://localhost:9200/sample-index/_search' \
 -H 'Content-Type: application/json' \
 -d '{"size":0, "aggs": { "test": { "histogram": { "field": "some_value", "interval": 1000 }}}}'
  6. OpenSearch correctly responds with:
{
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "",
    "phase": "fetch",
    "grouped": true,
    "failed_shards": [],
    "caused_by": {
      "type": "too_many_buckets_exception",
      "reason": "Trying to create too many buckets. Must be less than or equal to: [65535] but was [1234568]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
      "max_buckets": 65535
    }
  },
  "status": 503
}
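The count in the message checks out: floor(1234567890 / 1000) + 1 = 1234568 buckets, well over the search.max_buckets default of 65535, so the reduce phase fails the request cleanly.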
  7. Change the interval to 100:
curl -k -XGET -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD"  \
 'https://localhost:9200/sample-index/_search' \
 -H 'Content-Type: application/json' \
 -d '{"size":0, "aggs": { "test": { "histogram": { "field": "some_value", "interval": 100 }}}}'
  8. OpenSearch responds with something like curl: (56) OpenSSL SSL_read: error:0A000126:SSL routines::unexpected eof while reading, errno 0, because opensearch-node1 just died:
opensearch-node1         | [2024-06-26T12:12:51,906][INFO ][o.o.m.j.JvmGcMonitorService] [opensearch-node1] [gc][1318] overhead, spent [366ms] collecting in the last [1.1s]
opensearch-node1         | java.lang.OutOfMemoryError: Java heap space
opensearch-node1         | Dumping heap to data/java_pid1.hprof ...
opensearch-node1         | Unable to create data/java_pid1.hprof: File exists
opensearch-node1         | [2024-06-26T12:12:52,440][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [opensearch-node1] fatal error in thread [opensearch[opensearch-node1][search][T#24]], exiting
opensearch-node1         | java.lang.OutOfMemoryError: Java heap space
opensearch-node1         | 	at java.base/java.util.Arrays.copyOf(Arrays.java:3482) ~[?:?]
opensearch-node1         | 	at java.base/java.util.ArrayList.grow(ArrayList.java:237) ~[?:?]
opensearch-node1         | 	at java.base/java.util.ArrayList.grow(ArrayList.java:244) ~[?:?]
opensearch-node1         | 	at java.base/java.util.ArrayList.add(ArrayList.java:515) ~[?:?]
opensearch-node1         | 	at java.base/java.util.ArrayList$ListItr.add(ArrayList.java:1150) ~[?:?]
opensearch-node1         | 	at org.opensearch.search.aggregations.bucket.histogram.InternalHistogram.addEmptyBuckets(InternalHistogram.java:416) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.search.aggregations.bucket.histogram.InternalHistogram.reduce(InternalHistogram.java:436) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:290) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.search.aggregations.InternalAggregations.topLevelReduce(InternalAggregations.java:225) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.action.search.SearchPhaseController.reduceAggs(SearchPhaseController.java:557) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.action.search.SearchPhaseController.reducedQueryPhase(SearchPhaseController.java:528) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.action.search.QueryPhaseResultConsumer.reduce(QueryPhaseResultConsumer.java:153) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:136) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:122) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
opensearch-node1         | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
opensearch-node1         | 	at java.base/java.lang.Thread.runWith(Thread.java:1596) ~[?:?]
opensearch-node1         | 	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
opensearch-node1         | fatal error in thread [opensearch[opensearch-node1][search][T#24]], exiting
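The trace shows where the interval-1000 safeguard stops helping: the too_many_buckets check apparently only trips after InternalHistogram.addEmptyBuckets has materialized the empty buckets, and at interval 100 that means growing an ArrayList to floor(1234567890 / 100) + 1 = 12345679 bucket objects during the reduce, which exhausts the 512m heap before any limit can fail the request.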

Expected behavior

I would have expected (or at least liked, if possible) to get the same too_many_buckets_exception rather than a node crash.

Additional Details

Plugins
n/a

Screenshots
n/a

Host/Environment (please complete the following information):

  • OS: Ubuntu
  • Version 22.04.4

Additional context

  • the OpenSearch version is 2.15.0; no changes were made to the docker-compose.yml

Workarounds

  • adding "min_doc_count": 1 prevents the crash (and it returns 2 buckets, key: 0 and key: 1234567800); this expects that the clients will have to reconstruct the rest of the empty buckets themselves (not always possible for my particular case, sadly)
  • changing the heap from 512m to 1024m, for example, prevents the crash for "interval": 100", but it crashes for "interval": 10"
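Since neither workaround fully solves my case, a client-side guard can at least keep the fatal request from reaching the node: fetch the field's min and max with a stats aggregation first, and only issue the histogram when the implied bucket count stays under search.max_buckets. Below is a minimal sketch against the index above; jq and a non-empty field are assumed, and 65535 is the default limit from the error message earlier:

INTERVAL=100
# cheap pre-flight: a stats aggregation creates no histogram buckets
STATS=$(curl -sk -XGET -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD" \
  'https://localhost:9200/sample-index/_search' \
  -H 'Content-Type: application/json' \
  -d '{"size":0, "aggs": { "s": { "stats": { "field": "some_value" }}}}')

# stats returns min/max as doubles; floor them for integer shell arithmetic
MIN=$(echo "$STATS" | jq '.aggregations.s.min | floor')
MAX=$(echo "$STATS" | jq '.aggregations.s.max | floor')
BUCKETS=$(( (MAX - MIN) / INTERVAL + 1 ))

if [ "$BUCKETS" -gt 65535 ]; then
  echo "refusing histogram: would need $BUCKETS buckets" >&2
else
  curl -k -XGET -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD" \
    'https://localhost:9200/sample-index/_search' \
    -H 'Content-Type: application/json' \
    -d "{\"size\":0, \"aggs\": { \"test\": { \"histogram\": { \"field\": \"some_value\", \"interval\": $INTERVAL }}}}"
fi

If the key range of interest is known up front, the histogram's hard_bounds option should also cap how many buckets get created, though I have not verified that it avoids this particular crash.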