OOMKILL is triggered at times if throughput of events is low when storage.type is configured as Memory #9536

Open
sasikiranvaddi opened this issue Oct 29, 2024 · 0 comments


Bug Report

Describe the bug
We have two inputs: one harvests Fluent Bit's own logs, and the other tails logs from a different service at /logs/gen0.log (throughput: 10 logs/min, ~150 bytes per log).
mem_buf_limit is set to 500KB for both inputs, and storage.type is the default, i.e. memory. (We are mainly interested in storage.type memory; with filesystem the chunks are backed up to disk, so memory consumption looks normal even under unpredictable conditions.)
100Mi is set as the memory limit for the container that runs Fluent Bit.

We have observed that when the outputs are available, the mem_buf usage is normal and there are no issues. But when the output is down, an OOMKILL is triggered before the memory buffer fills up. As a workaround to prevent the OOMKILL, the community recommendation is to lower mem_buf_limit, but that value is interdependent with the input throughput.
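For reference only (we specifically want to keep storage.type memory, so this is not our configuration), a filesystem-buffered variant that bounds memory usage would look roughly like the sketch below; the storage.* values are placeholders, not something we have tested:

[SERVICE]
    # bound how many chunks are kept in memory; the rest stay on disk
    storage.path              /logs/flb-storage/
    storage.max_chunks_up     16
    storage.backlog.mem_limit 5M

[INPUT]
    name          tail
    path          /logs/gen0.log
    storage.type  filesystem

[OUTPUT]
    # hard cap on the space used by chunks queued for this output
    name                     <our output plugin>
    match                    *
    storage.total_limit_size 50M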

  • Is there a formula for calculating mem_buf_limit and the container memory limit based on the input throughput? We observe that when the output is down and throughput is low, the chunks consume more memory; when throughput is high the chunks still consume memory, but mem_buf_limit is reached faster, so the OOMKILL is not triggered (the buffer fills up quickly and ingestion from that input is paused). A rough estimate with our numbers is sketched below this list.

  • What would be ideal values for the Fluent Bit container to survive all conditions?

  • If mem_buf_limit is configured to a low value and a single event exceeds that limit, the logs will not be transmitted. How can we determine the limits considering this scenario?

For the above queries, there is no clear way forward on how to handle them.
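For illustration, a naive back-of-envelope estimate with the numbers above (assuming mem_buf_limit only accounts for the serialized chunk data, which is probably not the whole picture):

    10 logs/min x 150 bytes            ≈ 1.5 KB/min per input
    500 KB mem_buf_limit / 1.5 KB/min  ≈ 333 min (~5.5 h) until the input is paused
    2 inputs x 500 KB                  ≈ 1 MB of accounted buffer vs. a 100Mi container limit

Even though the accounted buffer stays far below the container limit, the container is still OOMKILLed, which presumably points to per-chunk and allocator overhead that mem_buf_limit does not account for; we do not know how to size for that overhead.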

Your Environment

  • Version used: 3.0.5
  • Configuration:
    Fluent-bit.conf
    bash-4.4$ cat /etc/fluent-bit/fluent-bit.conf
    @include /etc/fluent-bit/inputs.conf
    @include /etc/fluent-bit/outputs.conf
    @include /etc/fluent-bit/filters.conf
[SERVICE]
    flush           3
    grace           10
    log_level       info
    parsers_file    /etc/fluent-bit/parsers.conf
    http_server     on
    http_listen     localhost
    http_port       2020
    storage.metrics on

Inputs.conf

[INPUT]
    name              tail
    tag               event.fluent-bit
    alias             fluent-bit
    buffer_chunk_size 32k
    buffer_max_size   32k
    path              /logs/fluent-bit.log
    path_key          filename
    read_from_head    true
    refresh_interval  5
    rotate_wait       10
    skip_empty_lines  off
    skip_long_lines   off
    key               message
    db                /logs/fluent-bit.db
    db.sync           normal
    db.locking        true
    db.journal_mode   off
    parser            json
    mem_buf_limit     500KB
[INPUT]
    name              tail
    tag               event.file0
    alias             file0
    buffer_chunk_size 32k
    buffer_max_size   32k
    read_from_head    true
    refresh_interval  5
    rotate_wait       10
    skip_empty_lines  off
    skip_long_lines   off
    key               message
    db                /logs/file0.db
    db.sync           normal
    db.locking        true
    db.journal_mode   off
    path              /logs/gen0.log
    path_key          filename
    exclude_path      /logs/fluent-bit.log
    mem_buf_limit     500KB

Additional context
There is no clear way forward or workaround to prevent the OOMKILL.
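Since http_server and storage.metrics are enabled in the configuration above, the per-input chunk/buffer state should be observable while reproducing via the built-in monitoring endpoint, e.g.:

    bash-4.4$ curl -s http://localhost:2020/api/v1/storage

(standard monitoring endpoint of the built-in HTTP server; useful for watching the chunk count grow while the output is down).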
