
RAM leakage in blobfuse2 > 2.3.0 #1639

Open
Vegoo89 opened this issue Feb 19, 2025 · 15 comments

@Vegoo89

Vegoo89 commented Feb 19, 2025

Hello,
Related to previously closed issue: #1617

We are using blobfuse2 on AKS via the Blob CSI driver, latest version.

A few days ago we upgraded the node pool to the latest image, which automatically installed blobfuse2 version 2.4.0.

When we ran our performance tests, nodes started to transition to the NotReady state after a very short period of time.

After a debug session, we realized blobfuse2 is not freeing RAM at all. Memory usage just keeps growing until the host becomes unresponsive due to lack of memory.

I tested the following blobfuse2 versions:

  • 2.3.0 -> no issue
  • 2.3.2 -> issue persists
  • 2.4.0 -> issue persists
  • 2.4.1 -> issue persists

We are using Blob CSI via a PV (RBAC UAMI auth) with the mount options below:

mountOptions:
  - '-o allow_other'
  - '--file-cache-timeout-in-seconds=0'
  - '-o attr_timeout=0'
  - '-o entry_timeout=0'
  - '-o negative_timeout=0'
  - '--attr-timeout=0'
  - '--entry-timeout=0'
  - '--cancel-list-on-mount-seconds=10'
  - '--block-cache'
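
For context, these options are set on a static PV handled by blob.csi.azure.com; a trimmed sketch of the manifest is below (names, IDs, and the exact volumeAttributes keys are placeholders, and the real PV carries the full option list above):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-blobfuse-example              # placeholder name
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - '-o allow_other'                   # ...plus the rest of the options listed above
    - '--block-cache'
  csi:
    driver: blob.csi.azure.com
    volumeHandle: example-volume-handle  # placeholder, must be unique per PV
    volumeAttributes:
      resourceGroup: example-rg          # placeholder
      storageAccount: examplestorage     # placeholder
      containerName: example-container   # placeholder
      # UAMI-based auth attributes go here; exact keys depend on the
      # blob CSI driver version, so they are omitted from this sketch.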

We are using block cache because of a file cache limitation (it doesn't clean up folders and the inode limit is reached).

We tried -o direct_io, but performance was very poor and not acceptable under our SLA.

Any suggestions are welcome, thanks!

@vibhansa-msft
Member

How many files do you have in the container that will be listed as part of this mount?

vibhansa-msft added this to the v2-2.5.0 milestone Feb 19, 2025
@Vegoo89
Author

Vegoo89 commented Feb 20, 2025

Hi,
Each node has 3 blobfuse2 processes running that are mapped to different containers.

Per test (around 4000 tasks), we produce ~10 small files per task at a rate of 800 tasks per minute.

One container is used for writing and reading small files, mostly JSONs (reads are performed only after the write is done, so there is no integrity issue).
The second and third hold machine learning models (transformer models) and adapters.
The base model is loaded once and kept in memory by 3 microservices, while the adapters, which are very small, are loaded and switched constantly.

@vibhansa-msft
Member

As you are using block-cache and have not set any upper limit on cache usage, each running blobfuse instance will try to use 80% of the available memory by default. Due to multiple instances, your memory is running low. You need to set a max memory usage limit for block-cache using the --block-cache-pool-size=<mem size in MB> CLI parameter.
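
For example, applied to the mountOptions from the original report, that would look roughly like this (the 2000 MB value is only illustrative, not a recommendation):

mountOptions:
  - '-o allow_other'
  - '--block-cache'
  - '--block-cache-pool-size=2000'   # cap block-cache memory at ~2 GB per blobfuse2 instance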

@Vegoo89
Author

Vegoo89 commented Feb 21, 2025

I thought about it, but in version 2.3.0 the limit was set to 4 GB and the process was constantly using 500-600 MB of memory under load; it never reached the 4 GB limit.
Now, for the same load, memory keeps growing up to 20 GB per process, so I can limit it to some value (no idea what the correct one would be), but I doubt that is the root of the problem.

@vibhansa-msft
Member

With autoconfig we reserve 80% of the memory space in a single instance, assuming all resources are at our disposal. If you are running multiple instances, you will need to manually restrict the memory usage. This might solve the problem. Try it out and let us know if it helps.

@Vegoo89
Author

Vegoo89 commented Feb 24, 2025

We are testing --block-cache-pool-size=2000 and should have results today or tomorrow.

However, it is a bit concerning that this limit is not governed by Kubernetes.

How do I make sure that my node doesn't die due to OOM when the blobfuse2 process is executed outside of the container, directly on the host, so it doesn't respect the spec.resources.limits.memory set by the blob-csi daemonset?
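
To illustrate what I mean, here is a hypothetical, trimmed excerpt of a blob-csi node DaemonSet container spec (image and values are placeholders). Because blobfuse2 is spawned as a process on the host by the driver, a cgroup limit like this does not bound it:

containers:
  - name: blob                        # CSI node plugin container (placeholder name)
    image: <blob-csi-driver-image>    # placeholder
    resources:
      limits:
        memory: 2Gi                   # limits only the containerized driver process,
                                      # not blobfuse2 mounts running on the host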

@vibhansa-msft
Member

By default it asks the OS for the total memory available on the system and then takes 80% of it. So if multiple instances are started in parallel, they may all end up with the same value and thus overrun the total available memory; for example, on a 64 GB node, three instances would each budget roughly 51 GB, far more than the node has. Restricting it manually might be the only option here, I feel.

@Vegoo89
Author

Vegoo89 commented Feb 24, 2025

OK, I get it. The question is why it keeps allocating so much memory and doesn't release any in the newest versions.

In version 2.3.0, memory usage (without setting --block-cache-pool-size) peaked at 600 MB.

With 2.3.2 and above it just keeps growing.

Should I expect significant performance issues if I set --block-cache-pool-size=600 on all PVs?

Was there different memory management for block cache back in 2.3.0 that was removed or reworked later on?

@vibhansa-msft
Member

In either version it should not continuously rise; at a certain point it should stabilize. Some of the memory-related issues we have found recently are due to a Go version upgrade, and we are actively working on those.

@syeleti-msft
Member

> In version 2.3.0, memory usage (without setting --block-cache-pool-size) peaked at 600 MB.
>
> With 2.3.2 and above it just keeps growing.

  1. Yes, you are right. After 2.3.0 we made some changes to our memory management around buffer reuse: in 2.3.0 we did not clear a buffer before reusing it (which could cause data integrity issues), so in versions >2.3.0 we clear the buffer by copying a zero buffer into the existing buffer. This is why you see all the memory being used on the system.

  2. Currently, in the latest release, when you set the memory pool the whole pool is allocated and is only deallocated when blobfuse terminates. This is a significant bottleneck and we are working on improving the memory management in block cache.

Please refrain from using 2.3.0; there are many known data integrity issues. I suggest using the latest release.

@Vegoo89
Author

Vegoo89 commented Feb 25, 2025

> In either version it should not continuously rise; at a certain point it should stabilize. Some of the memory-related issues we have found recently are due to a Go version upgrade, and we are actively working on those.

In our use case it doesn't stabilize on >2.3.0. It grows to ~20 GB per process and causes the node to become unresponsive, so AKS moves it to the NotReady state and removes it from the pool.

> 1. Yes, you are right. After 2.3.0 we made some changes to our memory management around buffer reuse: in 2.3.0 we did not clear a buffer before reusing it (which could cause data integrity issues), so in versions >2.3.0 we clear the buffer by copying a zero buffer into the existing buffer. This is why you see all the memory being used on the system.
> 2. Currently, in the latest release, when you set the memory pool the whole pool is allocated and is only deallocated when blobfuse terminates. This is a significant bottleneck and we are working on improving the memory management in block cache.
>
> Please refrain from using 2.3.0; there are many known data integrity issues. I suggest using the latest release.

Thanks for the explanation, appreciate it. The last thing I would love to understand is how setting --block-cache-pool-size=600 would affect performance.

We are working mostly with small files. Would setting this value higher be required if we operated on blobs bigger than 600 MB? I am trying to understand what would work best for our use case and what impact this setting has on the performance of the node (considering we have a few blobfuse2 processes running there).

@syeleti-msft
Member

> We are working mostly with small files. Would setting this value higher be required if we operated on blobs bigger than 600 MB?

No, operations on bigger blobs don't need a larger memory pool.

Do you see any perf difference when using 600 MB as the pool size for your use case in the latest release?

syeleti-msft self-assigned this Feb 25, 2025
@Vegoo89
Author

Vegoo89 commented Feb 25, 2025

We did a few rounds of testing today. Each performance test takes around 15 minutes and produces around 120k files.

  • 2.3.0 -> no --block-cache-pool-size -> average CPU utilization 58% per node
  • 2.4.1 -> --block-cache-pool-size=2000 -> average CPU utilization 50% per node
  • 2.4.1 -> --block-cache-pool-size=600 -> average CPU utilization 54% per node

We will also perform more tests on bigger nodes in the upcoming days, but the results look good.

A note from my observations on version 2.4.1: after the tests finish, each blobfuse2 process stays at its limit and doesn't release memory. It also allocates around ~23 MB over the limit.

However, the logic behind the default limit of 80% of host memory seems a bit off. I am not a Go expert, but I checked the code and I don't see any guard that prevents OOM on the system in a situation like the one described above.

In a containerized environment this would most likely happen on all nodes at the same time, since traffic is distributed evenly in most common scenarios, and it blows up the entire worker node pool. Cluster recovery in our case takes up to 1 hour.

@syeleti-msft
Member

> However, the logic behind the default limit of 80% of host memory seems a bit off. I am not a Go expert, but I checked the code and I don't see any guard that prevents OOM on the system in a situation like the one described above.

The constraints were placed in the code assuming that only one blobfuse instance would run per VM, but after looking into your scenario, I think we should reconsider some things. Thanks for pointing this out.

@Vegoo89
Author

Vegoo89 commented Feb 26, 2025

Thanks! That would be great since, to the best of my knowledge, it is considered best practice to split data between multiple containers instead of putting everything into a single one, which means running more than one blobfuse process on the node.
