
execute future in tokio::spawn causes more memory consumption. #7064

Closed
dream-1ab opened this issue Jan 2, 2025 · 8 comments
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug.

Comments

@dream-1ab

dream-1ab commented Jan 2, 2025

Version
Rust: rustc 1.83.0 (90b35a623 2024-11-26)

PS /media/dreamlab/Development/Project/meshel_customer_service/backend/new/customer_service/target/release> cargo tree | grep tokio
│   │   └── tokio v1.42.0
│   │       └── tokio-macros v2.4.0 (proc-macro)
│   │   ├── tokio v1.42.0 (*)
│   ├── tokio v1.42.0 (*)
│   ├── tokio-tungstenite v0.26.1
│   │   ├── tokio v1.42.0 (*)
│   │   ├── tokio v1.42.0 (*)
│   │   ├── tokio-util v0.7.13
│   │   │   └── tokio v1.42.0 (*)
│   │   └── tokio v1.42.0 (*)
│   │   └── tokio v1.42.0 (*)
│   ├── tokio v1.42.0 (*)
│   ├── tokio-rustls v0.26.1
│   │   └── tokio v1.42.0 (*)
├── tokio v1.42.0 (*)
    │   ├── tokio v1.42.0 (*)
    ├── tokio v1.42.0 (*)
    ├── tokio-util v0.7.13 (*)

Platform

Linux dreamlab-xiaomibookpro162022 6.5.0-1024-oem #25-Ubuntu SMP PREEMPT_DYNAMIC Mon May 20 14:47:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Description

I'm working on a server-side project that uses Rust + tokio + tower + axum.
I suddenly noticed that my hello-world axum HTTP server takes almost 100 MB of RAM when I run a load test with ab -c 1000 -n 500000 http://0.0.0.0:7999/.
Eventually I found out why this simple hello-world program takes 100 MB of RAM:
it happens when I run the server initialization inside tokio::spawn.
It consumes 10x more memory than running without tokio::spawn.

I tried this code:

main.rs

use std::{future::Future, time::Duration};

use axum::{http::{HeaderName, HeaderValue}, routing::get, Router};
// use axum_extensions::request_counter::RequestCounter;
use tower_http::cors::Any;

mod axum_extensions;

#[tokio::main]
async fn main() {
    let (host, port) = (
        std::env::var("SERVER_HOST").unwrap_or("0.0.0.0".to_string()),
        std::env::var("SERVER_PORT").unwrap_or("7999".to_string()).parse().unwrap_or(7999),
    );

    let app = Router::new()
        .route("/", get(|| async {
            "hello world"
        }))
        // .layer(RequestCounter::new())
        .layer(tower_http::cors::CorsLayer::new().allow_headers(Any).allow_methods(Any).allow_origin(Any))
        .layer(tower_http::set_header::SetResponseHeaderLayer::appending(HeaderName::from_static("developer"), HeaderValue::from_static("Meshel DreamLab software technologies")))
        .layer(tower_http::set_header::SetResponseHeaderLayer::appending(HeaderName::from_static("server"), HeaderValue::from_static("Rust + Tokio + Hyper + Axum")))
    ;

    let task = async move {
        let tcp_server = tokio::net::TcpListener::bind((host, port)).await.unwrap();
        axum::serve(tcp_server, app).await.unwrap();
    };
    
//****************OVER HERE, Switch comment/comment out those two function to try.*******************
    // run_without_spawn(task).await;
    run_with_spawn(task).await;
//***************************************************
}

async fn run_without_spawn<T>(future: impl Future<Output = T>) {
    future.await;
}

async fn run_with_spawn<T: Send + 'static>(future: impl Future<Output = T> + Send + 'static) {
    tokio::spawn(future).await.unwrap();
}

Cargo.toml

[package]
name = "customer_service"
version = "0.1.0"
edition = "2021"

[dependencies]
axum = { version = "0.8.1", features = ["ws"] }
neo4rs = { version = "0.8.0", features = ["json", "serde_json"] }
serde = { version = "1.0.217", features = ["derive"] }
serde_json = "1.0.134"
tokio = { version = "1.42.0", features = ["full"] }
tower = { version = "0.5.2", features = ["full"] }
tower-http = { version = "0.6.2", features = ["full"] }

[code sample that causes the bug]
You can reproduce the same problem by swapping which of the two calls is commented out: comment `run_without_spawn(task).await;` and uncomment `run_with_spawn(task).await;` (or vice versa).

    //run_without_spawn(task).await;
    run_with_spawn(task).await;

Here is the expected result (without using tokio::spawn):

(screenshot: Screenshot_20250103_021748)

Here is a screenshot after the 500,000-request load test with ab -c 1000 -n 500000 http://0.0.0.0:7999/ when using tokio::spawn:

(screenshot: Screenshot_20250103_021701)

The memory never goes back to normal (meaning around 10 MB).

@dream-1ab dream-1ab added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Jan 2, 2025
@Darksonn
Contributor

Darksonn commented Jan 3, 2025

Please try to measure the memory using this utility:

use core::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::alloc::{GlobalAlloc, Layout, System};

struct TrackedAlloc {}

#[global_allocator]
static ALLOC: TrackedAlloc = TrackedAlloc;

static TOTAL_MEM: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for TrackedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ret = System.alloc(layout);
        if !ret.is_null() {
            TOTAL_MEM.fetch_add(layout.size(), Relaxed);
        }
        ret
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        TOTAL_MEM.fetch_sub(layout.size(), Relaxed);
        System.dealloc(ptr, layout);
    }

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        let ret = System.alloc_zeroed(layout);
        if !ret.is_null() {
            TOTAL_MEM.fetch_add(layout.size(), Relaxed);
        }
        ret
    }
    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        let ret = System.realloc(ptr, layout, new_size);
        if !ret.is_null() {
            TOTAL_MEM.fetch_add(new_size.wrapping_sub(layout.size()), Relaxed);
        }
        ret
    }
}
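For reference, here is a self-contained sketch of how a counter like this behaves in practice: a minimal tracking allocator (only `alloc`/`dealloc`; the other `GlobalAlloc` methods fall back to their defaults) plus a small helper, `measure_one_mib`, that reports how many tracked bytes holding a 1 MiB buffer adds. The helper name is illustrative, not part of the snippet above.

```rust
use core::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::alloc::{GlobalAlloc, Layout, System};

// Running total of live heap bytes, maintained by the wrapper below.
static TOTAL_MEM: AtomicUsize = AtomicUsize::new(0);

struct TrackedAlloc;

#[global_allocator]
static ALLOC: TrackedAlloc = TrackedAlloc;

unsafe impl GlobalAlloc for TrackedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ret = System.alloc(layout);
        if !ret.is_null() {
            TOTAL_MEM.fetch_add(layout.size(), Relaxed);
        }
        ret
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        TOTAL_MEM.fetch_sub(layout.size(), Relaxed);
        System.dealloc(ptr, layout);
    }
}

// Tracked growth caused by holding a 1 MiB buffer: read the counter,
// allocate, read again, then drop the buffer.
fn measure_one_mib() -> usize {
    let before = TOTAL_MEM.load(Relaxed);
    let buf = vec![0u8; 1 << 20];
    let during = TOTAL_MEM.load(Relaxed);
    drop(buf);
    during.saturating_sub(before)
}

fn main() {
    println!("1 MiB allocation adds {} tracked bytes", measure_one_mib());
}
```

Note that this counts bytes the program has allocated and not yet freed, which is a different quantity from the RSS a system monitor shows: the allocator may keep freed pages cached, so RSS can stay high while this counter drops.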

@dream-1ab
Author

> Please try to measure the memory using this utility: […]

without tokio::spawn:

242440 bytes
236 kilobytes
0 megabytes

with tokio::spawn:

242696 bytes
237 kilobytes
0 megabytes

The results are identical, so why does tokio::spawn consume more memory in the system monitor? Is this because of memory fragmentation, or the memory allocator's cache?

@Darksonn
Contributor

Darksonn commented Jan 3, 2025

Memory allocators often hold on to memory you are not using so that future allocations are faster. That's most likely what is happening. Of course, fragmentation could also be a factor. Have you tried with jemalloc?
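To make the "allocator holds on to memory" point concrete: on Linux with glibc (the default allocator here, since nothing else is configured), freed memory sits in malloc's arenas and the non-standard glibc function `malloc_trim` asks it to return what it can to the kernel. This is a Linux/glibc-specific sketch, separate from the jemalloc suggestion; the function is declared directly so no external crate is needed.

```rust
// glibc extension: release free heap memory back to the OS.
// Returns 1 if any memory was released, 0 otherwise.
extern "C" {
    fn malloc_trim(pad: usize) -> i32;
}

// Safe wrapper used below; illustrative name, not a std API.
fn trim() -> i32 {
    unsafe { malloc_trim(0) }
}

fn main() {
    // Simulate a load spike: allocate and drop ~64 MiB.
    let spike: Vec<Vec<u8>> = (0..64).map(|_| vec![0u8; 1 << 20]).collect();
    drop(spike);

    // Without this call, glibc may keep those pages cached and RSS
    // stays high even though the program freed everything.
    let released = trim();
    println!("malloc_trim returned {released}");
}
```

Whether RSS actually drops depends on fragmentation and on which arena the pages landed in, which is why multi-threaded runtimes (each worker thread gets its own arena) can show higher steady-state RSS than a single-threaded run of the same program.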

@dream-1ab
Author

It's okay if a small amount of memory is cached by the allocator for future use, but 100 MB is not a small amount, I think.
In my case system memory is sufficient, so this may not be a problem, but what happens if system memory is not enough?
Does the system send a low-memory signal to all running applications so that they release memory they are holding but not currently using?

I will try the same thing with jemalloc.

@dream-1ab
Author

> Memory allocators often hold on to memory you are not using so that future allocations are faster. That's most likely what is happening. Of course, fragmentation could also be a factor. Have you tried with jemalloc?

with the following modification:

use std::{future::Future, time::Duration};

use axum::{http::{HeaderName, HeaderValue}, routing::get, Router};
use axum_extensions::request_counter::RequestCounter;
use routes::accounts::account_management_route;
use tower_http::cors::Any;

mod axum_extensions;
mod routes;


use core::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::alloc::{GlobalAlloc, Layout, System};

static _JEMALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc{};

struct TrackedAlloc;

#[global_allocator]
static ALLOC: TrackedAlloc = TrackedAlloc;

static TOTAL_MEM: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for TrackedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ret = _JEMALLOC.alloc(layout);
        if !ret.is_null() {
            TOTAL_MEM.fetch_add(layout.size(), Relaxed);
        }
        ret
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        TOTAL_MEM.fetch_sub(layout.size(), Relaxed);
        _JEMALLOC.dealloc(ptr, layout);
    }

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        let ret = _JEMALLOC.alloc_zeroed(layout);
        if !ret.is_null() {
            TOTAL_MEM.fetch_add(layout.size(), Relaxed);
        }
        ret
    }
    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        let ret = _JEMALLOC.realloc(ptr, layout, new_size);
        if !ret.is_null() {
            TOTAL_MEM.fetch_add(new_size.wrapping_sub(layout.size()), Relaxed);
        }
        ret
    }
}


#[tokio::main]
async fn main() {
    let (host, port) = (
        std::env::var("SERVER_HOST").unwrap_or("0.0.0.0".to_string()),
        std::env::var("SERVER_PORT").unwrap_or("7999".to_string()).parse().unwrap_or(7999),
    );

    let app = Router::new()
        .route("/", get(|| async {
            "Welcome to our customer service."
        }))
        .route("/memory", get(|| async {
            let bytes_of_mem = TOTAL_MEM.load(Relaxed);
            format!("{} bytes = {} kilobytes = {} megabytes", bytes_of_mem, bytes_of_mem / 1024, bytes_of_mem / 1024 / 1024)
        }))
        .nest("/api/v1", Router::new()
            .nest("/account", account_management_route().await)
        )
        .layer(RequestCounter::new())
        .layer(tower_http::cors::CorsLayer::new().allow_headers(Any).allow_methods(Any).allow_origin(Any))
        .layer(tower_http::set_header::SetResponseHeaderLayer::appending(HeaderName::from_static("developer"), HeaderValue::from_static("Meshel DreamLab software technologies")))
        .layer(tower_http::set_header::SetResponseHeaderLayer::appending(HeaderName::from_static("server"), HeaderValue::from_static("Rust + Tokio + Hyper + Axum")))
    ;

    let task = async move {
        let tcp_server = tokio::net::TcpListener::bind((host, port)).await.unwrap();
        axum::serve(tcp_server, app).await.unwrap();
    };
    
    tokio::spawn(task).await.unwrap();
}

jemalloc with tokio::spawn:

250928 bytes = 245 kilobytes = 0 megabytes

Plasma system monitor:

(screenshot: Screenshot_20250103_185429)

jemalloc without tokio::spawn:

250352 bytes = 244 kilobytes = 0 megabytes

Plasma system monitor:

(screenshot: Screenshot_20250103_185631)

@Darksonn
Contributor

Darksonn commented Jan 3, 2025

Jemalloc does give cached memory back to the OS, but only after a delay. And it doesn't happen if the application is completely idle. You can try configuring jemalloc with background_thread:true,tcache_max:4096 to let freeing of cached memory happen in the background even if the application isn't calling into the allocator. See jemalloc's tuning page for more info.
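A sketch of how that configuration might be applied, assuming the tikv-jemallocator crate is used as in the snippet above: jemalloc reads its options from the `MALLOC_CONF` environment variable, and builds that prefix jemalloc's symbols (as tikv-jemalloc-sys does on some platforms) read `_RJEM_MALLOC_CONF` instead. The binary name is the one from this project's Cargo.toml.

```
# Option string per jemalloc's TUNING docs; try the prefixed variable
# if the unprefixed one has no effect with your build.
MALLOC_CONF="background_thread:true,tcache_max:4096" ./customer_service
```

`background_thread:true` lets jemalloc decay and return cached pages on its own schedule even when the application is idle, and `tcache_max` caps the size class served from per-thread caches.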

@dream-1ab
Author

> Jemalloc does give cached memory back to the OS, but only after a delay. And it doesn't happen if the application is completely idle. You can try configuring jemalloc with background_thread:true,tcache_max:4096 to let freeing of cached memory happen in the background even if the application isn't calling into the allocator. See jemalloc's tuning page for more info.

After enabling the background_thread cargo feature of tikv-jemallocator, memory consumption during load testing goes up to 200+ MB and drops back to ~15 MB within about 10 seconds after the load test ends.

In my case this is totally acceptable, and it doesn't happen during normal execution.
Thank you.

@Darksonn
Contributor

Darksonn commented Jan 3, 2025

You're welcome.

@Darksonn Darksonn closed this as completed Jan 3, 2025