[rust] cache-aware DP - approx tree #1934
base: main
ApproxTree {
    worker_urls: Vec<String>,
    // TODO: don't lock the whole tree
    url_to_tree: Arc<Mutex<HashMap<String, RadixTree>>>,
DashMap may help here.
Actually, I want to lock individual nodes of the radix tree instead of the whole HashMap. Do you have any suggestions for that?
I think we can start with the simple approach (locking the whole map); a concurrent map is quite complex. If you are interested, you could try the approach from this thread later on:
https://users.rust-lang.org/t/locking-only-one-entry-in-hashmap/84764/4
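To illustrate the per-entry locking idea from the linked thread, here is a minimal std-only sketch. It is not the PR's code: `RadixTree` is a hypothetical stand-in that only records inserted texts, and `TreeMap` is a hypothetical wrapper. The outer `RwLock` is held only long enough to clone one worker's `Arc<Mutex<RadixTree>>`, so inserts into different workers' trees do not contend on a single lock. DashMap takes a related route, splitting the map into independently locked shards. Locking individual radix-tree nodes would go further and is not shown.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Hypothetical stand-in for the PR's RadixTree: it only records inserted texts.
#[derive(Default)]
struct RadixTree {
    texts: Vec<String>,
}

/// Hypothetical wrapper: the outer RwLock guards the map's shape (adding or
/// removing workers); each worker's tree has its own Mutex.
struct TreeMap {
    url_to_tree: RwLock<HashMap<String, Arc<Mutex<RadixTree>>>>,
}

impl TreeMap {
    fn new(urls: &[&str]) -> Self {
        let map: HashMap<String, Arc<Mutex<RadixTree>>> = urls
            .iter()
            .map(|u| (u.to_string(), Arc::new(Mutex::new(RadixTree::default()))))
            .collect();
        TreeMap { url_to_tree: RwLock::new(map) }
    }

    /// Clone the tree's Arc under a short read lock; the map lock is
    /// released before the caller locks the individual tree.
    fn tree(&self, url: &str) -> Option<Arc<Mutex<RadixTree>>> {
        self.url_to_tree.read().unwrap().get(url).cloned()
    }

    /// Lock only the target worker's tree. Returns false for unknown URLs.
    fn insert(&self, url: &str, text: &str) -> bool {
        match self.tree(url) {
            Some(tree) => {
                tree.lock().unwrap().texts.push(text.to_string());
                true
            }
            None => false,
        }
    }
}

fn main() {
    let urls = ["http://w0", "http://w1"];
    let map = Arc::new(TreeMap::new(&urls));
    // One thread per worker; they never wait on each other's tree Mutex.
    let handles: Vec<_> = urls
        .into_iter()
        .map(|url| {
            let map = Arc::clone(&map);
            thread::spawn(move || {
                for i in 0..100 {
                    map.insert(url, &format!("req-{i}"));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let n = map.tree("http://w0").unwrap().lock().unwrap().texts.len();
    assert_eq!(n, 100);
}
```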
},
Random {
    worker_urls: Vec<String>,
},
ApproxTree {
    worker_urls: Vec<String>,
If worker_urls is mostly read-only, you can use Arc<str> instead of an owned String for thread-safe sharing. Vec<Arc<str>> is probably suitable in this case.
For now the Vec is read-only, but once we add dynamic scaling it will be read-write, though still mostly read. Vec<Arc<str>> looks great.
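A small std-only sketch of the `Vec<Arc<str>>` suggestion, with an `RwLock` added for the read-mostly case that dynamic scaling would introduce. The URLs and the `RwLock` wrapper are illustrative assumptions, not part of the PR.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

fn main() {
    // Each URL string is allocated once; cloning an Arc<str> only bumps a refcount.
    let worker_urls: Vec<Arc<str>> = ["http://127.0.0.1:30001", "http://127.0.0.1:30002"]
        .into_iter()
        .map(Arc::<str>::from)
        .collect();
    let picked = Arc::clone(&worker_urls[1]);
    assert!(Arc::ptr_eq(&picked, &worker_urls[1])); // same allocation, no copy

    // Hypothetical dynamic-scaling setup: RwLock allows many concurrent
    // readers and an occasional writer that adds or removes workers.
    let shared = Arc::new(RwLock::new(worker_urls));
    let reader = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            let urls = shared.read().unwrap();
            urls.len()
        })
    };
    let seen = reader.join().unwrap();
    shared.write().unwrap().push(Arc::from("http://127.0.0.1:30003"));
    assert_eq!(seen, 2);
    assert_eq!(shared.read().unwrap().len(), 3);
}
```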
1. Algorithm Overview
#1732
We propose a method to approximate worker-side radix trees on the router side using the flow of request and response information. The router maintains approximate trees ("approx trees") that mirror the cache state of each worker's radix tree ("worker tree").
Core Algorithm
Given N workers, the algorithm operates as follows:
The speculation phase anticipates the worker tree's future state, while the correction phase aligns the approx tree with the worker's actual cache state. This forward-backward mechanism enables continuous self-adjustment of the approximation.
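The concrete per-step rules are not reproduced here, so the following is only a minimal sketch of the speculation/correction loop under stated assumptions. Each approx tree is a hypothetical character-level trie (not the PR's `RadixTree`); routing picks the worker whose tree shares the longest prefix with the prompt; speculation inserts the prompt into the chosen worker's tree; and correction is modeled as a hypothetical `evict` that prunes a subtree once the router learns the worker dropped it. How the real router obtains that correction signal is not specified here.

```rust
use std::collections::HashMap;

/// Hypothetical character-level trie standing in for one worker's approx tree.
#[derive(Default)]
struct ApproxTree {
    children: HashMap<char, ApproxTree>,
}

impl ApproxTree {
    /// Number of leading chars of `text` already present in the tree.
    fn prefix_match(&self, text: &str) -> usize {
        let mut node = self;
        let mut matched = 0;
        for c in text.chars() {
            match node.children.get(&c) {
                Some(child) => {
                    node = child;
                    matched += 1;
                }
                None => break,
            }
        }
        matched
    }

    /// Speculation: assume the chosen worker will cache `text`.
    fn insert(&mut self, text: &str) {
        let mut node = self;
        for c in text.chars() {
            node = node.children.entry(c).or_default();
        }
    }

    /// Correction (assumed form): drop the subtree under `prefix`,
    /// e.g. after learning that the worker evicted that part of its cache.
    fn evict(&mut self, prefix: &str) {
        let chars: Vec<char> = prefix.chars().collect();
        let Some((last, path)) = chars.split_last() else { return };
        let mut node = self;
        for c in path {
            match node.children.get_mut(c) {
                Some(child) => node = child,
                None => return,
            }
        }
        node.children.remove(last);
    }
}

/// Route to the worker whose approx tree shares the longest prefix
/// (ties go to the lowest index), then speculatively insert the prompt.
fn route(trees: &mut [ApproxTree], prompt: &str) -> usize {
    let mut best = 0;
    let mut best_len = 0;
    for (i, tree) in trees.iter().enumerate() {
        let len = tree.prefix_match(prompt);
        if len > best_len {
            best = i;
            best_len = len;
        }
    }
    trees[best].insert(prompt);
    best
}

fn main() {
    let mut trees: Vec<ApproxTree> = (0..2).map(|_| ApproxTree::default()).collect();
    let a = route(&mut trees, "hello world");
    let b = route(&mut trees, "hello there");
    assert_eq!(a, b); // the shared prefix lands on the same worker
    trees[a].evict("hello w");
    assert_eq!(trees[a].prefix_match("hello world"), 6);
    assert_eq!(trees[a].prefix_match("hello there"), 11);
}
```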
Load Balancing Strategies
1. Cache Threshold with Shortest Queue
2. Variance-Based Load Balancing
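The strategy descriptions are not included here, so this sketch assumes that Cache Threshold with Shortest Queue works as its name suggests: route to the best-matching worker when its matched fraction of the prompt exceeds a threshold, otherwise route to the worker with the shortest queue. `WorkerState`, `pick_worker`, and `cache_threshold` are hypothetical names, and the 0.5 threshold is arbitrary. The variance-based strategy is not sketched.

```rust
/// Hypothetical per-worker routing inputs: matched prefix length from the
/// worker's approx tree and the number of queued requests.
struct WorkerState {
    matched: usize,
    queue_len: usize,
}

/// Assumed policy: follow the best cache match if it covers more than
/// `cache_threshold` of the prompt, otherwise pick the shortest queue.
fn pick_worker(workers: &[WorkerState], prompt_len: usize, cache_threshold: f64) -> usize {
    let (best, best_state) = workers
        .iter()
        .enumerate()
        .max_by_key(|(_, w)| w.matched)
        .expect("at least one worker");
    let ratio = if prompt_len == 0 {
        0.0
    } else {
        best_state.matched as f64 / prompt_len as f64
    };
    if ratio > cache_threshold {
        best
    } else {
        // min_by_key returns the first minimum, so ties go to the lowest index.
        workers
            .iter()
            .enumerate()
            .min_by_key(|(_, w)| w.queue_len)
            .map(|(i, _)| i)
            .expect("at least one worker")
    }
}

fn main() {
    let workers = vec![
        WorkerState { matched: 90, queue_len: 8 },
        WorkerState { matched: 10, queue_len: 2 },
    ];
    // 90 / 100 = 0.9 > 0.5: follow the cache.
    assert_eq!(pick_worker(&workers, 100, 0.5), 0);
    // 90 / 300 = 0.3 <= 0.5: fall back to the shortest queue.
    assert_eq!(pick_worker(&workers, 300, 0.5), 1);
}
```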
Parameters
See the Google Doc version for details.
2. Changes
py_src
main.py: launch a minimal router server with existing workers.
dp_demo.py: mimic the current --dp case by launching --dp workers and a router. This code can later be moved into sglang core, letting sglang depend on sglang-router; users can then install sglang-router as an optional dependency of sglang.
Benchmark
We see a clear improvement over the original method, thanks to the higher cache hit rate.
Reference: Google Sheet
Follow-up
There are still many follow-ups. Notably:
sglang-router
PyPI)
TODO in the code)
Checklist