-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add thread manager crate to agave #3890
base: master
Are you sure you want to change the base?
Conversation
da038b3
to
309350b
Compare
@@ -48,11 +48,12 @@ fn main() -> anyhow::Result<()> { | |||
println!("Running {exp}"); | |||
let mut conffile = PathBuf::from(env!("CARGO_MANIFEST_DIR")); | |||
conffile.push(exp); | |||
let conffile = std::fs::File::open(conffile)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically, we don't specify configurations in files. Doing it in code instead for the example/test cases, or with command line flags for other cases. Sometimes use yml (for accounts information for example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got you, you want to read thread-manager/examples/core_contention_contending_set.toml
with it.
Yeah, could you maybe have a method that reads this file and used in both example and production environment in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file will never be read in prod, its just for examples which are also benchmarks/tests. I'm not even sure we will use toml in prod for this yet, though it is likely.
I think you need to explain broader audience proposed changes in the context. Maybe by setting up a meeting with involved parties. |
Yes, a meeting would be necessary eventually, but I also need the code to be reasonably good before a million people look at it=) |
2c6a9dd
to
4e1d728
Compare
c93751c
to
60a04f8
Compare
895fd15
to
e00029a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the high level idea. Left a few nits and some questions.
I need to better understand how you envision the string-->string-->runtime mapping working before I can weigh in on it
## Scheduling policy and priority | ||
If you want you can set thread scheduling policy and priority. Keep in mind that this will likely require | ||
```bash | ||
sudo setcap cap_sys_nice+ep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the past we've had tuning scripts that could handle recommended configurations like this. If it makes sense, we could add this wherever relevant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about priority stuff at this point, it is super platform specific and mostly a placeholder at the moment, as getting this past CI will be unfun (due to extra priveleges necessary)
thread-manager/src/lib.rs
Outdated
#[derive(Default, Debug)] | ||
pub struct ThreadManagerInner { | ||
pub tokio_runtimes: HashMap<ConstString, TokioRuntime>, | ||
pub tokio_runtime_mapping: HashMap<ConstString, ConstString>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the idea here that we essentially have a mapping like so:
service --> tokio name --> tokio runtime
And service --> tokio name could many to 1 but the tokio name --> tokio runtime is 1 to 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, exactly! I believe we will eventually have maybe 1-2 tokio runtimes + 2-3 rayons. and mappings to those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool! I think it makes sense. Maybe we can get rid of the intermediate mapping and go just from service to runtime (I believe you mentioned this elsewhere)
) -> Option<&'a T> { | ||
match mapping.get(name) { | ||
Some(n) => runtimes.get(n), | ||
None => match mapping.get("default") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we could check some of these mappings up front since we only create/map these thread pools one time at startup. This might make runtime simpler.
For example, when creating ThreadManager, we iterate through the mappings, and if string1 maps to a non-existent string2, we log a warning and then map it to "default".
Then maybe we could throw an error here if we try to get a runtime that doesn't exist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could, but then we'd need a complete list of all needed runtimes which would be a nightmare to maintain... Mapping to "default" is definitely doable.
thread-manager/src/lib.rs
Outdated
} | ||
|
||
/* #[test] | ||
fn thread_priority() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this one commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs root/extra capabilities + realtime kernel features to really be useful. So it is set aside for "future use" for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be nice to add that context into the code. We could also just mark this #[ignore]
Adding @alessandrod as I'm sure he'll have opinions on this 😄 |
From this idea we get to solGossipConsume -> RayonSigVerify -> ThreadPool sort of mapping. At least that was the idea. We can dispense with the intermediate layer but then each pool would necessarily run on its own set of threads which may not be ideal. We can, of course, axe this functionality and bring it back later as appropriate. |
} | ||
|
||
/// Makes test runtime with 2 threads, only for unittests | ||
pub fn new_for_tests() -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[cfg(test)]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, can not be done as it is to be used from other crates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this use case, our convention, instead, to use
#[cfg(feature = "dev-context-only-utils")]
and to Cargo file
[features]
dev-context-only-utils = []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
thread-manager/src/lib.rs
Outdated
rayon_runtime::{RayonConfig, RayonRuntime}, | ||
tokio_runtime::{TokioConfig, TokioRuntime}, | ||
}; | ||
pub type ConstString = Box<str>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want boxed strings? it seems like this is the whole cause of us not being able to simply clone the maps, but doesn't seem necessary to me at first glance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure noone gets ideas about mutating them. But I guess they can be just String if that is somehow better. We would not be able to just clone the maps anyway, but I guess it would make code slightly cleaner. I'll check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Box's don't prevent mutation though. Just provide unique ownership
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. Should I change to String?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think unless we have a good reason to use box we shouldn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
020d88f
to
9ca4c0f
Compare
Co-authored-by: Andrew Fitzgerald <[email protected]>
661f1a1
to
892b4ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left another few nits, but overall LGTM!
Would like others to weigh in on design at least before merging as this will obviously be foundational to how we manage threads, and once we start actually integrating w/ services, will become harder to change things
892b4ed
to
bf9196e
Compare
bf9196e
to
3b56edf
Compare
Problem
There exist a variety of perf issues related to unorganized thread pools that spawn far more threads than are useful on a given machine, this was confirmed by measurements (see measurement examples in thread-manager/examples directory) and is relatively easy to fix in agave.
Idea is to eventually address the needs of
Summary of Changes
Added new agave-thread-manager crate, which is to be gradually hooked into the agave itself. It will be used to centralize thread pool creation such that their core allocations can be controlled, and total thread count can gradually be reduced to match core count as closely as possible. Crate comes with its own benchmarks and some tests. The crate has full functionality only on Linux as this is the only platform where it is relevant.