Add a tracing warning when a thread blocks steps #162
Benjscho wants to merge 4 commits into tokio-rs:main
@LucioFranco would you be able to review?
LucioFranco left a comment
Overall this looks good. I have just two small things we should fix or think about.
    return Err(format!(
        "Ran for duration: {:?} steps: {} without completing",
-       self.config.duration, self.steps,
+       self.config.duration, self.steps.load(Relaxed),
You can avoid this load: fetch_add above returns the previous value, so you can add 1 to it to get the current value.
Nice, will adjust that!
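A minimal sketch of the suggestion above, assuming the step counter is an AtomicUsize. The helper name `bump` is hypothetical and not taken from the PR; it only illustrates that the return value of `fetch_add` is enough to derive the current count.

```rust
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

/// Hypothetical helper: increments the shared counter and returns the
/// new step count. `fetch_add` returns the value held *before* the
/// addition, so the current value is `previous + 1` and no separate
/// `load` is needed.
fn bump(steps: &AtomicUsize) -> usize {
    let previous = steps.fetch_add(1, Relaxed);
    previous + 1
}
```

Deriving the value from the same atomic operation also means the reported number is the one this thread produced, rather than whatever a later load happens to observe.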
        Ok(_) | Err(TryRecvError::Disconnected) => break,
        _ => {}
    }
    std::thread::sleep(Duration::from_secs(10));
I wonder if we want to add a limit or use exponential backoff so that the noise is reduced in, say, a CI scenario where a run may time out and swamp the logs?
That's a good point. We could log it only once per step, including the step it's stuck on, and then skip the log after that. I'd think something is likely broken if a single step takes 10 seconds of real time.
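The "log once per stuck step" idea could be sketched roughly as follows. `StallTracker` and `should_warn` are hypothetical names invented for illustration, not code from the PR; the caller is assumed to invoke `should_warn` once per check interval with the current step count.

```rust
/// Hypothetical helper that decides whether a stall warning should be
/// emitted for the current check.
struct StallTracker {
    /// Step count seen at the previous check, if any.
    last_seen: Option<usize>,
    /// Step for which a warning has already been emitted, if any.
    warned_for: Option<usize>,
}

impl StallTracker {
    fn new() -> Self {
        Self { last_seen: None, warned_for: None }
    }

    /// Returns true only the first time a given step is observed
    /// unchanged between two consecutive checks.
    fn should_warn(&mut self, current: usize) -> bool {
        let stalled = self.last_seen == Some(current);
        self.last_seen = Some(current);
        if stalled && self.warned_for != Some(current) {
            self.warned_for = Some(current);
            return true;
        }
        false
    }
}
```

With this shape, a run that stays stuck on one step for minutes produces a single warning naming that step, which addresses the log-noise concern without needing a backoff schedule.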
52bdca7 to e890f25
mcches left a comment
Did you consider a strategy where this is fallible? If X duration of real time elapses and the sim doesn't progress, fail the test. This saves operators from cancelling runaway builds.
    loop {
        let prev = steps.load(std::sync::atomic::Ordering::Relaxed);
        // Exit if main thread has.
        match rx.try_recv() {
How does this behave when you call run() N times in a row? It looks like you could spawn a ton of threads that don't clean up for 10s.
That's a good point. I'll adjust it to use recv_timeout. That way, if the sim exits early, the background thread is woken and cleans up straight away instead of sleeping out the rest of the interval.
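A rough sketch of the recv_timeout approach, under stated assumptions: `spawn_watchdog` is a hypothetical function, the step counter is assumed to be an `Arc<AtomicUsize>`, and `eprintln!` stands in for the tracing warning so the example needs only the standard library. The thread returns the number of stalled intervals it saw purely to make the behaviour observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical watchdog: returns the shutdown sender and the thread
/// handle. The thread's return value is the number of check intervals
/// in which the step count did not change.
fn spawn_watchdog(
    steps: Arc<AtomicUsize>,
    interval: Duration,
) -> (Sender<()>, JoinHandle<usize>) {
    let (tx, rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || {
        let mut stalls = 0;
        let mut prev = steps.load(Relaxed);
        loop {
            match rx.recv_timeout(interval) {
                // Interval elapsed with no shutdown signal: compare counts.
                Err(RecvTimeoutError::Timeout) => {
                    let now = steps.load(Relaxed);
                    if now == prev {
                        stalls += 1;
                        eprintln!("simulation has not advanced past step {now}");
                    }
                    prev = now;
                }
                // Explicit signal, or the sender was dropped: exit at once.
                Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
            }
        }
        stalls
    });
    (tx, handle)
}
```

Because the wait and the shutdown check are the same blocking call, calling run() many times in a row would not leave a pile of threads sleeping for the full interval; each one exits as soon as its sender goes away.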
    let is_finished = self.step()?;

    if is_finished {
        let _ = tx.send(());
Doesn't the drop handle this for you?
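To illustrate the point about drop, here is a small standard-library sketch. The function name `observe_around_drop` is hypothetical; it only shows what a receiver sees before and after its only sender is dropped, with no explicit send.

```rust
use std::sync::mpsc::{self, TryRecvError};

/// Hypothetical demonstration: returns what `try_recv` reports before
/// and after the only sender is dropped, without anything being sent.
fn observe_around_drop() -> (Result<(), TryRecvError>, Result<(), TryRecvError>) {
    let (tx, rx) = mpsc::channel::<()>();
    // Sender still alive and nothing queued: the channel is merely empty.
    let before = rx.try_recv();
    // Dropping the last sender disconnects the channel.
    drop(tx);
    let after = rx.try_recv();
    (before, after)
}
```

Since the watcher loop already breaks on Disconnected, letting the sender fall out of scope when run() returns would signal the thread on every exit path, including early returns through `?`.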
I'll step back on this. To be less opinionated, we can instead expect the CI systems running these tests to set the timeout. For example, nextest is pretty great for configuring this.
@mcches or @LucioFranco would you mind giving this another look when you have a min? I've addressed all prev comments
Discussed offline; we are planning not to implement this. A blocking warning would be nice, but most test runners (e.g., nextest) provide warnings for long-running tests. Spawning a background thread per test seemed incorrect for parallel scenarios.
Add a warning to the sim when a given host or client blocks progress in a simulation run. This works by spawning a background thread for each run that periodically checks the steps taken by the simulation. If the number of steps is the same between checks then the thread adds the tracing info.
Few questions here:
cargo test without --nocapture
Fixes #160