Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixup Executor::spin_all() regression fix #2517

Merged
merged 9 commits into from
May 2, 2024

Conversation

wjwwood
Copy link
Member

@wjwwood wjwwood commented Apr 30, 2024

This is a follow up of #2509 which addresses #2508.

Janosch Machowinski and others added 8 commits April 30, 2024 03:27
Checks if executors are busy waiting while they should block
in spin_some or spin_all.

Signed-off-by: Janosch Machowinski <[email protected]>
This test was strange. It looked like, it assumed that spin_all did
not return instantly. Also it was racy, as the thread could terminate
instantly.

Signed-off-by: Janosch Machowinski <[email protected]>
Co-authored-by: Tomoya Fujita <[email protected]>
Signed-off-by: jmachowinski <[email protected]>
Before, the method would not recollect available work in case of
spin_some, spin_all. This would lead to the method behaving differently
than to what the documentation states.

Signed-off-by: Janosch Machowinski <[email protected]>
@wjwwood wjwwood self-assigned this Apr 30, 2024
@wjwwood wjwwood mentioned this pull request Apr 30, 2024
@wjwwood
Copy link
Member Author

wjwwood commented Apr 30, 2024

CI:

  • Linux Build Status (updated after job got canceled)
  • Linux-aarch64 Build Status (known issues with mimic)
  • Windows Build Status (restarted after the job hung)

Copy link
Member Author

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some notes for anyone reviewing this.

Comment on lines +369 to +374
// clear the wait result and wait for work without blocking to collect the work
// for the first time
// both spin_some and spin_all wait for work at the beginning
wait_result_.reset();
wait_for_work(std::chrono::milliseconds(0));
bool just_waited = true;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change wasn't part of the original report (busy wait in spin_all) but it is related and is needed. It essentially ensures that we don't start out spin_some or spin_all by iterating over existing work from a previous spin method. Ideally the other spin methods would reset the result before returning so that this wouldn't be necessary, but this is safer. We do want to do this because in the case of spin_some this could cause it to never wait, iterating over old work and exiting after. This isn't really what we want.

Also, it is safe to throw away a wait result and wait again as all the things we wait on need to be able to be waited on, be ready, not be handled, be waited on again, and then be ready again. Or else they need to be ok with not being handled. Put another way, being waited on and then being ready does not guarantee being executed, and it may require being waited on again.

rclcpp/src/rclcpp/executor.cpp Outdated Show resolved Hide resolved
Copy link
Member

@mjcarroll mjcarroll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall change and test looks good. My only nit is that we need to probably put some of your comments here in the file for posterity.

@wjwwood
Copy link
Member Author

wjwwood commented May 2, 2024

I added a comment in 8587653, so if the Rpr job comes back green I'll merge without new CI, since it was just a comment and Rpr checks style and linting.

@wjwwood
Copy link
Member Author

wjwwood commented May 2, 2024

I will then follow up with a jazzy backport.

@wjwwood wjwwood merged commit f7185dc into rolling May 2, 2024
3 checks passed
@wjwwood wjwwood deleted the wjwwood/executor_spin_all_regression_fix branch May 2, 2024 21:06
@wjwwood
Copy link
Member Author

wjwwood commented May 2, 2024

@Mergifyio backport jazzy

Copy link
Contributor

mergify bot commented May 2, 2024

backport jazzy

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request May 2, 2024
* test(Executors): Added tests for busy waiting

Checks if executors are busy waiting while they should block
in spin_some or spin_all.

Signed-off-by: Janosch Machowinski <[email protected]>

* fix: Reworked spinAll test

This test was strange. It looked like, it assumed that spin_all did
not return instantly. Also it was racy, as the thread could terminate
instantly.

Signed-off-by: Janosch Machowinski <[email protected]>

* fix(Executor): Fixed spin_all not returning instantly is no work was available

Signed-off-by: Janosch Machowinski <[email protected]>

* Update rclcpp/test/rclcpp/executors/test_executors.cpp

Co-authored-by: Tomoya Fujita <[email protected]>
Signed-off-by: jmachowinski <[email protected]>

* test(executors): Added test for busy waiting while calling spin

Signed-off-by: Janosch Machowinski <[email protected]>

* fix(executor): Reset wait_result on every call to spin_some_impl

Before, the method would not recollect available work in case of
spin_some, spin_all. This would lead to the method behaving differently
than to what the documentation states.

Signed-off-by: Janosch Machowinski <[email protected]>

* restore previous test logic for now

Signed-off-by: William Woodall <[email protected]>

* refactor spin_some_impl's logic and improve busy wait tests

Signed-off-by: William Woodall <[email protected]>

* added some more comments about the implementation

Signed-off-by: William Woodall <[email protected]>

---------

Signed-off-by: Janosch Machowinski <[email protected]>
Signed-off-by: jmachowinski <[email protected]>
Signed-off-by: William Woodall <[email protected]>
Co-authored-by: Janosch Machowinski <[email protected]>
Co-authored-by: jmachowinski <[email protected]>
Co-authored-by: Tomoya Fujita <[email protected]>
(cherry picked from commit f7185dc)
wjwwood added a commit that referenced this pull request May 3, 2024
* test(Executors): Added tests for busy waiting

Checks if executors are busy waiting while they should block
in spin_some or spin_all.

Signed-off-by: Janosch Machowinski <[email protected]>

* fix: Reworked spinAll test

This test was strange. It looked like, it assumed that spin_all did
not return instantly. Also it was racy, as the thread could terminate
instantly.

Signed-off-by: Janosch Machowinski <[email protected]>

* fix(Executor): Fixed spin_all not returning instantly is no work was available

Signed-off-by: Janosch Machowinski <[email protected]>

* Update rclcpp/test/rclcpp/executors/test_executors.cpp

Co-authored-by: Tomoya Fujita <[email protected]>
Signed-off-by: jmachowinski <[email protected]>

* test(executors): Added test for busy waiting while calling spin

Signed-off-by: Janosch Machowinski <[email protected]>

* fix(executor): Reset wait_result on every call to spin_some_impl

Before, the method would not recollect available work in case of
spin_some, spin_all. This would lead to the method behaving differently
than to what the documentation states.

Signed-off-by: Janosch Machowinski <[email protected]>

* restore previous test logic for now

Signed-off-by: William Woodall <[email protected]>

* refactor spin_some_impl's logic and improve busy wait tests

Signed-off-by: William Woodall <[email protected]>

* added some more comments about the implementation

Signed-off-by: William Woodall <[email protected]>

---------

Signed-off-by: Janosch Machowinski <[email protected]>
Signed-off-by: jmachowinski <[email protected]>
Signed-off-by: William Woodall <[email protected]>
Co-authored-by: Janosch Machowinski <[email protected]>
Co-authored-by: jmachowinski <[email protected]>
Co-authored-by: Tomoya Fujita <[email protected]>
(cherry picked from commit f7185dc)

Co-authored-by: William Woodall <[email protected]>
}
EXPECT_GT(this->waitable->get_count(), 0u);

first_check_passed = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to lock cv_m while accessing the share state (e.g. first_check_passed). If you don't do this, you get race conditions

cv.wait_for(lk, 10s);
}
EXPECT_GT(this->waitable->get_count(), 0u);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is racy / does not really check for busy waits. You need to wait a bit here, to give a busy waiting executor time to call is_ready a lot of times. As the test is implemented right now, it could pass even if the executor is busy waiting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants