-
Notifications
You must be signed in to change notification settings - Fork 434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Release ownership of entities after spinning cancelled #2556
Release ownership of entities after spinning cancelled #2556
Conversation
can we deprecate / remove the static executor rather than keep trying to fix it? |
Might be a misunderstanding, I thought they meant the StaticSingleThreadedExecutor too, but they mean they had a So this affects all of the executors that use this part of the executor base class. |
But yes, we can deprecate it in rolling at least, though that should be a separate pr. |
Your understanding are right. Sorry, my description |
i thought that too until looking into the reproducible program.
true. actually they uses |
@Barry-Xu-2018 that was kind of what I was thinking, but we should take care to clear it before setting the |
rclcpp/src/rclcpp/executor.cpp
Outdated
@@ -275,7 +275,7 @@ Executor::spin_until_future_complete_impl( | |||
if (spinning.exchange(true)) { | |||
throw std::runtime_error("spin_until_future_complete() called while already spinning"); | |||
} | |||
RCPPUTILS_SCOPE_EXIT(this->spinning.store(false); ); | |||
RCPPUTILS_SCOPE_EXIT(this->spinning.store(false);wait_result_.reset();); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this order is not thread-safe.
you are first setting spinning
to false and then resetting the wait_result.
If there are two threads, one trying to start spin and the other exiting spin, there can be a concurrent access to wait_result_
.
- thread 1 is exiting spin.
- thread 1 sets spinning to false
- thread 2 is entering spin
- thread 2 does something on the wait result
- thread 1 finishes to exit and resets the wait result
We should at least invert the order, to ensure that no other threads can enter spin until we have cleared the wait result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. It is better to invert the order.
Besides, There will be many issues if multiple threads executes the same executor's spin concurrently.
wait_result_
was completely unprotected.
rclcpp/rclcpp/src/rclcpp/executor.cpp
Lines 724 to 751 in f4923c6
void | |
Executor::wait_for_work(std::chrono::nanoseconds timeout) | |
{ | |
TRACETOOLS_TRACEPOINT(rclcpp_executor_wait_for_work, timeout.count()); | |
// Clear any previous wait result | |
this->wait_result_.reset(); | |
{ | |
std::lock_guard<std::mutex> guard(mutex_); | |
if (this->entities_need_rebuild_.exchange(false) || current_collection_.empty()) { | |
this->collect_entities(); | |
} | |
} | |
this->wait_result_.emplace(wait_set_.wait(timeout)); | |
if (!this->wait_result_ || this->wait_result_->kind() == WaitResultKind::Empty) { | |
RCUTILS_LOG_WARN_NAMED( | |
"rclcpp", | |
"empty wait set received in wait(). This should never happen."); | |
} else { | |
if (this->wait_result_->kind() == WaitResultKind::Ready && current_notify_waitable_) { | |
auto & rcl_wait_set = this->wait_result_->get_wait_set().get_rcl_wait_set(); | |
if (current_notify_waitable_->is_ready(rcl_wait_set)) { | |
current_notify_waitable_->execute(current_notify_waitable_->take_data()); | |
} | |
} | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least in the case of the multithreaded executor, the wait_result is protected by a mutex in the mt executor itself (
std::lock_guard wait_lock{wait_mutex_}; |
As for single-threaded executor, it should generally not be possible to call spin from multiple threads as there is an internal spinning
atomic that should block other threads from spinning an executor that is already spinning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mjcarroll Thanks for your comments.
I didn't express myself clearly. I mean, For the SingleThreadedExecutor, if the user creates multiple threads to execute spin(), problems will arise.
The failure of Rpr__rclcpp__ubuntu_noble_amd64 is related to test |
Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Barry Xu <[email protected]>
b30df3a
to
50bce78
Compare
Rebase was done. |
Signed-off-by: Barry Xu <[email protected]>
@alsora I updated code. Please help to review again. |
Signed-off-by: Barry Xu <[email protected]>
@alsora I have moved test code. Please review again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@Mergifyio backport jazzy |
✅ Backports have been created
|
* Release ownership of entities after spinning cancelled Signed-off-by: Barry Xu <[email protected]> * Move release action to every exit point in different spin functions Signed-off-by: Barry Xu <[email protected]> * Move wait_result_.reset() before setting spinning to false Signed-off-by: Barry Xu <[email protected]> * Update test code Signed-off-by: Barry Xu <[email protected]> * Move test code to test_executors.cpp Signed-off-by: Barry Xu <[email protected]> --------- Signed-off-by: Barry Xu <[email protected]> (cherry picked from commit 069a001) # Conflicts: # rclcpp/test/rclcpp/executors/test_executors.cpp
… (#2580) * Release ownership of entities after spinning cancelled (#2556) * Release ownership of entities after spinning cancelled Signed-off-by: Barry Xu <[email protected]> * Move release action to every exit point in different spin functions Signed-off-by: Barry Xu <[email protected]> * Move wait_result_.reset() before setting spinning to false Signed-off-by: Barry Xu <[email protected]> * Update test code Signed-off-by: Barry Xu <[email protected]> * Move test code to test_executors.cpp Signed-off-by: Barry Xu <[email protected]> --------- Signed-off-by: Barry Xu <[email protected]> (cherry picked from commit 069a001) # Conflicts: # rclcpp/test/rclcpp/executors/test_executors.cpp * Fix backport issue (#2581) Signed-off-by: Barry Xu <[email protected]> --------- Signed-off-by: Barry Xu <[email protected]> Co-authored-by: Barry Xu <[email protected]>
This issue was found in ros2/rmw_fastrtps#761.
In code of MoveIt, it used static executor. When the static executor is destructed (The executor is in a cancel state.), and it's time to free up the ownership of entity resources, Fast DDS resources have already been released.
After spinning is cancelled, the Executor should release ownership of entities at once.