-
Notifications
You must be signed in to change notification settings - Fork 432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test case and fix for for https://github.com/ros2/rclcpp/issues/2652 #2713
base: rolling
Are you sure you want to change the base?
Conversation
6906974
to
660ffaa
Compare
@luca-della-vedova @fmrico can you test this fix ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried on the reproduction repo and this branch indeed fixes it!
660ffaa
to
e3142d0
Compare
Sure!! testing... |
I have tested this PR against:
Everything is working now: LGTM 🔥 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmachowinski lgtm with green CI.
i would like to confirm one thing. how come do you consider this breaks ABI by the way? Templated functions are instantiated at compile-time, and their changes do not affect the binary interface (ABI) of the library.
This specific case is likely not to break ABI, but leaves you in a bad place. Depending on what was compiled at which time, you end up with the fix or not. E.g. you have libraries, that use the Executor therefore they contain a symbol of the Executor. These libraries were compiled before the fix. Even though you now compile with the new fixed template, the linker will choose one symbol to use later on during linking, and may use the 'broken' version from the library. In general, you can break ABI with templates, but you need to have a libraries. To break ABI, you need a to have a templated type, that changed its memory layout due to an alteration of the template. If this type is used by library A, but this was compiled against the old version of the template, and you now try to use the library with the new template, you get an ABI break, and most likely a crash... |
As we discussed in PMC, this doesn't really break ABI, but updating rclcpp by itself is insufficient to get the fix. You must also recompile any downstream users of the template. It's enough to justify a notification, and we are planning on doing a discourse post about it. |
Pulls: #2713 |
78d859d
to
27b2d0e
Compare
Pulls: #2713 |
just in case i started the CI after rebasing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, but I had one question that might lead to an edit
if(this->needs_pruning_) { | ||
this->storage_prune_deleted_entities(); | ||
this->needs_pruning_ = false; | ||
} | ||
|
||
this->storage_release_ownerships(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this check and the pruning happen after releasing ownerships? It seems like holding the ownership by the storage might cause the weak pointers to have not expired even though all the user's shared pointers have been destroyed? Maybe it gets caught on the next run through?
I suppose this would only happen if something lost all ownership (except the wait set's) after needs_pruning_
became true but before this step.
Either way, since storage_prune_deleted_entities()
iterates through all of the wait set's weak pointers and checks them for expired, we might as well release ownerships first no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to happen while holding the ownership, in order to be sure that the indices still match. If you release ownership, and prune, and some references were dropped in between, the indices of the rcl waitset will not match any more the one in the storage. Basically leading to the same bug as before.
I don't like this implicit matching of indixes in general. I would like to introduce a backmapping datastructure later on, that uses the indices returned by the rcl_add functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wjwwood does this answer your question ?
Signed-off-by: Janosch Machowinski <[email protected]>
Signed-off-by: Janosch Machowinski <[email protected]>
Signed-off-by: Janosch Machowinski <[email protected]>
Signed-off-by: Francisco Martín Rico <[email protected]>
Signed-off-by: Janosch Machowinski <[email protected]>
27b2d0e
to
8f8aab8
Compare
This is a test case and fix for the bug #2652