Timer in MultiThreadedExecutor crashes #2455
Comments
Seems it crashed in rclcpp, since you are using |
Are there other choices? I didn't know I had alternatives here... can you explain this point to me, or point me to resources to read? |
|
Sorry for my lack of knowledge on this, but... is there a procedure to run this without |
ROS doesn't depend on |
New backtrace, with
#3 0x00007ffff7c8e3a1 in ?? () from /usr/lib/x86_64-linux-gnu/libjemalloc.so.2
#4 0x00007ffff77bc41a in __gnu_cxx::new_allocator<void*>::deallocate (__t=1, __p=<optimized out>, this=<optimized out>) at /usr/include/c++/11/ext/new_allocator.h:145
#5 std::allocator_traits<std::allocator<void*> >::deallocate (__n=1, __p=<optimized out>, __a=...) at /usr/include/c++/11/bits/alloc_traits.h:496
#6 rclcpp::allocator::retyped_reallocate<void*, std::allocator<void*> > (untyped_pointer=<optimized out>, size=8, untyped_allocator=<optimized out>)
at /home/user/ros2_iron/src/ros2/rclcpp/rclcpp/include/rclcpp/allocator/allocator_common.hpp:78
#7 0x00007ffff796c98f in rcl_wait_set_resize () from /home/user/ros2_iron/install/rcl/lib/librcl.so
#8 0x00007ffff77899c2 in rclcpp::Executor::wait_for_work (this=0x7ffff02cf890, timeout=...) at /home/user/ros2_iron/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:794
#9 0x00007ffff778a073 in rclcpp::Executor::get_next_executable (this=0x7ffff02cf890, any_executable=..., timeout=std::chrono::duration = { -1ns })
at /home/user/ros2_iron/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:965
#10 0x00007ffff779a372 in rclcpp::executors::MultiThreadedExecutor::run (this=0x7ffff02cf890, this_thread_number=<optimized out>)
at /home/user/ros2_iron/src/ros2/rclcpp/rclcpp/src/rclcpp/executors/multi_threaded_executor.cpp:92
#11 0x00007ffff72e6793 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#12 0x00007ffff6e94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#13 0x00007ffff6f26660 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Seems that in |
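Frames #7 to #10 of the backtrace above show rcl_wait_set_resize being reached from MultiThreadedExecutor::run through Executor::wait_for_work. The following is a minimal standard-library sketch of that general shape, not rclcpp's code: several worker threads share one buffer that is resized before each wait, and a mutex serializes the resizes. WaitSet, run_workers and wait_mutex are hypothetical names chosen for illustration. Whether a missing or bypassed lock of this kind has anything to do with the crash reported here is not established in this thread.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a wait set: a buffer that is resized before each wait.
struct WaitSet {
  std::vector<void *> handles;
  // Replaces the contents with n null handles; may reallocate the storage.
  void resize(std::size_t n) { handles.assign(n, nullptr); }
};

// Starts num_threads workers that each resize the shared wait set `iterations`
// times while holding a mutex. Returns the total number of completed resizes.
std::size_t run_workers(std::size_t num_threads, std::size_t iterations) {
  WaitSet wait_set;
  std::mutex wait_mutex;      // serializes every access to wait_set
  std::size_t completed = 0;  // also protected by wait_mutex
  std::vector<std::thread> workers;

  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      for (std::size_t i = 0; i < iterations; ++i) {
        // Without this lock, concurrent resizes would be a data race on the
        // vector's storage (undefined behavior).
        std::lock_guard<std::mutex> lock(wait_mutex);
        wait_set.resize((t + i) % 8 + 1);
        ++completed;
      }
    });
  }
  for (auto & worker : workers) {
    worker.join();
  }
  return completed;
}
```

Calling run_workers(4, 1000) from a small main, built with -pthread, completes all 4000 resizes. Removing the lock_guard and rebuilding with a sanitizer enabled is one way to see what a report for unsynchronized resizing looks like.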
I found this possibly related issue: ros2/rmw_fastrtps#728
Seems related to another run I did:
#4 0x00007ffff7c0e0e4 in ?? () from /usr/lib/x86_64-linux-gnu/libjemalloc.so.2
#5 0x00007ffff7c998fd in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libjemalloc.so.2
#6 0x00007ffff538aafc in __gnu_cxx::new_allocator<eprosima::fastdds::dds::Condition*>::allocate(unsigned long, void const*) ()
from /home/user/ros2_iron/install/rmw_fastrtps_shared_cpp/lib/librmw_fastrtps_shared_cpp.so
#7 0x00007ffff538a9cf in std::allocator_traits<std::allocator<eprosima::fastdds::dds::Condition*> >::allocate(std::allocator<eprosima::fastdds::dds::Condition*>&, unsigned long) ()
from /home/user/ros2_iron/install/rmw_fastrtps_shared_cpp/lib/librmw_fastrtps_shared_cpp.so
#8 0x00007ffff538a8ac in std::_Vector_base<eprosima::fastdds::dds::Condition*, std::allocator<eprosima::fastdds::dds::Condition*> >::_M_allocate(unsigned long) ()
from /home/user/ros2_iron/install/rmw_fastrtps_shared_cpp/lib/librmw_fastrtps_shared_cpp.so
#9 0x00007ffff538a535 in void std::vector<eprosima::fastdds::dds::Condition*, std::allocator<eprosima::fastdds::dds::Condition*> >::_M_realloc_insert<eprosima::fastdds::dds::Condition*>(__gnu_cxx::__normal_iterator<eprosima::fastdds::dds::Condition**, std::vector<eprosima::fastdds::dds::Condition*, std::allocator<eprosima::fastdds::dds::Condition*> > >, eprosima::fastdds::dds::Condition*&&) ()
from /home/user/ros2_iron/install/rmw_fastrtps_shared_cpp/lib/librmw_fastrtps_shared_cpp.so
#10 0x00007ffff538a3a0 in eprosima::fastdds::dds::Condition*& std::vector<eprosima::fastdds::dds::Condition*, std::allocator<eprosima::fastdds::dds::Condition*> >::emplace_back<eprosima::fastdds::dds::Condition*>(eprosima::fastdds::dds::Condition*&&) () from /home/user/ros2_iron/install/rmw_fastrtps_shared_cpp/lib/librmw_fastrtps_shared_cpp.so
#11 0x00007ffff538a102 in std::vector<eprosima::fastdds::dds::Condition*, std::allocator<eprosima::fastdds::dds::Condition*> >::push_back(eprosima::fastdds::dds::Condition*&&) ()
from /home/user/ros2_iron/install/rmw_fastrtps_shared_cpp/lib/librmw_fastrtps_shared_cpp.so
#12 0x00007ffff53877d3 in rmw_fastrtps_shared_cpp::__rmw_wait(char const*, rmw_subscriptions_s*, rmw_guard_conditions_s*, rmw_services_s*, rmw_clients_s*, rmw_events_s*, rmw_wait_set_s*, rmw_time_s const*) () from /home/user/ros2_iron/install/rmw_fastrtps_shared_cp |
Hi, I believe the crash is related to memory deallocation. To investigate this, you can use AddressSanitizer by following these steps:
Rebuild your application with the following command:
colcon build --cmake-args -DCMAKE_CXX_FLAGS="-fsanitize=address -g" -DCMAKE_C_FLAGS="-fsanitize=address -g" --packages-select <name-of-your-package>
Run your application and examine the error messages produced by the sanitizer for further insights. |
I've tried your code in I modified your launch.py
# prefix="tmux split-window gdb -ex run --args", # run then with "tmux new-session ros2 launch ..."
and use ros2 launch node.launch.py about 10 minutes
[INFO] [launch]: All log files can be found below /root/.ros/log/2024-03-20-17-46-17-374247-9988e08b3540-27165
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [component_container_mt-1]: process started with pid [27189]
[component_container_mt-1] [INFO] [1710956777.786610833] [localisation_master]: Load Library: /share/debug_wks/install/timer_test/lib/libTTest.so
[component_container_mt-1] [INFO] [1710956777.790359426] [localisation_master]: Found class: rclcpp_components::NodeFactoryTemplate<test::timer>
[component_container_mt-1] [INFO] [1710956777.790432846] [localisation_master]: Instantiate class: rclcpp_components::NodeFactoryTemplate<test::timer>
[INFO] [launch_ros.actions.load_composable_nodes]: Loaded node '/timer_test' in container '/localisation_master'
^C^C[WARNING] [launch]: user interrupted with ctrl-c (SIGINT)
[component_container_mt-1] [INFO] [1710957541.966848000] [rclcpp]: signal_handler(signum=2)
[INFO] [component_container_mt-1]: process has finished cleanly [pid 27189] |
@Zard-C thank you for checking. I will also try with every machine I have available... Tried also with two different RMWs, both induce a crash. And I will try also what you suggest with the sanitizer. |
I'm going to see if I can reproduce this locally with and without #2142 |
For what it is worth, I also tried to reproduce this locally, and I can't do it. What I did was to download the
And I was able to run it for at least a couple of minutes without a crash. I even tried putting some additional stress on my machine (which can cause race conditions), and I didn't see it either. @roncapat Is there something else I need to do to be able to reproduce this? |
@clalancette not really, for me it was enough to trigger the issue. Nevertheless, I may get some spare time to try again in the upcoming days, also testing the latest rolling. I'm seeing a lot of activity recently on rclcpp executors and related stuff, so it may also be possible that this got solved somehow. Unfortunately, in the latter case, I may not have enough time to do a full search through the last N commits of the repo to identify the specific commit responsible for the fix. At least for now, thank you all for running the test and checking this issue :) |
@clalancette one last thing... Could you try using the launch file instead? I'm on mobile now, but IIRC the executable by default may not make use of the MultiThreadedExecutor, while by composition via launch file I explicitly set the type of container (thus, the type of executor). |
I tried that as well, at least on Rolling. And I couldn't reproduce there. |
I attempted to reproduce this with the I did not apply stress, but it doesn't seem that did much. |
Platform: Ubuntu 22.04 - Iron Irwini compiled from source.
I've just run vcs pull, rebuilt and re-tested everything.
I built a very simple rclcpp component with a timer spinning at 200 Hz, publishing an empty Imu message. When loading the component in a component_container_mt, in a few seconds this is what happens (see the backtrace below).
Seems that something goes wrong during allocation/deallocation of some resources.
I attach the .zip file with demo package for debug purposes.
timer_test.zip