Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot discover the endpoint after multiple nodes created and exited #699

Closed
fujitatomoya opened this issue Jul 24, 2023 · 19 comments
Closed

Comments

@fujitatomoya
Copy link
Collaborator

Bug report

Required Info:

  • Operating System:
    • ubuntu 22.04
  • Installation type:
    • binaries, source build
  • Version or commit hash:
    • Humble
  • DDS implementation:
    • Fast-DDS
  • Client library (if applicable):
    • rclcpp, rclpy

Steps to reproduce issue

The following can be reproducible only if it is in the localhost. (shared memory transport is enabled)

  1. start the subscription.
# ros2 run examples_rclpy_minimal_subscriber subscriber_member_function
  1. start the multiple publishers and kill all of them after 10 seconds.
# cat > pub_nodes.launch.py << EOS
from launch import LaunchDescription
from launch_ros.actions import Node
node_num_ = 100
def generate_launch_description():
    ld = LaunchDescription()
    for i in range(node_num_):
        ld.add_action(
            Node(
                package='examples_rclpy_minimal_publisher',
                executable='publisher_member_function',
                name='pub_node_{}'.format(i),
            )
        )
    return ld
EOS

# ros2 launch ./pub_nodes.launch.py
...<snip>

### Send Ctrl-C after a short while (about 10 seconds) to kill all publishers
  1. Start a single publisher in another terminal
# ros2 run examples_rclpy_minimal_publisher publisher_member_function
  1. Check the subscription terminal see if that can receive the message from the publisher with procedure-3.

Expected behavior

subscription can receive the message with procedure-4.

Actual behavior

subscription cannot receive the message with procedure-4.

Additional information

  • this problem cannot be observed with rmw_cyclonedds.
  • this problem only can be observed with localhost communication. (via network interfaces, there is no such problem.)\
  • when this issue occures, ros2 node list --no-daemon does not show subscriber running in first terminal.
  • when SHM for fastrtps disabled and UDP transport enabled, this issue does not reproduce.
 <?xml version="1.0" encoding="UTF-8" ?>
 <profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles" >
     <transport_descriptors>
         <transport_descriptor>
             <transport_id>CustomUdpTransport</transport_id>
             <type>UDPv4</type>
         </transport_descriptor>
     </transport_descriptors>

     <participant profile_name="participant_profile" is_default_profile="true">
         <rtps>
             <userTransports>
                 <transport_id>CustomUdpTransport</transport_id>
             </userTransports>

             <useBuiltinTransports>false</useBuiltinTransports>
         </rtps>
     </participant>
 </profiles>
@fujitatomoya
Copy link
Collaborator Author

@MiguelCompany
Copy link
Collaborator

@fujitatomoya We have recently merged some changes to Fast DDS dealing with this exact problem. Could you verify if this is fixed with the current head of 2.6.x (i.e. eProsima/Fast-DDS@3fffea5) ?

@Barry-Xu-2018
Copy link
Contributor

@MiguelCompany

The version I am using is d0cc5a6d4cf124ed42b8df061cd857144e372888, which is two commits behind the latest version.
And these 2 commits aren't related to SHM codes.
At this version, this problem also can reproduce in my environment.

@MiguelCompany
Copy link
Collaborator

@Barry-Xu-2018 As the report stated Humble as the version, I understood it was related to Fast DDS 2.6.x. Your comment seems to suggest this is also reproduced in Rolling ?

@Barry-Xu-2018
Copy link
Contributor

@MiguelCompany Yes. I also speculated whether subsequent updates have fixed this issue, so I tried using the rolling version.

@Barry-Xu-2018
Copy link
Contributor

Barry-Xu-2018 commented Jul 24, 2023

@MiguelCompany
CC: @fujitatomoya

While issue occurs,

There is one thread in the subscriber_member_function that consumes almost 100% of CPU usage.

abnormal CPU usage

top - 15:16:41 up  4:33,  2 users,  load average: 14.46, 9.73, 8.67
Threads:  12 total,   1 running,  11 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.2 us,  0.8 sy,  0.0 ni, 92.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  31804.4 total,   5983.9 free,  18137.0 used,   7683.6 buff/cache
MiB Swap:  65536.0 total,  65475.2 free,     60.8 used.  13057.1 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND  
 581173 chenlh    20   0  698376 105084  68464 R  99.9   0.3   0:26.56 subscriber_memb    # consume 100% CPU usage
 581140 chenlh    20   0  698376 105084  68464 S   0.0   0.3   0:09.58 subscriber_memb
 581167 chenlh    20   0  698376 105084  68464 S   0.0   0.3   0:00.00 subscriber_memb
 581168 chenlh    20   0  698376 105084  68464 S   0.0   0.3   0:00.00 subscriber_memb
 581169 chenlh    20   0  698376 105084  68464 S   0.0   0.3   0:00.00 subscriber_memb
 581170 chenlh    20   0  698376 105084  68464 S   0.0   0.3   0:00.85 subscriber_memb
 581171 chenlh    20   0  698376 105084  68464 S   0.0   0.3   0:00.48 subscriber_memb
 581172 chenlh    20   0  698376 105084  68464 S   0.0   0.3   0:00.00 subscriber_memb
 581174 chenlh    20   0  698376 105084  68464 S   0.0   0.3   0:00.00 subscriber_memb
 581175 chenlh    20   0  698376 105084  68464 S   0.0   0.3   0:00.19 subscriber_memb
 581176 chenlh    20   0  698376 105084  68464 S   0.0   0.3   0:00.60 subscriber_memb
 581177 chenlh    20   0  698376 105084  68464 S   0.0   0.3   0:00.03 subscriber_memb    

Check the backtrace of the thread 581173

backtrace

(gdb) info thread
  Id   Target Id                                            Frame 
* 1    Thread 0x7f713f38e1c0 (LWP 581140) "subscriber_memb" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55adaa858260) at ./nptl/futex-internal.c:57
  2    Thread 0x7f713cc6e640 (LWP 581167) "subscriber_memb" __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7f713dd7c1e0 <(anonymous namespace)::g_signal_handler_sem>) at ./nptl/futex-internal.c:57
  3    Thread 0x7f71387ff640 (LWP 581168) "subscriber_memb" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x7f71387fede0, op=137, expected=0, futex_word=0x55adaa1d7900) at ./nptl/futex-internal.c:57
  4    Thread 0x7f7137ffe640 (LWP 581169) "subscriber_memb" __futex_abstimed_wait_common64 (private=939514416, cancel=true, abstime=0x7f7137ffdc10, op=137, expected=0, futex_word=0x55adaa1da710) at ./nptl/futex-internal.c:57
  5    Thread 0x7f71377fd640 (LWP 581170) "subscriber_memb" __futex_abstimed_wait_common64 (private=72506686, cancel=true, abstime=0x7f71377fcc10, op=137, expected=0, futex_word=0x55adaa1dd45c) at ./nptl/futex-internal.c:57
  6    Thread 0x7f7136ffc640 (LWP 581171) "subscriber_memb" 0x00007f713f127934 in __libc_recvfrom (fd=15, buf=0x55adaa1eb720, len=65500, flags=0, addr=..., addrlen=0x7f7136ffb69c) at ../sysdeps/unix/sysv/linux/recvfrom.c:27
  7    Thread 0x7f71367fb640 (LWP 581172) "subscriber_memb" 0x00007f713f127934 in __libc_recvfrom (fd=16, buf=0x55adaa1fbed0, len=65500, flags=0, addr=..., addrlen=0x7f71367fa69c) at ../sysdeps/unix/sysv/linux/recvfrom.c:27
  8    Thread 0x7f7135ffa640 (LWP 581173) "subscriber_memb" 0x00007f713d3681e0 in _Unwind_Find_FDE () from /lib/x86_64-linux-gnu/libgcc_s.so.1
  9    Thread 0x7f71357f9640 (LWP 581174) "subscriber_memb" 0x00007f713f127934 in __libc_recvfrom (fd=18, buf=0x55adaa20e8a0, len=65500, flags=0, addr=..., addrlen=0x7f71357f869c) at ../sysdeps/unix/sysv/linux/recvfrom.c:27
  10   Thread 0x7f7134ff8640 (LWP 581175) "subscriber_memb" __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x7f7134ff74a0, op=265, expected=0, futex_word=0x7f713dda0110) at ./nptl/futex-internal.c:57
  11   Thread 0x7f71347f7640 (LWP 581176) "subscriber_memb" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55adaa227418) at ./nptl/futex-internal.c:57
  12   Thread 0x7f7133ff6640 (LWP 581177) "subscriber_memb" __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7f7124000c04) at ./nptl/futex-internal.c:57
(gdb) thread 8
[Switching to thread 8 (Thread 0x7f7135ffa640 (LWP 581173))]
#0  0x00007f713d3681e0 in _Unwind_Find_FDE () from /lib/x86_64-linux-gnu/libgcc_s.so.1
(gdb) bt
#0  0x00007f713d3681e0 in _Unwind_Find_FDE () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#1  0x00007f713d364833 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#2  0x00007f713d3661e4 in _Unwind_RaiseException () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#3  0x00007f713d0ae50b in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f713b3e18f7 in boost::interprocess::shared_memory_object::priv_open_or_create (this=0x7f7135ff9150, type=boost::interprocess::ipcdetail::DoOpen, filename=0x7f712021fee0 "fastrtps_fed84726591b50b4", mode=boost::interprocess::read_write, perm=...) at /home/chenlh/Projects/ROS2/ros2-master/src/eProsima/Fast-DDS/thirdparty/boost/include/boost/interprocess/shared_memory_object.hpp:363
#5  0x00007f713b3e14c8 in boost::interprocess::shared_memory_object::shared_memory_object (this=0x7f7135ff9150, name=0x7f712021fee0 "fastrtps_fed84726591b50b4", mode=boost::interprocess::read_write) at /home/chenlh/Projects/ROS2/ros2-master/src/eProsima/Fast-DDS/thirdparty/boost/include/boost/interprocess/shared_memory_object.hpp:83
#6  0x00007f713b3ee7c6 in boost::interprocess::ipcdetail::managed_open_or_create_impl<boost::interprocess::shared_memory_object, 16ul, true, false>::priv_open_or_create<boost::interprocess::ipcdetail::create_open_func<boost::interprocess::ipcdetail::basic_managed_memory_impl<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index, 16ul> > > (this=0x7f712021fcc8, type=boost::interprocess::ipcdetail::DoOpen, id=@0x7f7135ff9230: 0x7f712021fee0 "fastrtps_fed84726591b50b4", size=0, mode=boost::interprocess::read_write, addr=0x0, perm=..., construct_func=...) at /home/chenlh/Projects/ROS2/ros2-master/src/eProsima/Fast-DDS/thirdparty/boost/include/boost/interprocess/detail/managed_open_or_create_impl.hpp:339
#7  0x00007f713b3ed408 in boost::interprocess::ipcdetail::managed_open_or_create_impl<boost::interprocess::shared_memory_object, 16ul, true, false>::managed_open_or_create_impl<boost::interprocess::ipcdetail::create_open_func<boost::interprocess::ipcdetail::basic_managed_memory_impl<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index, 16ul> > > (this=0x7f712021fcc8, id=@0x7f7135ff9230: 0x7f712021fee0 "fastrtps_fed84726591b50b4", mode=boost::interprocess::read_write, addr=0x0, construct_func=...) at /home/chenlh/Projects/ROS2/ros2-master/src/eProsima/Fast-DDS/thirdparty/boost/include/boost/interprocess/detail/managed_open_or_create_impl.hpp:202
#8  0x00007f713b440c9d in boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>::basic_managed_shared_memory (this=0x7f712021fcc0, name=0x7f712021fee0 "fastrtps_fed84726591b50b4", addr=0x0) at /home/chenlh/Projects/ROS2/ros2-master/src/eProsima/Fast-DDS/thirdparty/boost/include/boost/interprocess/managed_shared_memory.hpp:151
#9  0x00007f713b4408a2 in eprosima::fastdds::rtps::SharedSegment<boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>, boost::interprocess::shared_memory_object>::SharedSegment (this=0x7f712021fd00, name="fastrtps_fed84726591b50b4") at /home/chenlh/Projects/ROS2/ros2-master/src/eProsima/Fast-DDS/src/cpp/utils/shared_memory/SharedMemSegment.hpp:341
#10 0x00007f713b9d99b0 in __gnu_cxx::new_allocator<eprosima::fastdds::rtps::SharedSegment<boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>, boost::interprocess::shared_memory_object> >::construct<eprosima::fastdds::rtps::SharedSegment<boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>, boost::interprocess::shared_memory_object>, boost::interprocess::open_only_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (this=0x7f7135ff93ff, __p=0x7f712021fd00) at /usr/include/c++/11/ext/new_allocator.h:162
#11 0x00007f713b9d7de4 in std::allocator_traits<std::allocator<eprosima::fastdds::rtps::SharedSegment<boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>, boost::interprocess::shared_memory_object> > >::construct<eprosima::fastdds::rtps::SharedSegment<boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>, boost::interprocess::shared_memory_object>, boost::interprocess::open_only_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (__a=..., __p=0x7f712021fd00) at /usr/include/c++/11/bits/alloc_traits.h:516
#12 0x00007f713b9d5aeb in std::_Sp_counted_ptr_inplace<eprosima::fastdds::rtps::SharedSegment<boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>, boost::interprocess::shared_memory_object>, std::allocator<eprosima::fastdds::rtps::SharedSegment<boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>, boost::interprocess::shared_memory_object> >, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<boost::interprocess::open_only_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (this=0x7f712021fcf0, __a=...) at /usr/include/c++/11/bits/shared_ptr_base.h:519
#13 0x00007f713b9d2e43 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<eprosima::fastdds::rtps::SharedSegment<boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>, boost::interprocess::shared_memory_object>, std::allocator<eprosima::fastdds::rtps::SharedSegment<boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>, boost::interprocess::shared_memory_object> >, boost::interprocess::open_only_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (this=0x7f7135ff95c8, __p=@0x7f7135ff95c0: 0x0, __a=...) at /usr/include/c++/11/bits/shared_ptr_base.h:650
#14 0x00007f713b9cfefc in std::__shared_ptr<eprosima::fastdds::rtps::SharedSegment<boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>, boost::interprocess::shared_memory_object>, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<eprosima::fastdds::rtps::SharedSegment<boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>, boost::interprocess::shared_memory_object> >, boost::interprocess::open_only_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (this=0x7f7135ff95c0, __tag=...) at /usr/include/c++/11/bits/shared_ptr_base.h:1342
#15 0x00007f713b9cbb8f in std::shared_ptr<eprosima::fastdds::rtps::SharedSegment<boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>, boost::interprocess::shared_memory_object> >::shared_ptr<std::allocator<eprosima::fastdds::rtps::SharedSegment<boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>, boost::interprocess::shared_memory_object> >, boost::interprocess::open_only_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (this=0x7f7135ff95c0, __tag=...) at /usr/include/c++/11/bits/shared_ptr.h:409
#16 0x00007f713b9c8923 in std::allocate_shared<eprosima::fastdds::rtps::SharedSegment<boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>, boost::interprocess::shared_memory_object>, std::allocator<eprosima::fastdds::rtps::SharedSegment<boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>, boost::interprocess::shared_memory_object> >, boost::interprocess::open_only_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (__a=...) at /usr/include/c++/11/bits/shared_ptr.h:863
#17 0x00007f713b9c4cf5 in std::make_shared<eprosima::fastdds::rtps::SharedSegment<boost::interprocess::basic_managed_shared_memory<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, unsigned int, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>, boost::interprocess::shared_memory_object>, boost::interprocess::open_only_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> () at /usr/include/c++/11/bits/shared_ptr.h:879
#18 0x00007f713b9bf9db in eprosima::fastdds::rtps::SharedMemManager::find_segment (this=0x55adaa065b10, id=...) at /home/chenlh/Projects/ROS2/ros2-master/src/eProsima/Fast-DDS/src/cpp/rtps/transport/shared_mem/SharedMemManager.hpp:1296
#19 0x00007f713b9bdcab in eprosima::fastdds::rtps::SharedMemManager::Listener::pop (this=0x55adaa1e6d50) at /home/chenlh/Projects/ROS2/ros2-master/src/eProsima/Fast-DDS/src/cpp/rtps/transport/shared_mem/SharedMemManager.hpp:723
#20 0x00007f713b9c14e5 in eprosima::fastdds::rtps::SharedMemChannelResource::Receive (this=0x55adaa0f2000, remote_locator=...) at /home/chenlh/Projects/ROS2/ros2-master/src/eProsima/Fast-DDS/src/cpp/rtps/transport/shared_mem/SharedMemChannelResource.hpp:182
#21 0x00007f713b9c103e in eprosima::fastdds::rtps::SharedMemChannelResource::perform_listen_operation (this=0x55adaa0f2000, input_locator=...) at /home/chenlh/Projects/ROS2/ros2-master/src/eProsima/Fast-DDS/src/cpp/rtps/transport/shared_mem/SharedMemChannelResource.hpp:133
#22 0x00007f713b9dd769 in std::__invoke_impl<void, void (eprosima::fastdds::rtps::SharedMemChannelResource::*)(eprosima::fastrtps::rtps::Locator_t), eprosima::fastdds::rtps::SharedMemChannelResource*, eprosima::fastrtps::rtps::Locator_t> (__f=@0x55adaa1eb188: (void (eprosima::fastdds::rtps::SharedMemChannelResource::*)(eprosima::fastdds::rtps::SharedMemChannelResource * const, eprosima::fastrtps::rtps::Locator_t)) 0x7f713b9c0fb4 <eprosima::fastdds::rtps::SharedMemChannelResource::perform_listen_operation(eprosima::fastrtps::rtps::Locator_t)>, __t=@0x55adaa1eb180: 0x55adaa0f2000) at /usr/include/c++/11/bits/invoke.h:74
#23 0x00007f713b9dd2d2 in std::__invoke<void (eprosima::fastdds::rtps::SharedMemChannelResource::*)(eprosima::fastrtps::rtps::Locator_t), eprosima::fastdds::rtps::SharedMemChannelResource*, eprosima::fastrtps::rtps::Locator_t> (__fn=@0x55adaa1eb188: (void (eprosima::fastdds::rtps::SharedMemChannelResource::*)(eprosima::fastdds::rtps::SharedMemChannelResource * const, eprosima::fastrtps::rtps::Locator_t)) 0x7f713b9c0fb4 <eprosima::fastdds::rtps::SharedMemChannelResource::perform_listen_operation(eprosima::fastrtps::rtps::Locator_t)>) at /usr/include/c++/11/bits/invoke.h:96
#24 0x00007f713b9dd0a3 in std::thread::_Invoker<std::tuple<void (eprosima::fastdds::rtps::SharedMemChannelResource::*)(eprosima::fastrtps::rtps::Locator_t), eprosima::fastdds::rtps::SharedMemChannelResource*, eprosima::fastrtps::rtps::Locator_t> >::_M_invoke<0ul, 1ul, 2ul> (this=0x55adaa1eb168) at /usr/include/c++/11/bits/std_thread.h:253
#25 0x00007f713b9dcb42 in std::thread::_Invoker<std::tuple<void (eprosima::fastdds::rtps::SharedMemChannelResource::*)(eprosima::fastrtps::rtps::Locator_t), eprosima::fastdds::rtps::SharedMemChannelResource*, eprosima::fastrtps::rtps::Locator_t> >::operator() (this=0x55adaa1eb168) at /usr/include/c++/11/bits/std_thread.h:260
#26 0x00007f713b9dc408 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (eprosima::fastdds::rtps::SharedMemChannelResource::*)(eprosima::fastrtps::rtps::Locator_t), eprosima::fastdds::rtps::SharedMemChannelResource*, eprosima::fastrtps::rtps::Locator_t> > >::_M_run (this=0x55adaa1eb160) at /usr/include/c++/11/bits/std_thread.h:211
#27 0x00007f713d0dc2b3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#28 0x00007f713f094b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#29 0x00007f713f126a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

after debugging, we found that find_segement open a segment named fastrtps_fed84726591b50b4 all the time.

Thread 8 "subscriber_memb" hit Breakpoint 3, eprosima::fastdds::rtps::SharedMemManager::find_segment (this=0x55adaa065b10, id=...) at /home/chenlh/Projects/ROS2/ros2-master/src/eProsima/Fast-DDS/src/cpp/rtps/transport/shared_mem/SharedMemManager.hpp:1296
1296	            segment = std::make_shared<SharedMemSegment>(boost::interprocess::open_only, segment_name);
(gdb) p segment_name
$3 = "fastrtps_fed84726591b50b4"
(gdb) c
Continuing.

Thread 8 "subscriber_memb" hit Breakpoint 3, eprosima::fastdds::rtps::SharedMemManager::find_segment (this=0x55adaa065b10, id=...) at /home/chenlh/Projects/ROS2/ros2-master/src/eProsima/Fast-DDS/src/cpp/rtps/transport/shared_mem/SharedMemManager.hpp:1296
1296	            segment = std::make_shared<SharedMemSegment>(boost::interprocess::open_only, segment_name);
(gdb) p segment_name
$4 = "fastrtps_fed84726591b50b4"
(gdb) c
Continuing.

Thread 8 "subscriber_memb" hit Breakpoint 3, eprosima::fastdds::rtps::SharedMemManager::find_segment (this=0x55adaa065b10, id=...) at /home/chenlh/Projects/ROS2/ros2-master/src/eProsima/Fast-DDS/src/cpp/rtps/transport/shared_mem/SharedMemManager.hpp:1296
1296	            segment = std::make_shared<SharedMemSegment>(boost::interprocess::open_only, segment_name);
(gdb) p segment_name
$5 = "fastrtps_fed84726591b50b4"

but the file fastrtps_fed84726591b50b4 does not exist in /dev/shm.

$ ls /dev/shm/fastrtps_fed84726591b50b4
ls: cannot access '/dev/shm/fastrtps_fed84726591b50b4': No such file or directory

At this scenario,subscriber always try to open deleted shared memeory file .

BTW, we think this problem is relevant to another issue ros2/ros2cli#582 (comment).

@MiguelCompany
Copy link
Collaborator

@Barry-Xu-2018 Thank you very much for the investigation. I was getting almost to the same place on my debugging session.

I've been testing some changes that solve the 100% CPU usage, but they do not always solve the main issue (i.e. step 4 sometimes succeeds, and sometimes fails).

We did a workaround on eProsima/Fast-DDS@687104a which fixes this issue. We are considering whether to add that commit into the supported branches, but we first need to evaluate the impact of incorporating those changes.

@Barry-Xu-2018
Copy link
Contributor

@MiguelCompany Thank you for your quick response.

The workaround eProsima/Fast-DDS@687104a is to avoid using SHM for metatraffic if there is other transport (such as UDP).
Could you briefly introduce what information belongs to metatraffic? If this information is transmitted through UDP, would it affect the performance of SHM transmission?

@Barry-Xu-2018
Copy link
Contributor

I found the definition on metatraffic in DDS spec.
image

So this information doesn't directly affect the performance of SHM transmission.

@MiguelCompany
Copy link
Collaborator

@Barry-Xu-2018 Correct. Metatraffic is basically discovery traffic.

@MiguelCompany
Copy link
Collaborator

@Barry-Xu-2018 I made some bugfixes on eProsima/Fast-DDS#3759, and made the meta-traffic be transmitted on UDP by default on eProsima/Fast-DDS#3753.

With those two on a ROS 2 rolling workspace, I could not reproduce this issue anymore.

@Barry-Xu-2018
Copy link
Contributor

Barry-Xu-2018 commented Jul 28, 2023

@MiguelCompany

Thank you for the further correction on this issue.
In my environment, I also cannot reproduce this issue with these 2 patches.

BTW, those 2 patches will backport to the version of FastDDS used by Humbel. Right ?

@fujitatomoya
Copy link
Collaborator Author

In my environment, I also cannot reproduce this issue with these 2 patches.

neither can I. thanks for the patches.

BTW, those 2 patches will backport to the version of FastDDS used by Humbel. Right?

eProsima/Fast-DDS#3753 includes ABI break change, but I am not sure Fast-DDS policy to backport. @MiguelCompany friendly ping?

@MiguelCompany
Copy link
Collaborator

@fujitatomoya There's no ABI break. We will backport both changes to the iron and humble branches

@MiguelCompany
Copy link
Collaborator

@fujitatomoya Changes have been backported to all the supported Fast DDS branches, so building ROS 2 from sources will have this issue fixed.

We have already released Fast DDS v2.6.6 (for humble), and will release v2.11.2 (rolling), and v2.10.2 (iron) this week or next.

@ZhenshengLee
Copy link

We have already released Fast DDS v2.6.6 (for humble), and will release v2.11.2 (rolling), and v2.10.2 (iron) this week or next.

@MiguelCompany Any possibility of backportting this to v2.0.x(foxy) which is the last LTS ros2 version?

Thanks.

@clalancette
Copy link
Contributor

@MiguelCompany Any possibility of backportting this to v2.0.x(foxy) which is the last LTS ros2 version?

Unfortunately, no. Foxy is now End-of-Life, so we have no plans to do updates to it.

The good news is that Humble is the next LTS, which should have these fixes in it.

@MiguelCompany
Copy link
Collaborator

@fujitatomoya Patches to Fast DDS were merged, backported, and released. Do you think we can close this issue?

@fujitatomoya
Copy link
Collaborator Author

@MiguelCompany yeah, right. i will go ahead to close this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants