Handle msg not found #2384
Open
12tbuchman wants to merge 22 commits into ros:noetic-devel from BadgerTechnologies:handle-MsgNotFound
Conversation
Add two new roslaunch substitution args, param and optparam. These function similarly to env and optenv, but read from whatever is in the param server at the time of argument substitution. This means the params for these substitutions are read *prior* to whatever ROS params are set in the launch file itself. This is useful for storing global configuration params in the server, via some earlier launch file or process, before launching the various nodes that depend on those configurations. This mechanism makes it easy to alter launch file behavior based on sensor configuration or other global state, such as whether sim time is enabled.
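A minimal sketch of the intended usage, assuming the new substitution args mirror the $(env ...)/$(optenv ...) syntax; resolve_args is the real roslaunch entry point, but the param names below are made up:

```python
# Sketch only: requires a running ROS master where an earlier launch
# already set /robot/lidar_model; /robot/camera_model is left unset.
from roslaunch.substitution_args import resolve_args

print(resolve_args('model: $(param /robot/lidar_model)'))
# -> e.g. 'model: vlp16'
print(resolve_args('camera: $(optparam /robot/camera_model none)'))
# -> 'camera: none', assuming optparam takes a default like optenv
```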
Currently, by default, roscpp sets the nodelay hint to true, while rospy sets it to false. This is inconsistent. Fix the inconsistency by having rospy follow roscpp. It makes little sense to add any delay in ROS: it is a message-based system, and it is generally better to send a message as soon as it is available.
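For reference, this is the flag whose default changes; before this commit a rospy subscriber had to opt in explicitly (the topic and message type below are just an example):

```python
import rospy
from std_msgs.msg import String

def callback(msg):
    rospy.loginfo(msg.data)

rospy.init_node('listener')
# tcp_nodelay=True is what this commit makes the rospy default,
# matching roscpp's existing behavior.
rospy.Subscriber('chatter', String, callback, tcp_nodelay=True)
rospy.spin()
```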
Invoking rosbag to record bagfiles and asking it to split those bagfiles at certain thresholds causes the following (relevant) sequence of events:

1. rosbag subscribes to all specified topics.
2. Data is recorded.
3. A threshold is hit:
   a. stop writing to the current file;
   b. create a new file;
   c. start writing to the new file;
   d. repeat (a to c) at each subsequent threshold.

We have a specific need to retain only a subset of these split files. If that subset does not include the first (0th) file, then any static (latched) topics are lost, because they are recorded only upon initial subscription (step 1 above). With this change, you can now specify (via regex) a set of topics that should be re-subscribed to during a split, making those latched topics available in every split file. A sketch of the matching logic follows.
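A minimal sketch of the matching step, in Python rather than rosbag's C++; the patterns below are assumptions, and only the regex-on-split mechanism comes from the change:

```python
import re

# Hypothetical patterns for latched topics worth re-recording:
resubscribe_regex = re.compile(r'^/tf_static$|^/map$')

def topics_to_resubscribe(active_topics):
    """At each bag split, pick the topics to re-subscribe to."""
    return [t for t in active_topics if resubscribe_regex.match(t)]

print(topics_to_resubscribe(['/map', '/odom', '/tf_static']))
# -> ['/map', '/tf_static']
```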
This is the first of several steps to simplify time tracking in rosbag.
We still convert _duration_ to wall time to figure out how long to sleep, but we no longer need to keep track of an absolute wall-clock time, which in turn means that we do not need to track how long (in wall-clock time) we have been sleeping while paused, etc.
When the wall-clock horizon was removed, the publish rate was skewed because I forgot to account for the run time of the code between calls to runClock(). To fully correct this, the TimePublisher needed to be aware of when playback was paused. Since pausing the clock is clearly the TimePublisher's concern, I moved the handling and tracking of "paused" into TimePublisher.
This recv is done on the thread that handles accepting new connections to this node. If for some reason a client fails to send its header for an extended period of time, it will prevent any other connections from being made to this node. If a client hasn't sent the ROS header within the timeout, we drop its connection and move on. In most cases the client will simply retry the connection.
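A minimal sketch of the timeout behavior on a plain socket; the timeout value and function shape are assumptions, not the actual rospy internals:

```python
import socket

HEADER_TIMEOUT = 60.0  # hypothetical value

def read_header_or_drop(conn):
    """Read the ROS connection header, dropping clients that stall."""
    conn.settimeout(HEADER_TIMEOUT)
    try:
        header = conn.recv(4096)   # header negotiation happens here
    except socket.timeout:
        conn.close()               # drop the slow client; it will retry
        return None
    conn.settimeout(None)          # back to blocking for normal traffic
    return header
```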
If we hit one of the cases where we get an exception during the connection sequence and a reconnection is desired, then we need to close the socket and set it to None. This allows the connect loop in robust_connect_subscriber() to continue for another iteration. Without these changes, the connect loop would exit because conn.socket was not None, and then enter the receive loop without ever having successfully negotiated a connection. This can cause a deadlock between a publisher and subscriber if the connection was established but the ROS header was never written out: the publisher is left trying to receive the ROS header from the subscriber, while the subscriber waits to receive a message from the publisher. When this case is hit, for whatever reason, it is best to simply close the current connection and establish a new one.
If the connection is reset for some reason, we want to attempt to reconnect. If the connection has actually gone away, the next attempt will fail with ECONNREFUSED and we stop trying; if not, we may be able to reconnect. A sketch of the resulting loop follows.
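A minimal sketch of the reconnect behavior described in the two commits above; robust_connect_subscriber() is the real rospy entry point, but this loop body is illustrative:

```python
import errno
import socket

def robust_connect(make_socket, negotiate_header):
    conn = None
    while conn is None:
        try:
            conn = make_socket()
            negotiate_header(conn)   # may raise mid-handshake
        except socket.error as e:
            if conn is not None:
                conn.close()         # key fix: close and reset to None
                conn = None          # so the loop retries cleanly
            if e.errno == errno.ECONNREFUSED:
                raise                # peer is really gone; stop trying
    return conn
```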
Long-running callbacks in rospy can cause extreme amounts of buffering, resulting in unnecessary delay and essentially ignoring the queue_size setting (ros#1901). This can already be somewhat mitigated by setting buff_size larger than the amount of data that could be buffered by a long-running callback. However, setting buff_size to a correct value is not possible for the user of the API if the time spent in the callback or the amount of data to be transmitted is unknown. Also, even with a correct buff_size and a queue_size of 1, the received data may still be the oldest of all the data transmitted while the callback was running. Fix the delays in such cases by running callbacks in a separate thread. The receive_loop then calls recv() concurrently with the long-running callback, enforcing queue_size as new data is received. This makes the latency when queue_size is set behave like roscpp. Fixes ros#1901.
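A minimal sketch of the dispatch model: the receive loop keeps calling recv() while a worker thread runs the callback, and a bounded deque enforces queue_size by discarding the oldest message. Names are illustrative, not the actual rospy internals.

```python
import threading
from collections import deque

class CallbackDispatcher:
    def __init__(self, callback, queue_size=1):
        self.callback = callback
        self.queue = deque(maxlen=queue_size)  # old items fall off the left
        self.ready = threading.Condition()
        threading.Thread(target=self._worker, daemon=True).start()

    def push(self, msg):
        """Called from the receive loop; never blocks on the callback."""
        with self.ready:
            self.queue.append(msg)
            self.ready.notify()

    def _worker(self):
        while True:
            with self.ready:
                while not self.queue:
                    self.ready.wait()
                msg = self.queue.popleft()
            self.callback(msg)  # long-running work runs off the recv thread
```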
This bug is severe because it can break ordinary rospy.get_param() after a rospy.get_param_cached() has been performed. And, furthermore, rospy.log*() does a rospy.get_param_cached()!

When a parameter is set in the cache, the cached value will always be returned, even when the caller uses ordinary rospy.get_param(). This is as designed: subscribeParam() is done so that the cache receives updates from the param server whenever the value changes. However, existing code only uses the presence of a key in the cache to indicate whether the cached param is valid, and that is not enough. Suppose the param server has these contents:

    top:
      min: 1
      max: 10

Then:

    print(rospy.get_param('/top'))
    # So far, so good. Output is:
    # {'min': 1, 'max': 10}

    print(rospy.get_param_cached('/top/min'))
    # As expected, this outputs:
    # 1
    # However, ParamServerCache now contains:
    # {'top': {'min': 1}}
    # This is correct, because only the value of 'min' is cached.

    print(rospy.get_param('/top'))
    # Incorrect output:
    # {'min': 1}
    # Ugh! 'max': 10 is missing!
    # ...because it looks like /top is in the cache, but it was never
    # actually cached; only the individual item within it, 'min', was.

The fix is to track the actual keys that are cached in a Python set(). Only return a cached value if either the key or a parent of it is in the set.

This was found by production code which attempts to get all params using rospy.get_param('/'). This works OK as long as no calls to rospy.get_param_cached() have been made. However, the rospy.log*() functions all do rospy.get_param_cached('rosout_disable_topics_generation'). So after the first loginfo, subsequent calls to rospy.get_param('/') return only the single cached key rather than all params.

Fixes: d16296a ("rospy: added get_param_cached (ros#1515)")
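A minimal sketch of the fix, not the actual rospy code: track exactly which keys were cached, and serve from the cache only when the key or one of its parents is in that set.

```python
class ParamCache:
    def __init__(self):
        self.tree = {}            # nested dict mirroring the param server
        self.cached_keys = set()  # e.g. {'/top/min'}

    def _is_cached(self, key):
        parts = key.strip('/').split('/')
        prefixes = ('/' + '/'.join(parts[:i + 1]) for i in range(len(parts)))
        return any(p in self.cached_keys for p in prefixes)

    def lookup(self, key):
        if not self._is_cached(key):
            raise KeyError(key)   # caller falls back to the param server
        node = self.tree
        for part in key.strip('/').split('/'):
            node = node[part]
        return node

cache = ParamCache()
cache.tree = {'top': {'min': 1}}
cache.cached_keys.add('/top/min')
print(cache.lookup('/top/min'))   # -> 1
try:
    cache.lookup('/top')          # not actually cached: KeyError,
except KeyError:                  # not the incomplete {'min': 1}
    print('fall back to param server for /top')
```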
If rospy.get_param_cached() is called with a non-existent parameter, existing code would attempt to encode that the parameter doesn't exist by storing its value in the cache as an empty dictionary. And it would correctly make a subscribeParam call to rosmaster to receive updates. However, every subsequent get_param_cached() on this non-existent parameter would re-subscribe for updates from rosmaster, nullifying any performance benefit from caching. Likewise, a regular get_param() would simply re-read the parameter from rosmaster every time, still not benefiting from caching. To make matters worse, the rospy.log* functions do a rospy.get_param_cached('rosout_disable_topics_generation'), so we pay this penalty on every log message whenever that parameter is not set (and it is not set by default!). The cause of this problem is that the translation of an empty dict in the cache to a KeyError occurs within the cache itself, so the caller can't know whether the parameter is simply unset or not yet subscribed for updates. The fix is simple: move the check for an empty dict from the cache itself to the caller.

Fixes: d16296a ("rospy: added get_param_cached (ros#1515)")
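A minimal sketch of moving the sentinel check to the caller; the master methods below are stand-ins for the real XML-RPC calls, not actual rospy APIs:

```python
UNSET = {}  # sentinel stored in the cache for a non-existent param

def get_param_cached(cache, master, key):
    try:
        value = cache.lookup(key)   # KeyError only if not yet subscribed
    except KeyError:
        master.subscribeParam(key)  # subscribe exactly once
        value = master.getParamOrUnset(key, UNSET)  # hypothetical helper
        cache.store(key, value)
    if value == UNSET:
        raise KeyError(key)         # caller-side check: param is unset
    return value
```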
This bug affected latched topics that were published more than once per recording session, and only when the `--repeat_latched` option was set. Latched messages were added to a std::map using the `insert` call, which adds a new element only if one does not yet exist for that key. As a result, the first latched message on each topic, not the most recent one, was the only one recorded at the start of each split bagfile.
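The fix itself is C++, but the std::map pitfall maps directly onto Python's dict.setdefault, which (like map::insert) keeps the first value and silently ignores later ones:

```python
latched = {}

latched.setdefault('/map', 'old message')  # ~ std::map::insert
latched.setdefault('/map', 'new message')  # ignored, like the bug
print(latched['/map'])                     # -> 'old message'

latched['/map'] = 'new message'            # ~ map[key] = value: the fix
print(latched['/map'])                     # -> 'new message'
```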
Because our ROS system is entirely contained within a local network, we do not need tcp_nodelay to be "false". Also worth mentioning: since we use upstream code for some of our nodes, we don't have the option of changing the flags passed to all the subscribers. Instead, we change the default here.
When omitted from the subscriber, the queue size is None, which has the potential to cause subscribers to buffer without bound. This can be avoided by setting the queue size to 100 whenever it is omitted or None. ROS-658
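A minimal sketch of the default guard; the value 100 comes from the commit, the function shape is an assumption:

```python
def effective_queue_size(queue_size=None):
    """Fall back to a bounded queue instead of buffering without bound."""
    return 100 if queue_size is None else queue_size

print(effective_queue_size())    # -> 100
print(effective_queue_size(10))  # -> 10
```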
Python 3.10: the collections abstract base classes are now available only under collections.abc.
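The usual compatibility fix for this, which works on both old and new interpreters:

```python
# On Python 3.10+ the ABCs live only in collections.abc; older
# interpreters (including Python 2) also exposed them in collections.
try:
    from collections.abc import Mapping, Sequence  # Python 3.3+
except ImportError:
    from collections import Mapping, Sequence      # legacy fallback

print(isinstance({}, Mapping))  # -> True
```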
roslaunch, while waiting for processes to exit, eats a lot of resources by waking up every 0.1 seconds to check on things. Change this to a 1.0 second sleep. The wakeups add up: on our robot, this wasted about 5% of a CPU core.
If a "required" node in a launch file goes down, we really want to pass the error code up through the system; specifically to systemd. We can use this information to do things like attempt restarts. This was an overlooked feature of roslaunch, which always returns zero if nodes die.
roscore will only clean up a node if all of its subscriptions have been unregistered. This means that any rospy node that uses roslog would not get cleaned up by roscore. If the node is then restarted, it will attempt to send a shutdown to the previous node because it shares the same name, even though that node has actually shut down. To complicate this further: since the original node shut down, another node might have been assigned the same address and would then spuriously receive the shutdown request even though it didn't actually conflict. These changes ensure that we unregister any param that we had subscribed for, so rospy nodes now get properly cleaned up in roscore.
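A minimal sketch of the cleanup using the documented Parameter Server API call unsubscribeParam(caller_id, caller_api, key); the surrounding plumbing is an assumption, not the actual rospy code:

```python
import os
from xmlrpc.client import ServerProxy

def unsubscribe_cached_params(caller_id, caller_api, cached_keys):
    master = ServerProxy(os.environ['ROS_MASTER_URI'])
    for key in cached_keys:
        # Called on shutdown for every param we subscribed to, so the
        # master no longer holds a stale registration for this node.
        master.unsubscribeParam(caller_id, caller_api, key)
```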
This exception currently crashes rosbag, but it can easily be handled by skipping the message and publishing a readable warning indicating the topic that caused the exception.