Switch to topics for state/observation(/actions?) to allow for faster communication #139

Open
bheijden opened this issue Jun 15, 2021 · 1 comment

@bheijden
Member

  • How to deal with custom processing node synchronization?
  • Can we step ROS as well, as mentioned here? Could pose a problem if we run multiple simulations in parallel.
@bheijden
Member Author

bheijden commented Jun 16, 2021

On 15-6-2021 17:52, Bas van der Heijden wrote:

For the general idea behind the toolkit, please read the PDFs. But I
think you are also looking for a more "technical" description. So, I'll
try to summarize that in a couple of sentences.

Synchronization of actions and observations is vital for effective
reinforcement learning (RL), because policy updates are based on
state-action pairs and corresponding rewards. So if there is no
synchronization when the RL agent performs a policy update, the updates
will be incorrect.

Yes, I've never really understood what the roadmap of RL researchers was
wrt real-world applications because of this. The real world can't be
paused. Erik Schuitema in our lab worked in the direction of applying RL
to real systems and he managed with some tricks, and I assume the field
of RL has made quite some progress since then.

When the agent has chosen its action and wants the simulator to "step",
it does not care about synchronization per se (there are some intricacies,
but they are easier to explain in words during the meeting). Hence,
during a "step" we wish to parallelize as many of the operations as
possible, while enforcing some form of synchronization in-between steps.
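
As a rough illustration of "parallel within a step, synchronized between steps", here is a minimal plain-Python sketch. It is not the toolkit's implementation and uses no ROS: the operation names `render_camera` and `update_physics` are hypothetical placeholders, and a thread pool stands in for whatever actually runs the nodes.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def step(executor, action, operations):
    """Run every per-step operation in parallel, then synchronize."""
    futures = [executor.submit(op, action) for op in operations]
    # Synchronization point: block until all operations of this step are done.
    wait(futures)
    # Results come back in the order the operations were listed.
    return [f.result() for f in futures]


# Hypothetical stand-ins for work done inside one simulator step.
def render_camera(action):
    return ("camera", action * 2)


def update_physics(action):
    return ("physics", action + 1)


with ThreadPoolExecutor(max_workers=2) as executor:
    results = step(executor, 3, [render_camera, update_physics])
```

The operations may finish in any order; only the return from `step` is ordered with respect to the next step.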

We are aware that ROS is an inherently distributed and asynchronous
system, and that message production is decoupled from its consumption
when using topics. We wish to exploit this characteristic to allow for
fast distributed training; however, we must take care to bring back
synchronization at certain points in our code (i.e. when the agent
performs an update). There might be other systems that could facilitate
this and can specifically provide synchronization at certain points.
However, another aim of this project is to allow users to easily switch
to the real robot (which often requires ROS anyway), and also allow
users to utilize common ROS packages during training in simulation (e.g.
inverse kinematics packages).

This reads like you have more of a data-dependency than an explicit
control-flow synchronisation requirement.

As in: the next iteration can only start when the data from the previous
one has all been gathered.

Till then certain entities must suspend their activities.

I don't believe we are looking for a lock-step system in the strictest
sense as the operations we wish to perform in parallel are not the same,
and we do not compare the output of these operations to identify faults.
Though, I still see a similarity in that we "know" a priori that we
should receive new messages (i.e. topics) in the places where we wish to
enforce synchronization. Meaning, we can create routines that check
whether our a priori expectation of receiving these asynchronous topics
has been met before continuing (i.e. we compare our a priori knowledge
with the actual outcome of an asynchronous process). This won't lead to
any problems in simulation, as we are stepping it manually anyway
(instead of running the simulator asynchronously as well).
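
A routine of that kind could look roughly like the sketch below. This is an assumption-laden illustration in plain Python, not code from the toolkit: `ExpectationGate`, `on_message`, `wait_until_met`, and the topic names are all hypothetical, and in a ROS setup `on_message` would be called from the subscriber callbacks.

```python
import threading
from collections import Counter


class ExpectationGate:
    """Blocks until the a priori expected number of messages has arrived."""

    def __init__(self, expected):
        self._expected = dict(expected)  # topic name -> messages expected per step
        self._received = Counter()       # topic name -> messages seen so far
        self._cond = threading.Condition()

    def on_message(self, topic):
        # Called asynchronously, e.g. from a subscriber callback.
        with self._cond:
            self._received[topic] += 1
            self._cond.notify_all()

    def _met(self):
        return all(self._received[t] >= n for t, n in self._expected.items())

    def wait_until_met(self, timeout=None):
        # Compare the expectation with what actually arrived; True if met in time.
        with self._cond:
            met = self._cond.wait_for(self._met, timeout=timeout)
            if met:
                # Consume this step's messages so the next step starts fresh.
                self._received.subtract(self._expected)
            return met


gate = ExpectationGate({"joint_states": 1, "camera": 2})
senders = [threading.Thread(target=gate.on_message, args=(topic,))
           for topic in ("camera", "joint_states", "camera")]
for s in senders:
    s.start()
met = gate.wait_until_met(timeout=5.0)       # all three messages arrive
for s in senders:
    s.join()
not_met = gate.wait_until_met(timeout=0.05)  # nothing new was sent for the next step
```

The timeout matters mainly outside simulation, where an expected message may never arrive.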

Lock-step is not only used as terminology in redundant systems; it's a
more generic term essentially implying the control flow of multiple
entities is synchronised (or externally controlled) based on some scheme.

In your current system, you seem to have achieved that by strict
ordering of operations (i.e. when things are called) and by implicitly
assuming that completion of operations also means availability of data.
The lockstep seems to be between the producers and the consumers of data.

For now I believe it should be possible to restructure the system to
recognise the producers and consumers more directly, and I'd probably
suggest making the data-dependency more explicit without trying to
enforce a specific control flow from a central entity.

In other words: add "barriers" in the control flow which depend on the
dataflow, and allow entities to progress past barriers once all the data
they require is present (this could essentially be reactive programming,
and there are implementations of that available for ROS).
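
To make the "barrier on the dataflow" idea concrete, here is a small single-threaded sketch. It is only an illustration under assumptions: `DataBarrier`, `push`, the tick index, and the input names are hypothetical, and it deliberately uses neither ROS nor a reactive-programming library. Each consumer owns its barrier, so no central entity dictates the control flow.

```python
class DataBarrier:
    """Fires `callback` once every required input for a tick is present."""

    def __init__(self, required, callback):
        self._required = frozenset(required)
        self._callback = callback
        self._pending = {}  # tick -> {input name: value}

    def push(self, tick, name, value):
        if name not in self._required:
            raise KeyError(f"unexpected input: {name}")
        inputs = self._pending.setdefault(tick, {})
        inputs[name] = value
        # Progress past the barrier only when the data dependency is satisfied.
        if self._required.issubset(inputs):
            del self._pending[tick]
            self._callback(tick, inputs)


log = []


def policy_update(tick, inputs):
    # Hypothetical consumer: needs both the observation and the reward of a tick.
    log.append((tick, inputs["observation"], inputs["reward"]))


barrier = DataBarrier({"observation", "reward"}, policy_update)
barrier.push(0, "observation", [0.1, 0.2])
barrier.push(1, "reward", -1.0)             # arrives early, held back
barrier.push(0, "reward", 1.0)              # tick 0 complete, consumer runs
barrier.push(1, "observation", [0.3, 0.4])  # tick 1 complete, consumer runs
```

Producers only call `push`; whether and when the consumer runs follows from the data alone. A real version would need locking if `push` is called from multiple threads.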

@AlexanderKeijzer AlexanderKeijzer added this to the New communication milestone Jul 2, 2021