Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Null bytes in protobuf messages sent via unix datagram socket #2

Open
koalo opened this issue Feb 24, 2023 · 3 comments
Open

Null bytes in protobuf messages sent via unix datagram socket #2

koalo opened this issue Feb 24, 2023 · 3 comments

Comments

@koalo
Copy link
Collaborator

koalo commented Feb 24, 2023

Hi,
I just noticed that when the service sends a StreamQosResponse with ok = False, an empty packet is received on receiver side.
The reason for that is that protobuf messages are allowed to contain null bytes (and a False boolean is encoded as null byte \0).
Unfortunately, that is also the terminator of a datagram. In the current implementation that probably got unnoticed, because if ok = False the subsequent fields are ignored anyway. However, I still think we should avoid sending raw protobuf messages via unix datagram sockets.

I am happy to provide a fix, but since there are many different options to solve this, I would appreciate your opinion on this topic:

  1. We could further encode the protobuf message to prevent null bytes (e.g. as Base64).
  2. We could avoid using datagram sockets and use stream sockets instead. We would then need a method for separating the messages in the stream again.
  3. We completely avoid protobuf and use another IPC method. One option that you already mentioned in the README would be to replace it by DBus.

What do you think?

@xtor
Copy link
Collaborator

xtor commented Feb 25, 2023

Hi!

Ey, good catch! Thanks a lot for your brilliant assessment and proposals to improve the IPC interface!

I would +1 the D-BUS choice. Some comments below.

  • Linux integration

    • @vcgomes and @shifty91 also suggested the usage of D-BUS in the past. That would clearly improve the Linux integration, add robustness, and probably benefit from better CVE fix backports in the upstream distributions.
    • For x86 controllers, we could easily live up with any runtime overhead. But we should definitely keep an eye on the impact for more constrained Linux-based devices. We could always introduce a more modular IPC interface if users require it.
  • Built-in SCM_RIGHTS

    • Another plus -if I am not mistaken- would be the built-in D-BUS support to pass fds. The current interface performs the resource management and system configuration, and provides the caller with the necessary information to perform the AF_PACKET socket initialization.
    • But there is already demo code performing the initialization in the privileged entity, and returning the fd via SCM_RIGHTS. See test_service_add_talker_socket in test_service.py
      def test_service_add_talker_socket(self):
    • The intention would be to aim for something similar in the AF_XDP case, so we can introduce some privilege separation between resource management and user applications. @magnus-karlsson @garyloug is there any pointer to existing code that we could check in order to define the right API between the privileged entity and the user application when using AF_XDP?
    • I am not suggesting to address this with the initial port, of course :) First we should move to D-BUS, and then let us look into the new improvements possible.
  • Developer documentation

    • IMHO D-BUS XML interface definition is busier and more intrincate than protobuf's IDL. So it would be great to provide developer documentation generation, for human-readable IPC API docs.
    • Later we may easily add support to generate internal API and even a couple of diagrams built from the code. I have experimented with pandoc, pyreverse and plantuml and the result is ok.
  • ok vs status

    • The intention of the "ok" field was just a quick and dirty way to handle errors between privileged and user entities. So we can catch the error on the client side, and then look into the log for details.
    • For the programmatic interface, it may make sense to provide additional details about the failure to the client, e.g. with a numeric "status" field.
    • After the initial port, we may also look into this change.
  • Dependencies

    • I have a few docker-based build environments ready, and hope to get through the release process ASAP. By calling a bash function per distribution, they copy the code in your local repo to a container with the build deps, generate the deb, and then copy it to the host filesystem.
    • As they install the build-deps, it may make sense that we merge the build envs first, and then the D-BUS port, including the required changes to the docker-based build and deb packaging script.

Schönes Wochenende!

@magnus-karlsson
Copy link

There was a C-based example in samples/bpf/xdpsock_ctrl_proc.c and xdpsock_user.c in kernels prior to 6.0 (I removed it in that release). Do not know if that will help you. If you are looking for a higher level example, maybe @garylough has something?

@koalo
Copy link
Collaborator Author

koalo commented Mar 9, 2023

Thanks a lot for providing these details!
Sounds very reasonable to wait until your upcoming release before implementing it. I am looking forward to the release and will have a look again after that.

koalo added a commit to Linutronix/detnetctl that referenced this issue Sep 18, 2023
Do not keep the connection to detd open. This fixes two issues:
- In case detd restarts, the connection got dropped
- Due to Avnu/detd#2, the connection
  is in an invalid state after a ok = False was sent

Since the requests are comparatively rare, there is no advantage
of keeping the connection open.

Signed-off-by: Florian Kauer <[email protected]>
Reviewed-by: Kurt Kanzenbach <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants