Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata format adds complexity for little benefits #23

Open
inconstante opened this issue Aug 12, 2021 · 9 comments
Open

Metadata format adds complexity for little benefits #23

inconstante opened this issue Aug 12, 2021 · 9 comments

Comments

@inconstante
Copy link
Contributor

Currently, the metadata file is in binary format.

Strings in the metadata file are not null-terminated and there are no lines, thus no end-of-line characters (or sequences of characters). As far as I can tell, the only benefit that it adds relies on the fact that end-of-line characters vary across platforms (or even across text editors), which could lead to broken parsing if Libpulp did not handle varying styles of newlines properly.

The benefit comes at a cost in complexity:

  • It makes the metadata file harder for humans to read;
  • It adds the need for the dump tool, which adds maintenance burden;
  • It requires length fields in the metadata (strings not being null-terminated);

Apart from that, the packer tool, which is responsible for recording build-id and checksum information into the metadata file, also converts from text to binary format. If a text format was used, the packer tool would still be useful to handle build-ids and checksums, however it would allow users to handle them manually, with a regular text editor (on quick tests, for example).

In my opinion, the drawbacks of the binary format outweigh the benefits.

@susematz
Copy link
Collaborator

FWIW, I agree.

@a-gavin
Copy link
Contributor

a-gavin commented Aug 12, 2021

I agree that having a specific binary format metadata file adds unnecessary complexity.

If I may, I'd like to suggest an alternate approach from the perspective of a new user. Is it necessary to have ulp_packer as a separate tool?

That I am aware of, ulp_packer simply prepares and validates the patch for later usage in ulp_trigger and ulp_check, saving a metadata file in the process. The metadata file, that I can tell, contains user-specified description file information in addition to data read from the to-be-patched library's ELF file and a MD4 digest of the patch.

Why not instead embed the preparation/validation logic of ulp_packer in common functions used by both ulp_trigger and ulp_check, and have both take as input the user-defined description file? This would eliminate the need for ulp_packer, ulp_dump, and the metadata file at the cost of added complexity to ulp_trigger and ulp_check.

A couple drawbacks I see with this approach:

  • Redundant computation performed when a user wants to verify the patch with ulp_check. I'm unsure what the overhead is for reading the ELF files of large programs, but we could eliminate this by making ulp_trigger output a textual metadata format of the applied patch (as discussed) for usage in ulp_check. This approach would complexity to ulp_trigger and potential command line argument bloat, though.

  • Breakage of any existing tooling built around the existing ULP interface.

@inconstante
Copy link
Contributor Author

inconstante commented Aug 13, 2021

That I am aware of, ulp_packer simply prepares and validates the patch for later usage in ulp_trigger and ulp_check, saving a metadata file in the process. The metadata file, that I can tell, contains user-specified description file information in addition to data read from the to-be-patched library's ELF file and a MD4 digest of the patch.

Yes, the metadata file is based on the description file, which is not only human readable, but written by-hand. That file is fed to ulp_packer, which creates the final metadata file.

As a matter of fact, in the test suite, the metadata generation process is even longer:

  1. First of all, a human writes a template description file, such as libaddress_livepatch1.in;
  2. Then, the build system calls a bunch of scripts to replace macros, such as __ABS_BUILDDIR__, and records the output into an actual description file.
  3. Finally, ulp_packer converts the description file to binary format, as well as adds the missing information (build-id and digest).

If we switch from binary to text files, we still have to record the build-id and the digest into the metadata. There are at least two ways to do it:

  1. Write the build-id and digest into the description file by-hand.
  2. Let ulp_packer do it for us.

Why not instead embed the preparation/validation logic of ulp_packer in common functions used by both ulp_trigger and ulp_check and have both take as input the user-defined description file?

One of the goals of having the build-id and digest in the metadata file is to make sure that the metadata file and the live patch DSO match. Suppose you have a metadata file, but somehow you got the wrong live patch DSO. In this case, ulp_trigger will notice the mismatch, and refrain from applying the live patch at all. The build-id and digest create a correspondence between the two files that compose a live patch.

In other words, ulp_packer is used during live patch creation time to record this correspondence into the metadata file. Then, during live patch application time, ulp_trigger uses that information to verify the match.

I think ulp_trigger is the wrong place to put that. Likewise for ulp_check.

That said, I still think we shouldn't get rid of ulp_packer, because it automatically writes the build-id and digest into the metadata file.

* Breakage of any existing tooling built around the existing ULP interface.

Good point, but, for now, we don't care too much about it, specially because libpulp has never been released. :)

@inconstante
Copy link
Contributor Author

@a-gavin

Pull request #19 (just merged) affects this, because it adds more lines to the metadata file (lines starting with #), as well as some code to ulp_packer and lib/ulp.c to deal with them.

@a-gavin
Copy link
Contributor

a-gavin commented Aug 17, 2021

Yes, the metadata file is based on the description file, which is not only human readable, but written by-hand. That file is fed to ulp_packer, which creates the final metadata file.

As a matter of fact, in the test suite, the metadata generation process is even longer:

1. First of all, a human writes a _template_ description file, such as [libaddress_livepatch1.in](https://github.com/SUSE/libpulp/blob/master/tests/libaddress_livepatch1.in);

2. Then, the build system calls a bunch of scripts to replace macros, such as `__ABS_BUILDDIR__`, and records the output into an actual _description file_.

3. Finally, `ulp_packer` converts the description file to binary format, as well as adds the missing information (build-id and digest).

Good to know.

If we switch from binary to text files, we still have to record the build-id and the digest into the metadata. There are at least two ways to do it:

1. Write the build-id and digest into the description file **by-hand**.

2. Let ulp_packer do it for us.

One of the goals of having the build-id and digest in the metadata file is to make sure that the metadata file and the live patch DSO match. Suppose you have a metadata file, but somehow you got the wrong live patch DSO. In this case, ulp_trigger will notice the mismatch, and refrain from applying the live patch at all. The build-id and digest create a correspondence between the two files that compose a live patch.

In other words, ulp_packer is used during live patch creation time to record this correspondence into the metadata file. Then, during live patch application time, ulp_trigger uses that information to verify the match.

This makes sense. Easy to overlook the complexity involved in the process of generating and applying a live patch!

I think ulp_trigger is the wrong place to put that. Likewise for ulp_check.

Can you elaborate? From what I understand, it makes sense for ulp_trigger to verify the match if it is going to apply the live patch.

That said, I still think we shouldn't get rid of ulp_packer, because it automatically writes the build-id and digest into the metadata file.

Sure. Easier on the user to have a tool do it for them.

Once I submit PR's for #25 and #26, I'd be happy to work on this!

@inconstante
Copy link
Contributor Author

I think ulp_trigger is the wrong place to put that. Likewise for ulp_check.

Can you elaborate? From what I understand, it makes sense for ulp_trigger to verify the match if it is going to apply the live patch.

What I mean is that ulp_packer is the right place to give the final touches to the metadata file (i.e.: to start from a partially filled metadata file template, e.g. libfoo_livepatch1.in, find the build-id of the target library, and record it into an actual metadata file, e.g. libfoo_livepatch1.ulp); whereas ulp_trigger (as well as ulp_check) is the right place to check that the metadata file and the live patch object, e.g. libfoo_livepatch1.so, match.

So, ulp_packer creates the metadata file and ulp_trigger consumes it.

Once I submit PR's for #25 and #26, I'd be happy to work on this!

:)))))

@giulianobelinassi
Copy link
Collaborator

Metadata is now integrated into the livepatch container (.so).

I think now getting rid of the binary metadata format is pointless: we now need a dump tool more than ever, and the binary format keep size of things determined, so we do not need to carry the size of the metadata when it is loaded in memory.

What you guys think?

@susematz
Copy link
Collaborator

I agree to that, though even binary meta-data can be complicated and not so complicated :-) Our metadata now (in the .so) has a determined size (from the section it's contained in), so, e.g. all strings therein could be null-terminated as usual in C, not size-prefixed. Overreads can be avoided as the overall size is known.

And, of course, the meta-data in the .so could also just as well be text based as well, so that the "dump" tool would be trivial:
readelf -p .comment.ulp (or similar) :-)

@giulianobelinassi
Copy link
Collaborator

giulianobelinassi commented Mar 31, 2022

The thing is that metadata extraction is not done in libpulp.so: it is now done in ulp and then ptraced to libpulp.so into a statically allocated buffer. We do that because we do not want to carry libelf with libpulp. We only check if the metadata will fit into the buffer on ulp side (buffer is 512k, which is plenty atm), and since the metadata itself has the size of it, we do not have to worry about passing its size around.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants