many: deal with incorrect status from osbuild better #827

Open
wants to merge 4 commits into
base: main

mvo5 (Collaborator) commented Feb 7, 2025

go.mod: update to latest images to get PR#1200

This commit updates the images library to pull in the fix [0] for
the overly long messages from osbuild. This should test the
(previously) failing test that runs the centos-9 ISO with an
attached terminal.

[0] osbuild/images#1200


test: run the centos-9 test with an attached terminal

This commit changes the centos-9 test to run with podman -t so
that we have a test case that uses the terminal progress. This
is prompted by:
a) we currently have no integration test that uses the terminal
progress for the full build
b) a konflux failure/memory leak that only showed up there because
konflux runs the test with -t


This commit restricts the memory of the bib test container so
that excessive memory usage is caught. This is prompted by a
memory leak when dealing with unrecoverable status messages
that led to failures in konflux.


This commit changes the progress parser (again) to deal with
errors from the osbuild json progress scanner. On errors we
will now exit right away and potentially kill osbuild, but
provide an error message that hints at how to work around the
issue.

The original code assumed that we only get transient errors like
json decode issues. However it turns out that this is not always
the case: some errors from a bufio.Scanner are unrecoverable
(like ErrTooBig) and trying again just leads to an endless
loop. We also cannot just "break" and wait for the build to
finish, because the progress would then appear broken for the
rest of the build and we would still have to report an error
(just much later).

mvo5 added 2 commits February 7, 2025 13:19
defer func() {
    // ensure osbuild is stopped even if we exit early
    if cmd.Process != nil {
        cmd.Process.Kill()
Collaborator Author

This may need a bit more care; maybe only do it in the error case (the happy case already does a cmd.Wait, so no killing is needed there).

Contributor

This seems fine as is but I would strongly recommend using PR_SET_PDEATHSIG if it's not already in use. That's a kernel-enforced way to lifecycle bind the child to the parent, such that if e.g. the parent dies due to SIGSEGV or whatever, the child gets killed too.

@mvo5 mvo5 marked this pull request as ready for review February 7, 2025 21:05
// can also not (in the general case) recover as
// the underlying osbuildStatus.scanner maybe in
// an unrecoverable state (like ErrTooBig).
return fmt.Errorf(`error parsing osbuild status, please eport a bug and try with "--progress=verbose": %w`, err)
Contributor

Just eporting a typo
s/eport/report/
