Investigate using the OVAL data for ubuntu provider #24

Open
wagoodman opened this issue Dec 19, 2022 · 2 comments
Labels
performance (CPU or memory resource utilization of a provider), provider:ubuntu (Relating to the Ubuntu provider), refactor (highlights the need for a refactor)

Comments

wagoodman commented Dec 19, 2022

Today we parse the CVE information for Ubuntu distributions from the ubuntu-cve-tracker repository (git://git.launchpad.net/ubuntu-cve-tracker). This is probably correct for unsupported distro versions, but for supported distro versions we should be leveraging the OVAL data (https://security-metadata.canonical.com/oval/). Searching the git history to merge record changes is slow (hours with the current implementation), so ideally we would find ways to speed up this section of the code or eliminate the need for it altogether.

More investigation is needed to understand:

  • where the bottlenecks in the current implementation are
  • whether the hot spots can be refactored to reduce time and resource costs
  • what the OVAL data has, or lacks, compared with the current git cve-tracker repo
wagoodman added the refactor, performance, and provider:ubuntu labels Dec 19, 2022
wagoodman (Contributor, Author) commented

After further investigation, it doesn't make sense to use the OVAL data provided by Canonical: it is only available for supported releases (https://wiki.ubuntu.com/Releases), which omits several distro versions we scan today. We would therefore still need all of the processing we do today in order to support older distro versions.

As for performance, I haven't been able to run this provider to completion, despite letting it run for several hours (on several days). After digging, it looks like the main performance bottleneck is the revision history search. Here are some suggested improvements:

  • Implement concurrent workers (4 to start with). The git log call is out-of-process, which implies the provider is currently IO bound. Since these are read-only calls, we don't need to worry about filesystem locks or internal git locks, so this should scale nicely using simple threads for all _merge_cve() calls.
  • Remove --follow from the git log command. Given that git already detects renames within a single commit without --follow, and the process here moves files within the same commit, this appears safe, but it should be verified.
  • Run git commit-graph write after cloning or fetching to optimize the git log calls (see https://git-scm.com/docs/git-commit-graph). I've added this, and it should drop the average git log call from ~15 seconds to ~1 second.
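
A minimal sketch of the first and third suggestions, assuming a local clone of the tracker repository. The helper names (`write_commit_graph`, `git_log_revisions`, `revisions_for_paths`) are hypothetical and are not the provider's actual functions; only the `git` subcommands and flags are real. It roughly shows the shape of the idea rather than the implementation that was tried.

```python
from __future__ import annotations

import subprocess
from concurrent.futures import ThreadPoolExecutor


def write_commit_graph(repo: str) -> None:
    # Run once after clone/fetch so later `git log` walks can use the
    # commit-graph file instead of parsing every commit object.
    subprocess.run(
        ["git", "-C", repo, "commit-graph", "write", "--reachable"],
        check=True,
    )


def git_log_revisions(repo: str, path: str, follow: bool = True) -> list[str]:
    # One out-of-process, read-only `git log` call for a single path.
    cmd = ["git", "-C", repo, "log", "--format=%H"]
    if follow:
        cmd.append("--follow")  # only valid with exactly one path
    cmd += ["--", path]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.split()


def revisions_for_paths(
    repo: str, paths: list[str], workers: int = 4
) -> dict[str, list[str]]:
    # Threads are enough here: each worker mostly waits on a git subprocess.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: git_log_revisions(repo, p), paths)
        return dict(zip(paths, results))
```

Calling `write_commit_graph(repo)` once and then `revisions_for_paths(repo, paths)` would return a mapping from each path to its commit hashes, newest first. The worker count of 4 is the starting point suggested above, not a tuned value.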

wagoodman (Contributor, Author) commented

I tried out all three suggestions and there are some improvements. However, @westonsteimel found several instances that ultimately required the --follow flag on the git log commands.
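
One way such instances could be detected is to compare the commits reported with and without `--follow` for a given path. This is only an illustrative sketch: `log_hashes` and `needs_follow` are hypothetical names, and this is not necessarily how the cases were actually found.

```python
import subprocess


def log_hashes(repo, path, follow):
    # Commit hashes touching `path`, optionally following it across renames.
    cmd = ["git", "-C", repo, "log", "--format=%H"]
    if follow:
        cmd.append("--follow")
    cmd += ["--", path]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.split()


def needs_follow(repo, path):
    # True when dropping --follow would change the set of commits found,
    # e.g. because the file was renamed or moved at some point.
    with_follow = set(log_hashes(repo, path, follow=True))
    without_follow = set(log_hashes(repo, path, follow=False))
    return with_follow != without_follow
```

Running `needs_follow` over a sample of tracked paths would give a rough count of files whose history is truncated without the flag.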
