Syft cycloneDX: create sBOM data from source packages instead of binary packages (e.g. debian packages) ? #1700
I would be very interested in the background of the Note that I once requested CycloneDX support for specifying source information via externalReferences, but additional URLs do not identify a source uniquely (think of mirrors, .zip vs. .tar.gz links, etc.). So we now prefer the cc: @wagoodman |
The following jq query does the job AFTER the syft scan in all layers, transforming the "binary-based" CycloneDX into a "source-based" CycloneDX:
Thanks to StackOverflow for helping me implement this query! |
@ericbl -- that's one heck of a jq command! (adding mental note to work on my jq chops... ). Let me see if I can answer a few questions.
We hesitated to add this for a long time, specifically because Syft supports multiple SBOM formats, and the goal is to allow grype to interoperate with these SBOMs in such a way that vulnerability matching does not differ just because you've decided to use a different SBOM format. We explored multiple options for both SPDX and CycloneDX to express a source package clearly for the purposes of vulnerability matching, but we also wanted to make it clear to the SBOM consumer that these source packages were not found to be installed. At the time, the methods we explored couldn't check all the boxes (the boxes were roughly: 1) be clear to the user what's being expressed, 2) be able to show what's installed vs. upstream relationships, and 3) be interoperable with multiple formats). Grype also supports vulnerability matching when only a pURL or set of pURLs is specified. This, combined with the other efforts, made me lean towards adding an out-of-spec qualifier onto the pURL.
All OS catalogers tend to have this feature: https://github.com/search?q=repo%3Aanchore%2Fsyft%20PURLQualifierUpstream&type=code (alpm, apk, deb, rpm).
Correct, no dispute here about the However, this did not fulfill the needs of what we're trying to convey, which is "here is the [binary] package we found, and this is the package which it came from (the source package)". A pURL representing the source package alone only answers half of what was needed, and providing multiple pURLs is confusing for something that should be used as an identity (so should be singular). |
Thanks for your answer. Another way would indeed be a 2nd purl, but as you pointed out, it should not be named "purl" since that one should be unique. But we could name it differently! I found the CycloneDX spec a bit imprecise on this topic; I did not find any rule about either a "source purl" or a "binary purl". |
Thanks, I tried only on debian, npm, python, etc., but not yet on other Linux distributions. I'll do that ASAP with alpine / apk. It means my jq command above is not correct and will have to be even more complex, with a regex to rebuild the purl! |
@ericbl, a while back, I raised a similar topic with the CycloneDX team. It was not about a source purl, but about adding a specific type for external source references. The CycloneDX team, however, claimed that it's not easy/possible to distinguish between "source" and "binary" references throughout all ecosystems: CycloneDX/specification#98. I guess the same arguments would apply to source purls, so I wouldn't expect this to happen soon... Also, taking into consideration the point of @wagoodman that an SBOM should express what is "installed" in an image, a "source purl" would somehow be inconsistent in the default SBOM. But still, the feature as requested by @ericbl here – adding a |
The extraction of the source purl differs by package manager.
So the upstream part is built from "syft:metadata:originPackage" instead of from "syft:metadata:source" as with Debian. This means my proposed jq command above is wrong: I should parse the upstream part of the purl rather than rely on metadata that differs between package managers. Having an 'upstream' mode as proposed by Gernot would help us a lot and avoid going crazy with jq :) |
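To illustrate parsing the upstream part of the purl instead of the per-package-manager metadata, here is a minimal Python sketch. It is not the jq query from this thread. The function name `upstream_from_purl` is hypothetical, and the sample purls in the usage note are assumed shapes, not verified syft output. It also assumes that when the upstream qualifier carries no `@version`, the binary package version is an acceptable fallback, which may not hold for every ecosystem.

```python
from urllib.parse import unquote


def upstream_from_purl(purl):
    """Return (source_name, source_version) from the `upstream` qualifier.

    Returns None when the purl has no `upstream` qualifier.
    """
    base, _, query = purl.partition("?")
    query = query.partition("#")[0]  # drop an optional subpath

    # Split qualifiers by hand: parse_qs would turn '+' into a space,
    # which would corrupt versions such as 2.36.1-8+deb11u1.
    qualifiers = {}
    for pair in query.split("&"):
        key, sep, value = pair.partition("=")
        if sep:
            qualifiers[key] = unquote(value)

    upstream = qualifiers.get("upstream")
    if not upstream:
        return None

    name, sep, version = upstream.partition("@")
    if not sep:
        # Assumption: no version in the qualifier means "same as the binary".
        version = unquote(base.rpartition("@")[2]) if "@" in base else ""
    return name, version
```

With an assumed Debian purl such as `pkg:deb/debian/bsdutils@1:2.36.1-8+deb11u1?arch=amd64&upstream=util-linux@2.36.1-8+deb11u1&distro=debian-11`, the function returns the source name and version without looking at any metadata property, so the same code would also cover a purl whose upstream qualifier holds only a name.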
There are two paths forward:
These aren't mutually exclusive, so both could in theory be done, but I'm interested in hearing folks' thoughts on which might be more useful (or if there are any other ideas here). |
Your 2nd proposed path seems a bit more complex. And how could I then filter out the packages listing binary information I am not interested in? |
As I wrote above, my jq query is specific to Debian and difficult to maintain.
My pipeline script is then:
|
There seem to be a couple of paths forward here. Although this isn't a priority at the moment, we've promoted this to our backlog; we welcome pull requests and would be happy to help. |
tl;dr: could syft offer an option to generate a CycloneDX sBOM for OS packages by considering only the source (put in the upstream part, also in metadata:source, with the version from metadata:sourceVersion on some package managers) and not the binaries?
Hello,
Let's start with the business background: every piece of software delivered by our company needs a proper clearance of its open source software (OSS clearing).
Each team must generate an sBOM and get all software components analyzed on the shared SW360 platform: components must be properly identified and the source code provided.
A dedicated team will go through the source code to check the licenses.
Each team can use the tool of its choice to create the components on SW360. Some even take the path of doing it manually.
In our team, we create software that will eventually be deployed as a container image (docker for now): we use debian bullseye slim as the base image, and our software can add further packages, either built from source or installed from a package manager (debian, pip, npm, nuget), depending on the language (go, python, nodejs, ruby, c#, etc.).
Therefore, in my team, we want to use Syft to generate a CycloneDX BOM and eventually transform it to get the components uploaded to our SW360.
Syft already provides the list of licenses, but this is unfortunately not considered (yet) in our process.
Regarding debian packages, the internal team dealing with the debian OS (let's call it DebT) insists on using only the source package and not the binary.
DebT starts with the list of debian components from this command:
dpkg-query -f '${source:Package}|${source:Version}|${binary:Package}|${Version}\n' -W
DebT eventually only takes ${source:Package}|${source:Version}
Currently, however, the syft command generates a CycloneDX BOM based on the binaries. The source is sometimes set as a metadata property and then attached to the purl as the upstream part. This is particularly true for libraries, which produces duplicate components next to the non-lib variant (e.g. curl and libcurl both pointing to the same source).
I've seen this upstream= addition only for debian packages, not yet for other package providers.
This, however, creates a purl with an upstream extension that is not defined in the standard:
https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#deb
Let's take a real example. This is one line from the dpkg-query output above.
util-linux|2.36.1-8+deb11u1|bsdutils|1:2.36.1-8+deb11u1
DebT is only interested in scanning the source files, so it considers this package as
name: util-linux
version: 2.36.1-8+deb11u1
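The reduction DebT applies to the dpkg-query output can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `source_packages` is hypothetical, and the input is assumed to be the text printed by the dpkg-query command above, one `source|source version|binary|binary version` record per line.

```python
def source_packages(dpkg_output):
    """Return unique (source_name, source_version) pairs, in first-seen order."""
    seen = {}
    for line in dpkg_output.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        source_name, source_version, _binary_name, _binary_version = fields
        # Several binary packages can share one source package; keep it once.
        seen.setdefault((source_name, source_version), None)
    return list(seen)
```

Feeding it the `bsdutils` line above together with any other binary built from util-linux yields a single `("util-linux", "2.36.1-8+deb11u1")` entry, which is the deduplication that the binary-based sBOM does not give us.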
Syft generates the following in the CycloneDX sBOM:
Source and sourceVersion are set as properties, as well as in the upstream part.
For us, the correct package data would be
(according to the purl spec, arch should be set to source when we refer to the source package)
We are working on our own transformation of the syft output, but I wonder whether this would be better as a special output from syft directly.
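As a rough illustration of the `arch=source` convention, a source purl could be assembled like this in Python. The function name `debian_source_purl` is hypothetical, the `debian` namespace and the `distro` value are assumptions, and no percent-encoding is applied, so this is a sketch rather than a spec-complete purl builder.

```python
def debian_source_purl(name, version, distro=None):
    """Build a deb purl for a source package, marked with arch=source."""
    purl = f"pkg:deb/debian/{name}@{version}?arch=source"
    if distro:
        # Assumption: the caller passes the distro value it wants verbatim.
        purl += f"&distro={distro}"
    return purl
```

For the example above, `debian_source_purl("util-linux", "2.36.1-8+deb11u1")` gives `pkg:deb/debian/util-linux@2.36.1-8+deb11u1?arch=source`, with no upstream qualifier needed.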
What do you think?