add chainguard osv feed#1205

Open
crosleyzack wants to merge 7 commits into
anchore:mainfrom
crosleyzack:crosley/add-chainguard-osv-feed

Conversation

@crosleyzack

@crosleyzack crosleyzack commented May 27, 2026

Contributor

What

Adds the ability for the chainguard provider to consume the Chainguard OSV feed in addition to the Chainguard SecDB feed.

Why

We are trying to migrate to OSV because of the additional context and fidelity it provides. The SecDB feed is slated to be deprecated at the end of this year (2026).

Notes

The quality tests passed, which is to be expected as no behavior should have changed.

✦2 ❯ make validate provider=chainguard
uv run yardstick validate --result-set pr_vs_latest_via_sbom_chainguard
Loading label entries...done! 23 entries loaded
Validating with 'pr_vs_latest_via_sbom_chainguard'
2026-05-27 16:47:07,206 [INFO] only considering matches from allowed namespaces: chainguard:distro:chainguard:rolling
2026-05-27 16:47:07,206 [INFO] Testing image: 'ghcr.io/chainguard-images/scanner-test@sha256:59bddc101fba0c45d5c093575c6bc5bfee7f0e46ff127e6bb4e5acaaafb525f9' with 'syft@latest, grype@main+import-db=build/vulnerability.db, grype@main+import-db=https://grype.anchore.io/databases/v6/vulnerability-db_v6.1.4_2026-05-13T00:47:21Z_1778657102.tar.zst'
   Results used for image ghcr.io/chainguard-images/scanner-test@sha256:59bddc101fba0c45d5c093575c6bc5bfee7f0e46ff127e6bb4e5acaaafb525f9:
    ├── c9988b12-2a36-4fb4-9490-db690002220d : grype[custom-db]@v0.99.1-14-g8a04a093 (custom-db)  against ghcr.io/chainguard-images/scanner-test@sha256:59bddc101fba0c45d5c093575c6bc5bfee7f0e46ff127e6bb4e5acaaafb525f9
    └── ddec88a3-940d-47ec-b954-8541a59680e3 : grype[reference]@v0.99.1-14-g8a04a093 (reference)  against ghcr.io/chainguard-images/scanner-test@sha256:59bddc101fba0c45d5c093575c6bc5bfee7f0e46ff127e6bb4e5acaaafb525f9
--------------------------------------------------------------------------------

Quality gate passed!

Successfully ran VUNNEL_PROVIDERS_CHAINGUARD_USE_OSV=true VUNNEL_ROOT=data_osv uv run vunnel run chainguard and GRYPE_DB_PROVIDER_ROOT=../vunnel/data_osv GRYPE_DB_BUILD_DIR=./db_osv/6 go run ./cmd/grype-db build --schema 6 to create grype-db

Part of Chainguard issue INT-520

@crosleyzack crosleyzack marked this pull request as draft May 27, 2026 16:58
Comment thread src/vunnel/providers/wolfi/parser.py Outdated
Comment thread src/vunnel/providers/wolfi/parser.py Outdated
Comment thread src/vunnel/providers/wolfi/parser.py Outdated
@crosleyzack crosleyzack force-pushed the crosley/add-chainguard-osv-feed branch from 797b578 to 2c2b429 Compare May 27, 2026 17:10
Comment on lines +33 to 36
# NOTE: schema and distribution version are actually set on init depending
# on which feed we configure the provider to use.
__schema__ = schema.OSSchema()
__distribution_version__ = int(__schema__.major_version)

Contributor Author

Is it a problem if this always reports OSSchema? I ensured the argument to writer.write below is the actual, correct schema

@crosleyzack crosleyzack force-pushed the crosley/add-chainguard-osv-feed branch 2 times, most recently from d620923 to 8ac7bbf Compare May 28, 2026 13:25
@crosleyzack

Contributor Author

This appears to be a dupe of a PR by Ville updating the vunnel provider. Humbly, I think creating a new parser is much cleaner as an approach though

@crosleyzack crosleyzack marked this pull request as ready for review June 2, 2026 17:08
@crosleyzack crosleyzack force-pushed the crosley/add-chainguard-osv-feed branch from 8ac7bbf to 52af1d3 Compare June 3, 2026 13:38
@crosleyzack

Contributor Author

VUNNEL_PROVIDERS_CHAINGUARD_USE_OSV=true VUNNEL_ROOT=data_osv uv run vunnel run chainguard successfully creates vunnel output
GRYPE_DB_PROVIDER_ROOT=../vunnel/data_osv GRYPE_DB_BUILD_DIR=./db_osv/6 go run ./cmd/grype-db build --schema 6 successfully creates grype-db

@crosleyzack crosleyzack force-pushed the crosley/add-chainguard-osv-feed branch from 52af1d3 to 2a887d5 Compare June 10, 2026 01:20
Comment thread src/vunnel/providers/wolfi/parser.py Outdated
Comment thread src/vunnel/providers/wolfi/parser.py Outdated
Comment thread src/vunnel/providers/wolfi/parser.py Outdated
# which should speed up the download process significantly since there are thousands of entries.
# We construct the URL for each entry by appending the entry ID and .json to the base URL
# e.g. https://packages.cgr.dev/chainguard/v2/osv/CGA-2255-2h2p-73q2.json
with concurrent.futures.ThreadPoolExecutor(max_workers=self.max_workers) as executor:

Contributor

Are you all open to publishing a zip / targz of the whole set? Some providers do that and it saves a ton of network connections.

Contributor

A common pattern is basically a snapshot.tar.gz plus a changes.csv, which lists the things that changed since the snapshot, if it can't be regenerated.

Contributor Author

We do not currently, but can look into that. Another coworker has messed with the multithreaded download and improved its performance substantially

Comment thread src/vunnel/providers/wolfi/parser.py
@crosleyzack crosleyzack force-pushed the crosley/add-chainguard-osv-feed branch from b435fe8 to 1df5f76 Compare June 10, 2026 18:06
provider

Signed-off-by: crosleyzack <mail@crosleyzack.com>
Signed-off-by: Zackary Crosley <zackary.crosley@chainguard.dev>
@crosleyzack crosleyzack force-pushed the crosley/add-chainguard-osv-feed branch from 1df5f76 to 3795fec Compare June 17, 2026 18:27
@oss-housekeeper oss-housekeeper Bot added the needs-manual-review automated action that should be reviewed by a human label Jun 17, 2026

@oss-housekeeper oss-housekeeper Bot left a comment


Warning

Changes github configuration or dependencies — this requires manual review from @anchore/tools

No action from the PR author is needed.

Guarded files touched in this PR:

  • .github/scripts/dev-shell.sh (sha: ffc651d4e069a6d2ba96f750e0f31c87b2da595a)

This review disposition can be dismissed after manual review.

@crosleyzack crosleyzack force-pushed the crosley/add-chainguard-osv-feed branch from 3795fec to 258465e Compare June 18, 2026 14:50
Signed-off-by: Zackary Crosley <zackary.crosley@chainguard.dev>
@crosleyzack crosleyzack force-pushed the crosley/add-chainguard-osv-feed branch from 258465e to 98e6ed4 Compare June 18, 2026 15:09
Signed-off-by: Zackary Crosley <zackary.crosley@chainguard.dev>
try:
self.logger.info(f"downloading {self.namespace} osv {self.url}")
r = http.get(self.url, self.logger, stream=True, timeout=self.download_timeout)
buf = io.BytesIO(r.content)

Contributor

This downloads via stream and then immediately reads the whole stream. I think you probably want to stream to the file like

for chunk in response.iter_content(chunk_size=65536): # 64k chunks
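
For illustration, a minimal sketch of the chunked-write pattern suggested in this comment. It assumes the response behaves like a requests response opened with stream=True (that is, it exposes iter_content); the function name stream_to_file and the destination path are hypothetical and not taken from this PR.

```python
def stream_to_file(response, dest_path, chunk_size=65536):
    """Write a streamed response body to dest_path in fixed-size chunks.

    `response` is assumed to expose iter_content(chunk_size=...), as a
    requests response does. Returns the number of bytes written.
    """
    written = 0
    with open(dest_path, "wb") as fh:
        # Only one chunk is held in memory at a time, instead of the
        # whole body as happens when reading r.content.
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
                written += len(chunk)
    return written
```

With this shape the archive lands on disk without being buffered in an io.BytesIO first, so peak memory stays near the chunk size rather than the archive size.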

return orjson.loads(fh.read())

with concurrent.futures.ThreadPoolExecutor(max_workers=self.max_workers) as executor:
for data in executor.map(_load_file, filenames):

Contributor

I think map is eager, see https://docs.python.org/3.13/library/concurrent.futures.html#concurrent.futures.Executor.map , and this results in a slot in the map per file available being eagerly loaded. Maybe we don't need concurrency here any more since this is local I/O? Or maybe a pattern can limit the concurrency?

Honestly I think we can just skip extracting and yield from a decompression stream, like with tarfile.open(self._archive_path, mode="r|gz") as tf:.

I think the eager map is making this really slow - local testing shows a naive stream from tar.gz on disk is much faster.
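
A rough sketch of what streaming straight from the archive could look like, with no extraction step and no thread pool. It uses the standard-library json module rather than orjson to stay self-contained, and it assumes the archive holds one .json file per OSV record; the function name iter_osv_records is hypothetical.

```python
import json
import tarfile


def iter_osv_records(archive_path):
    """Yield (member name, parsed JSON) pairs from a .tar.gz archive.

    Mode "r|gz" reads the archive as a forward-only stream, so each
    member must be read before advancing to the next one.
    """
    with tarfile.open(archive_path, mode="r|gz") as tf:
        for member in tf:
            # Skip directories and anything that is not a JSON record.
            if not member.isfile() or not member.name.endswith(".json"):
                continue
            fh = tf.extractfile(member)
            if fh is None:
                continue
            yield member.name, json.loads(fh.read())
```

Because this is a generator, only one record is decoded at a time, and the consumer controls the pace instead of every file being queued up front.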


Labels

needs-manual-review automated action that should be reviewed by a human

Projects

None yet

2 participants