add chainguard osv feed #1205
797b578 to 2c2b429
```python
# NOTE: schema and distribution version are actually set on init depending
# on which feed we configure the provider to use.
__schema__ = schema.OSSchema()
__distribution_version__ = int(__schema__.major_version)
```
Is it a problem if this always reports `OSSchema`? I made sure the argument passed to `writer.write` below is the actual, correct schema.
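The concern here is that a class-level `__schema__` is fixed at import time, while the feed is only known at init. A minimal sketch of that behavior, assuming hypothetical `Schema`, `SECDB_SCHEMA`, `OSV_SCHEMA`, and `ChainguardProvider` stand-ins (not vunnel's real schema classes; the version strings are placeholders): the instance attribute shadows the class default, but anything that reads the attribute off the class itself still sees the default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    """Hypothetical stand-in for a vunnel schema object."""

    name: str
    version: str

    @property
    def major_version(self) -> str:
        return self.version.split(".")[0]


# Placeholder schemas and version strings, for illustration only.
SECDB_SCHEMA = Schema(name="os", version="1.0.0")
OSV_SCHEMA = Schema(name="osv", version="1.0.0")


class ChainguardProvider:
    # Class-level defaults: fixed at import time, before any config is known.
    __schema__ = SECDB_SCHEMA
    __distribution_version__ = int(__schema__.major_version)

    def __init__(self, use_osv: bool = False) -> None:
        if use_osv:
            # Instance attributes shadow the class defaults, but only for
            # code that reads them through this instance.
            self.__schema__ = OSV_SCHEMA
            self.__distribution_version__ = int(OSV_SCHEMA.major_version)

    def result_schema(self) -> Schema:
        # Hand this to the writer explicitly instead of relying on the class attribute.
        return self.__schema__
```

Passing the instance-level schema explicitly to the writer, as the comment above describes, sidesteps the mismatch for written results; only tooling that introspects the class would still report the default.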
d620923 to 8ac7bbf
This appears to duplicate a PR by Ville that updates the vunnel provider. Humbly, I think creating a new parser is a much cleaner approach, though.
8ac7bbf to 52af1d3
52af1d3 to 2a887d5
```python
# which should speed up the download process significantly since there are thousands of entries.
# We construct the URL for each entry by appending the entry ID and .json to the base URL
# e.g. https://packages.cgr.dev/chainguard/v2/osv/CGA-2255-2h2p-73q2.json
with concurrent.futures.ThreadPoolExecutor(max_workers=self.max_workers) as executor:
```
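A minimal sketch of the per-entry download described in that comment, assuming a plain `urllib` fetch in place of vunnel's own `http.get` helper and a hypothetical `download_entries` function. The base URL comes from the example in the code comment; that every entry lives at `<base><id>.json` is inferred from that single example.

```python
import concurrent.futures
import json
import urllib.request
from typing import Callable, Dict, Iterable

# Taken from the example URL in the code comment above.
BASE_URL = "https://packages.cgr.dev/chainguard/v2/osv/"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Fetch and decode one JSON document using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read())


def download_entries(
    entry_ids: Iterable[str],
    base_url: str = BASE_URL,
    fetch: Callable[[str], dict] = fetch_json,
    max_workers: int = 16,
) -> Dict[str, dict]:
    """Download one OSV record per entry ID concurrently, keyed by entry ID."""
    results: Dict[str, dict] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its entry ID so failures can be attributed.
        futures = {
            executor.submit(fetch, f"{base_url}{entry_id}.json"): entry_id
            for entry_id in entry_ids
        }
        for future in concurrent.futures.as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Making `fetch` injectable keeps the concurrency logic testable without network access; each worker thread spends most of its time waiting on I/O, which is why threads (rather than processes) fit here.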
Are you all open to publishing a zip / tar.gz of the whole set? Some providers do that, and it saves a ton of network connections.
A common pattern is a `snapshot.tar.gz` plus a `changes.csv` listing the entries that changed since the snapshot was taken, if the snapshot can't be regenerated.
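A minimal sketch of how a consumer might apply such a changes file on top of a snapshot. The CSV layout here (a header row `id,action`, with `action` being `upsert` or `delete`) and the `apply_changes` helper are hypothetical; no such format was specified in the thread.

```python
import csv
import io
from typing import Callable, Dict


def apply_changes(
    snapshot: Dict[str, dict],
    changes_csv: str,
    fetch: Callable[[str], dict],
) -> Dict[str, dict]:
    """Bring a snapshot up to date using a hypothetical changes.csv.

    Assumed format: a header row "id,action" followed by one row per entry
    changed since the snapshot, where action is "upsert" or "delete".
    """
    records = dict(snapshot)  # leave the caller's snapshot untouched
    for row in csv.DictReader(io.StringIO(changes_csv)):
        entry_id, action = row["id"], row["action"]
        if action == "delete":
            records.pop(entry_id, None)
        elif action == "upsert":
            # Only changed entries hit the network; everything else comes from the snapshot.
            records[entry_id] = fetch(entry_id)
        else:
            raise ValueError(f"unknown action {action!r} for {entry_id}")
    return records
```

The payoff is that a sync costs one archive download plus one request per changed entry, rather than one request per entry in the whole feed.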
We don't currently, but we can look into it. Another coworker has been working on the multithreaded download and improved its performance substantially.
b435fe8 to 1df5f76
provider
Signed-off-by: crosleyzack <mail@crosleyzack.com>
Signed-off-by: Zackary Crosley <zackary.crosley@chainguard.dev>
1df5f76 to 3795fec
Warning: this PR changes GitHub configuration or dependencies, so it requires manual review from @anchore/tools. No action from the PR author is needed.

Guarded files touched in this PR:

- `.github/scripts/dev-shell.sh` (sha: ffc651d4e069a6d2ba96f750e0f31c87b2da595a)
This review disposition can be dismissed after manual review.
3795fec to 258465e
Signed-off-by: Zackary Crosley <zackary.crosley@chainguard.dev>
258465e to 98e6ed4
Signed-off-by: Zackary Crosley <zackary.crosley@chainguard.dev>
```python
try:
    self.logger.info(f"downloading {self.namespace} osv {self.url}")
    r = http.get(self.url, self.logger, stream=True, timeout=self.download_timeout)
    buf = io.BytesIO(r.content)
```
This downloads with `stream=True` and then immediately reads the whole body into memory via `r.content`. I think you probably want to stream it to the file instead.
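A minimal sketch of streaming the body straight to disk, assuming the response behaves like a `requests.Response` opened with `stream=True` (whether vunnel's `http.get` wrapper returns exactly that is an assumption) and a hypothetical `stream_to_file` helper.

```python
import os
import tempfile
from typing import Any


def stream_to_file(response: Any, dest_path: str, chunk_size: int = 1 << 20) -> int:
    """Write a streamed HTTP body to disk chunk by chunk; return bytes written.

    `response` is assumed to expose requests' `iter_content(chunk_size=...)`.
    """
    written = 0
    # Write to a temp file in the same directory, then rename, so a failed
    # download never leaves a truncated archive at dest_path.
    dir_name = os.path.dirname(os.path.abspath(dest_path))
    tmp = tempfile.NamedTemporaryFile("wb", dir=dir_name, delete=False)
    try:
        with tmp:
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:  # skip empty keep-alive chunks
                    tmp.write(chunk)
                    written += len(chunk)
        os.replace(tmp.name, dest_path)
    except BaseException:
        os.unlink(tmp.name)  # clean up the partial temp file, then re-raise
        raise
    return written
```

Memory use stays at roughly one chunk regardless of archive size, and the archive on disk can then be read as a stream rather than buffered in a `BytesIO`.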
```python
return orjson.loads(fh.read())

with concurrent.futures.ThreadPoolExecutor(max_workers=self.max_workers) as executor:
    for data in executor.map(_load_file, filenames):
```
I think `map` is eager (see https://docs.python.org/3.13/library/concurrent.futures.html#concurrent.futures.Executor.map): it submits a task for every file up front, so every file ends up being loaded eagerly. Maybe we don't need concurrency here anymore, since this is local I/O? Or maybe a pattern can limit the concurrency?
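One such pattern, sketched under assumptions: a hypothetical `bounded_map` helper that keeps at most `max_in_flight` tasks submitted at once and pulls from the input lazily, instead of submitting every file up front as `Executor.map` does.

```python
import concurrent.futures
import itertools
from collections import deque
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def bounded_map(
    fn: Callable[[T], R],
    items: Iterable[T],
    max_workers: int = 4,
    max_in_flight: int = 16,
) -> Iterator[R]:
    """Like Executor.map, but never has more than max_in_flight tasks pending.

    Results are yielded in input order, and the input iterable is consumed
    lazily, so memory stays bounded even for thousands of files.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        it = iter(items)
        # Prime the window with the first max_in_flight tasks.
        pending = deque(
            executor.submit(fn, item) for item in itertools.islice(it, max_in_flight)
        )
        while pending:
            # Block on the oldest task to preserve ordering, then refill the window.
            result = pending.popleft().result()
            for item in itertools.islice(it, 1):
                pending.append(executor.submit(fn, item))
            yield result
```

Usage would mirror the existing loop, e.g. `for data in bounded_map(_load_file, filenames, max_workers=self.max_workers): ...`. That said, for local disk reads a plain sequential loop may well be just as fast, which is the other option raised here.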
Honestly, I think we can skip extracting entirely and yield from a decompression stream, like `with tarfile.open(self._archive_path, mode="r|gz") as tf:`.
I think the eager `map` is what's making this so slow: local testing shows a naive stream from the `.tar.gz` on disk is much faster.
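A minimal sketch of that streaming approach, assuming the archive holds one OSV record per `*.json` member (the actual layout of the archive is an assumption) and a hypothetical `iter_osv_records` helper. The PR uses `orjson`; this sketch sticks to the standard-library `json`, and `orjson.loads(fh.read())` would drop in the same way.

```python
import json
import tarfile
from typing import Iterator


def iter_osv_records(archive_path: str) -> Iterator[dict]:
    """Yield each JSON record from a .tar.gz without extracting to disk.

    mode="r|gz" opens the archive as a forward-only stream, so members are
    decompressed and parsed one at a time, in archive order.
    """
    with tarfile.open(archive_path, mode="r|gz") as tf:
        for member in tf:
            if not (member.isfile() and member.name.endswith(".json")):
                continue
            fh = tf.extractfile(member)
            if fh is None:
                continue
            # Stream mode cannot seek backwards, so each member is read fully
            # (json.load does this) before the loop advances to the next one.
            yield json.load(fh)
```

Because nothing is written to disk and only one member is decoded at a time, memory and I/O stay flat, and no thread pool is needed at all.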
What
Adds the ability for the Chainguard provider to consume the Chainguard OSV feed in addition to the Chainguard SecDB feed.
Why
We are trying to migrate to OSV because of the additional context and fidelity it provides. The SecDB feed is slated to be deprecated by the end of this year (2026).
Notes
The quality tests passed, which is to be expected as no behavior should have changed.
Successfully ran

```shell
VUNNEL_PROVIDERS_CHAINGUARD_USE_OSV=true VUNNEL_ROOT=data_osv uv run vunnel run chainguard
```

and

```shell
GRYPE_DB_PROVIDER_ROOT=../vunnel/data_osv GRYPE_DB_BUILD_DIR=./db_osv/6 go run ./cmd/grype-db build --schema 6
```

to create a grype-db.

Part of Chainguard issue INT-520.