Make grype-db download smaller by switching compression methods #367
We may be able to use Xz in a performant way if we use https://github.com/xi2/xz instead. It appears to be an order of magnitude faster than https://github.com/ulikunitz/xz for decompression. This would mean we'd need to shell out to compress within grype-db, which seems like an alright tradeoff (I think the ulikunitz implementation also yields larger archives than native xz utils). Another consideration is on the compression side: I'm seeing that golang-only implementations don't achieve the best compression ratios compared to native tooling. That implies we might want to shell out to native tooling when creating archives.
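For reference, a minimal sketch of the decompression half of that split, assuming the xi2/xz reader API; the function name is illustrative, not actual grype code:

```go
package compression

import (
	"io"

	"github.com/xi2/xz"
)

// decompressXz streams an .xz payload with the pure-Go xi2/xz reader, so
// grype can stay a portable static binary with no runtime dependency on
// native xz utils.
func decompressXz(src io.Reader, dst io.Writer) error {
	// The second argument caps decoder dictionary memory; 0 selects the
	// library default.
	r, err := xz.NewReader(src, 0)
	if err != nil {
		return err
	}
	_, err = io.Copy(dst, r)
	return err
}
```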
A prototype for grype is here: anchore/grype@main...fast-xz. This brings decompression down from 80 seconds with ulikunitz to 16 seconds. Before continuing: is this acceptable? With v6 the DB size will be much smaller than what was tested with; assuming the trend is linear, it looks like this will be ~10 seconds to decompress. What's missing is removing some of the copied
While busy doing other things, I ran a compression benchmark against today's grype vuln database. I don't know if it's valuable data to you, but I am posting it here anyway. I ran it on my ThinkPad Z13, so it's 1-2-year-old commodity hardware.

Summary

Full results (CSV format)
I would highly recommend using zstandard over xz!
Me too! When evaluating, I've been trying to minimize file size while not impacting decompression time in grype.

Something that threw a wrench into this evaluation process is deciding when to use golang implementations for these methods vs shelling out to native tooling. I've found that compressing with a golang implementation tends to produce less ideal compression ratios and decompression times. The lesson learned here: compress with native tooling (for the best archives), decompress with golang implementations (allowing us to easily keep grype a portable static binary).

I also found that the compression ratio is pretty sensitive to what is being compressed, so while prototyping a new schema we ended up changing a lot of the details based on the ratios we were getting with those designs. For instance, a more normalized DB design tended to produce a smaller DB file, but not a great compression ratio when compressing for distribution; relaxing normalization and leaning more towards a JSON blob store maximized the ratio.

So! Where are we at today with all of the feedback incorporated? In terms of distribution sizes:
Where Xz-9 and Zstd-22 are comparable enough to be candidates here. And timing (after trying out / swapping some decompression libs... I'll spare folks the details here):
From a timing perspective Zstd wins here. (edit: --ultra impacts the memory used during decompression for archives around these sizes.)

Overall, the final verdict is Zstd 🎉
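As a concrete illustration of the "compress with native tooling" half of that lesson, a sketch of creating the archive from the grype-db side by shelling out; wiring this up via os/exec and the function name are assumptions, not the actual grype-db code:

```go
package compression

import (
	"fmt"
	"os/exec"
)

// compressWithNativeZstd creates path+".zst" using the zstd CLI on the
// grype-db (build) side, where requiring native tooling is acceptable.
// --ultra unlocks compression levels above 19, at the cost of a larger
// window (and therefore more decompression memory).
func compressWithNativeZstd(path string) error {
	cmd := exec.Command("zstd", "--ultra", "-22", "--keep", path)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("zstd compression failed: %w: %s", err, out)
	}
	return nil
}
```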
Adding a changelog-ignore label since, though this is implemented in #437, it won't be usable until v6 is enabled as the default schema (probably in a couple of months). We don't want to pick this up in the next release notes.
What would you like to be added:
Grype should download a smaller file during its database update, probably by using .zstd compression on the current database schema.
Why is this needed:
The Grype database has grown over the years, to the point where the database is now 184 MB as a gzipped tar. This puts load on the CDN and results in a poor experience for many users.
Tasks:
- Add zstd support to the grype-db package
- Produce tar.zstd archives for v4 and v5 schemas
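For illustration, a hedged sketch of what consuming the proposed tar.zstd archive could look like on the client side; the thread doesn't pin a specific Go zstd library, so github.com/klauspost/compress/zstd and the function name here are assumptions:

```go
package compression

import (
	"archive/tar"
	"errors"
	"io"

	"github.com/klauspost/compress/zstd"
)

// extractFromTarZstd pulls one named file (e.g. the SQLite database) out of
// a tar.zstd stream, using a pure-Go decoder so the client binary stays
// static and portable.
func extractFromTarZstd(archive io.Reader, name string, dst io.Writer) error {
	dec, err := zstd.NewReader(archive)
	if err != nil {
		return err
	}
	defer dec.Close()

	tr := tar.NewReader(dec)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return errors.New("file not found in archive: " + name)
		}
		if err != nil {
			return err
		}
		if hdr.Name == name {
			// Real code should cap the copied size to guard against
			// decompression bombs; omitted here for brevity.
			_, err = io.Copy(dst, tr)
			return err
		}
	}
}
```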