Make grype-db download smaller by switching compression methods #367

Open
3 tasks
willmurphyscode opened this issue Aug 21, 2024 · 3 comments
Labels
enhancement New feature or request

@willmurphyscode
Contributor

What would you like to be added:

Grype should download a smaller file during its database update, probably by using zstd compression on the current database schema.

Why is this needed:

The Grype database has grown over the years, to the point where it is now 184 MB as a gzipped tar. This puts load on the CDN and gives many users a poor experience.

Tasks:

@willmurphyscode willmurphyscode added the enhancement New feature or request label Aug 21, 2024
@wagoodman
Contributor

wagoodman commented Sep 17, 2024

We may be able to use xz in a performant way instead if we use https://github.com/xi2/xz. It appears to be an order of magnitude faster than https://github.com/ulikunitz/xz at decompression. This would mean we'd need to shell out to compress within grype-db, which seems like an acceptable tradeoff (I think the ulikunitz package also yields larger archives than the native xz utils).

Another consideration is on the compression side: I'm seeing that Go-only implementations do not achieve the compression ratios of native tooling. That implies we might want to shell out to native tooling when creating archives.
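To make the "shell out to native tooling" idea concrete, here is a minimal sketch of what archive creation could look like. It is written in Python for brevity rather than Go, and the function names `build_tar_command` and `compress_db` are hypothetical, not part of grype-db. It assumes GNU tar, whose `-I` (`--use-compress-program`) option accepts a compressor command with arguments, and it mirrors the `tar` invocations from the benchmark later in this thread.

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def build_tar_command(db_path: str, archive_path: str, compressor: str) -> list[str]:
    """Return argv for GNU tar using an external compressor command.

    `compressor` is a full command line such as "zstd -T0 -19" or "xz -9".
    It is passed as ONE argument to -I, so the compressor's flags are not
    mistaken for tar's own options.
    """
    return ["tar", "--absolute-names", "-I", compressor, "-cf", archive_path, db_path]


def compress_db(db_path: str, archive_path: str, compressor: str = "zstd -T0 -19") -> int:
    """Run the archive command and return the archive size in bytes."""
    tool = shlex.split(compressor)[0]
    # Fail early with a clear message if a native tool is missing.
    for binary in ("tar", tool):
        if shutil.which(binary) is None:
            raise FileNotFoundError(f"required tool not found on PATH: {binary}")
    subprocess.run(build_tar_command(db_path, archive_path, compressor), check=True)
    return Path(archive_path).stat().st_size
```

The tradeoff of this approach is a build-time dependency on the native binaries, which is why the sketch checks for them before running anything.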

@wagoodman wagoodman added this to the DB v6 milestone Sep 17, 2024
@wagoodman wagoodman changed the title Make grype-db download smaller by using zstd compression Make grype-db download smaller by switching compression methods Sep 17, 2024
@wagoodman
Contributor

wagoodman commented Sep 17, 2024

A prototype for grype is here: anchore/grype@main...fast-xz. Decompression is down from 80 seconds with ulikunitz to 16 seconds. Before continuing: is this acceptable? With v6 the DB will be much smaller than the one tested here; assuming the trend is linear, it looks like decompression will take ~10 seconds.

What's missing is removing some of the copied untar code from go-getter and leveraging the stereoscope tar utils (may require some refactoring in stereoscope).
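The linear extrapolation mentioned above (16 seconds measured, ~10 seconds expected) can be sketched as follows. This is an illustration only: it uses Python's standard `lzma`-backed `tarfile` reader rather than the Go xi2/xz package, so absolute timings are not comparable, and the helper names are hypothetical. Under the linear assumption, going from 16 s to 10 s corresponds to an input about 62.5% the size of the one tested.

```python
import io
import tarfile
import time


def time_streaming_extract(archive_bytes: bytes) -> tuple[float, int]:
    """Stream-decompress a .tar.xz held in memory; return (seconds, bytes read)."""
    start = time.perf_counter()
    total = 0
    # Mode "r|xz" reads the archive as a forward-only stream (no seeking).
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r|xz") as tar:
        for member in tar:
            if member.isfile():
                extracted = tar.extractfile(member)
                while chunk := extracted.read(1 << 20):
                    total += len(chunk)
    return time.perf_counter() - start, total


def extrapolate_linear(measured_seconds: float, measured_size: int, new_size: int) -> float:
    """Scale a measured time to a different input size, assuming linear cost."""
    return measured_seconds * new_size / measured_size


# 16 s measured; a hypothetical input 62.5% as large gives 10 s.
print(extrapolate_linear(16.0, 1000, 625))
```

Whether decompression time really scales linearly with size is the assumption to verify once a v6 database is available to measure.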

@popey
Contributor

popey commented Sep 18, 2024

While busy doing other things, I ran a compression benchmark against today's grype vuln database. I don't know if it's valuable data to you, but I am posting here anyway. I ran it on my ThinkPad Z13, so it's 1-2-year-old commodity hardware.

Summary

Algorithm             Time(U+S)(s)  Time(E)(M:s)  ComprRatio  SpaceSave(%)
xz                    309.47        5:06.46       14.15       92.94
gzip                  20.79         0:19.93       7.79        87.18
bzip2                 119.72        1:58.66       11.26       91.13
lzip                  265.31        4:24.80       13.19       92.42
lzma                  312.10        5:09.29       14.13       92.93
lzop                  2.68          0:02.30       4.90        79.64
zstd                  6.22          0:03.89       9.01        88.92
lzip                  261.30        4:21.17       13.17       92.42
7z                    553.75        0:55.57       13.87       92.80
zip                   19.71         0:19.96       7.78        87.17
zstd -T0 -1           8.20          0:01.46       8.18        87.79
zstd -T0 -3 (def.)    13.97         0:01.85       9.01        88.92
zstd -T0 -5           33.25         0:04.06       9.58        89.57
zstd -T0 -10          68.36         0:08.60       10.94       90.87
zstd -T0 -15          236.82        0:32.13       11.23       91.10
zstd -T0 -19          1756.01       3:57.46       13.27       92.47
zstd -T0 --ultra -22  2193.86       13:03.73      17.52       94.30
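One way to read the two time columns together: assuming Time(U+S) is user plus system CPU time and Time(E) is elapsed wall-clock time (the column names suggest this, but the thread does not state it), their ratio gives a rough idea of how many cores a run kept busy. The sketch below uses a few rows copied from the summary; the function names are illustrative.

```python
def elapsed_to_seconds(value: str) -> float:
    """Convert an 'M:SS.ss' elapsed time, as in the Time(E) column, to seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + float(seconds)


def cpu_to_wall_ratio(cpu_seconds: float, elapsed: str) -> float:
    """CPU time (user + sys) divided by wall-clock time."""
    return cpu_seconds / elapsed_to_seconds(elapsed)


# (algorithm, Time(U+S) in seconds, Time(E) as M:s), copied from the summary.
rows = [
    ("xz", 309.47, "5:06.46"),
    ("7z", 553.75, "0:55.57"),
    ("zstd -T0 -19", 1756.01, "3:57.46"),
    ("zstd -T0 --ultra -22", 2193.86, "13:03.73"),
]

for name, cpu, elapsed in rows:
    print(f"{name:22s} {cpu_to_wall_ratio(cpu, elapsed):5.2f}x")
```

A ratio near 1 suggests a mostly single-threaded run, while a ratio well above 1 suggests the work was spread across several cores, which matters when comparing elapsed times between tools.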

Full results

(csv format)

Algorithm,Time(U+S)(s),Time(E)(M:s),ComprRatio,SpaceSave(%),T-Start(UT),T-End(UT),S-Start(b),S-End(b),Command
xz,309.47,5:06.46,14.15,92.94,1726672883,1726673189,1445834752,102163132,tar --absolute-names --xz -cf /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.tar.xz /home/alan/.cache/grype/db/5/vulnerability.db
gzip,20.79,0:19.93,7.79,87.18,1726673189,1726673209,1445834752,185461612,tar --absolute-names --gzip -cf /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.tar.gz /home/alan/.cache/grype/db/5/vulnerability.db
bzip2,119.72,1:58.66,11.26,91.13,1726673209,1726673328,1445834752,128338698,tar --absolute-names --bzip2 -cf /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.tar.bz2 /home/alan/.cache/grype/db/5/vulnerability.db
lzip,265.31,4:24.80,13.19,92.42,1726673328,1726673593,1445834752,109612679,tar --absolute-names --lzip -cf /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.tar.lz /home/alan/.cache/grype/db/5/vulnerability.db
lzma,312.10,5:09.29,14.13,92.93,1726673593,1726673902,1445834752,102269096,tar --absolute-names --lzma -cf /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.tar.lzma /home/alan/.cache/grype/db/5/vulnerability.db
lzop,2.68,0:02.30,4.90,79.64,1726673902,1726673905,1445834752,294485866,tar --absolute-names --lzop -cf /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.tar.lzop /home/alan/.cache/grype/db/5/vulnerability.db
zstd,6.22,0:03.89,9.01,88.92,1726673905,1726673909,1445834752,160330902,tar --absolute-names --zstd -cf /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.tar.zst /home/alan/.cache/grype/db/5/vulnerability.db
lzip,261.30,4:21.17,13.17,92.42,1726673909,1726674170,1445834752,109719253,tar --absolute-names --lzip -cf /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.tar.lz /home/alan/.cache/grype/db/5/vulnerability.db
7z,553.75,0:55.57,13.87,92.80,1726674170,1726674225,1445834752,104210534,7z a -bso0 -bsp0 /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.7z /home/alan/.cache/grype/db/5/vulnerability.db
zip,19.71,0:19.96,7.78,87.17,1726674225,1726674245,1445834752,185629104,zip -q -r /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.zip /home/alan/.cache/grype/db/5/vulnerability.db
zstd -T0 -1,8.20,0:01.46,8.18,87.79,1726674245,1726674247,1445834752,176662743,tar --absolute-names -I zstd -T0 -1 -cf /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.tar.zst /home/alan/.cache/grype/db/5/vulnerability.db
zstd -T0 -3 (def.),13.97,0:01.85,9.01,88.92,1726674247,1726674249,1445834752,160330902,tar --absolute-names -I zstd -T0 -3 -cf /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.tar.zst /home/alan/.cache/grype/db/5/vulnerability.db
zstd -T0 -5,33.25,0:04.06,9.58,89.57,1726674249,1726674253,1445834752,150815387,tar --absolute-names -I zstd -T0 -5 -cf /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.tar.zst /home/alan/.cache/grype/db/5/vulnerability.db
zstd -T0 -10,68.36,0:08.60,10.94,90.87,1726674253,1726674262,1445834752,132079524,tar --absolute-names -I zstd -T0 -10 -cf /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.tar.zst /home/alan/.cache/grype/db/5/vulnerability.db
zstd -T0 -15,236.82,0:32.13,11.23,91.10,1726674262,1726674294,1445834752,128697206,tar --absolute-names -I zstd -T0 -15 -cf /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.tar.zst /home/alan/.cache/grype/db/5/vulnerability.db
zstd -T0 -19,1756.01,3:57.46,13.27,92.47,1726674294,1726674531,1445834752,108899112,tar --absolute-names -I zstd -T0 -19 -cf /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.tar.zst /home/alan/.cache/grype/db/5/vulnerability.db
zstd -T0 --ultra -22,2193.86,13:03.73,17.52,94.30,1726674531,1726675315,1445834752,82520379,tar --absolute-names -I zstd -T0 --ultra -22 -cf /home/alan/Source/Misairuzame/compression-benchmark/tmp/tmparch.tar.zst /home/alan/.cache/grype/db/5/vulnerability.db
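The ratio and space-saving columns can be recomputed from the S-Start(b) and S-End(b) byte counts, which also makes it easy to compare each result against the current gzip archive. The following is a small sketch over a few rows copied from the CSV above (other columns omitted); `summarize` is an illustrative helper, not part of the benchmark tool.

```python
import csv
import io

# A few rows from the full results, keeping only the size columns.
SAMPLE = """\
Algorithm,S-Start(b),S-End(b)
gzip,1445834752,185461612
xz,1445834752,102163132
zstd -T0 -19,1445834752,108899112
zstd -T0 --ultra -22,1445834752,82520379
"""


def summarize(csv_text: str, baseline: str = "gzip") -> dict[str, dict[str, float]]:
    """Compute compression ratio, space saving, and size reduction vs. a baseline."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    sizes = {r["Algorithm"]: (int(r["S-Start(b)"]), int(r["S-End(b)"])) for r in rows}
    base_end = sizes[baseline][1]
    out = {}
    for name, (start, end) in sizes.items():
        out[name] = {
            "ratio": start / end,                        # uncompressed / compressed
            "space_saving_pct": 100 * (1 - end / start),
            "vs_baseline_pct": 100 * (1 - end / base_end),
        }
    return out


for name, stats in summarize(SAMPLE).items():
    print(f"{name:22s} ratio={stats['ratio']:.2f} "
          f"saved={stats['space_saving_pct']:.2f}% "
          f"smaller_than_gzip={stats['vs_baseline_pct']:.1f}%")
```

Recomputed values may differ from the table in the last digit depending on how the benchmark tool rounds. On these numbers the xz archive is roughly 45% smaller than the gzip one.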

Projects
Status: Ready

3 participants