Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Gradle] Added deep-scanning of gradle modules #1380

Merged
merged 3 commits into from
Sep 17, 2024

Conversation

malice00
Copy link
Contributor

For performance-reasons, scanning of Gradle modules was disabled for modules deeper than the first level. This does however mean that your module information might be wrong. Performance obviously depends on the number of modules lower than first level, but using the multi-threaded version of the Gradle scanner makes this negligible.

Currently the optimization is just removed, I could however also add a switch (eg with an EnvVar) to (de-)activate it, although personally I think the scan should be complete and not use any (sometimes weird) defaults.

Also, for testing, I could maybe add fineract to the repotests...

Note: this fix builds on #1379!

@prabhu
Copy link
Contributor

prabhu commented Sep 17, 2024

What is the time penalty for fineract and elasticsearch?

@prabhu
Copy link
Contributor

prabhu commented Sep 17, 2024

@heubeck @setchy how does this change affect the performance of your pipelines?

@setchy
Copy link
Member

setchy commented Sep 17, 2024

@heubeck @setchy how does this change affect the performance of your pipelines?

We don't have many/any Gradle projects so a non issue for me

@prabhu
Copy link
Contributor

prabhu commented Sep 17, 2024

@malice00 can we wrap this behind a feature flag? See safe-pip-install for example. There is a utils method available. Alternatively, we can bump the minor version and include this as a breaking change.

@aryan-rajoria
Copy link
Collaborator

Custom-diff results for this PR:
elasticdeepdiff.html.json
elasticdeepdiff.json

@malice00
Copy link
Contributor Author

@aryan-rajoria I assume build-conventions and build-tools:reaper only exist once in the project? I'll take another look at what's going on, maybe I've missed something there...
@prabhu feature flag or not? You decide...

@prabhu
Copy link
Contributor

prabhu commented Sep 17, 2024

@malice00 build-conventions etc have different group names so probably ok. We can make it the default in 10.10.x, since the trees are looking better. Thank you so much for your help!

@prabhu prabhu merged commit d06aec3 into CycloneDX:master Sep 17, 2024
21 checks passed
@heubeck
Copy link
Contributor

heubeck commented Sep 17, 2024

my testcases work fine with this change, but there are no deep gradle projects included.
in general, I always prefer completeness over performance, so infinite depth is just enough :P

@malice00 malice00 deleted the fix/deep_gradle branch September 17, 2024 20:12
@malice00
Copy link
Contributor Author

Thanks for merging, but I'm not quite happy with this yet... Good to have elasticsearch as a test project, it might help fine-tune this just a little more...

@prabhu
Copy link
Contributor

prabhu commented Sep 18, 2024

@malice00 in non-multithreaded mode, there are too many duplicate components for the elasticsearch repo. Shall we work towards making the multi-threaded mode the default?

@malice00
Copy link
Contributor Author

@prabhu The current solution has too many duplicates in either mode, but I think I found the issue. Still running tests though... Multi-threaded as default would be awesome, the performance gains are unbelievable!
I might have something finished by tomorrow, I'll open another PR when it's up!

@malice00
Copy link
Contributor Author

So after some changes and testing, I had a solution which imo was correct, but there still was a difference in the number of modules in elasticsearch. After some manual comparisons I had a hunch what might be going on, so I added some extra output in cdxgen and my suspicions where indeed correct: several modules in elasticsearch have the same group, name and version and therefore resolve to the same purl:

pkg:maven/org.elasticsearch/[email protected]?type=jar: [
  "qa",
  "modules:ingest-geoip:qa"
]
pkg:maven/org.elasticsearch.plugin/[email protected]?type=jar: [
  "x-pack:qa",
  "x-pack:plugin:async-search:qa",
  "x-pack:plugin:autoscaling:qa",
  "x-pack:plugin:ccr:qa",
  "x-pack:plugin:deprecation:qa",
  "x-pack:plugin:downsample:qa",
  "x-pack:plugin:enrich:qa",
  "x-pack:plugin:ent-search:qa",
  "x-pack:plugin:eql:qa",
  "x-pack:plugin:esql:qa",
  "x-pack:plugin:fleet:qa",
  "x-pack:plugin:graph:qa",
  "x-pack:plugin:identity-provider:qa",
  "x-pack:plugin:ilm:qa",
  "x-pack:plugin:inference:qa",
  "x-pack:plugin:ml:qa",
  "x-pack:plugin:repositories-metering-api:qa",
  "x-pack:plugin:searchable-snapshots:qa",
  "x-pack:plugin:security:qa",
  "x-pack:plugin:shutdown:qa",
  "x-pack:plugin:slm:qa",
  "x-pack:plugin:snapshot-based-recoveries:qa",
  "x-pack:plugin:snapshot-repo-test-kit:qa",
  "x-pack:plugin:sql:qa",
  "x-pack:plugin:stack:qa",
  "x-pack:plugin:text-structure:qa",
  "x-pack:plugin:transform:qa",
  "x-pack:plugin:vector-tile:qa",
  "x-pack:plugin:watcher:qa"
]
pkg:maven/org.elasticsearch.plugin/[email protected]?type=jar: [
  "x-pack:plugin:esql-core:test-fixtures",
  "x-pack:plugin:ql:test-fixtures"
]

In the previous code this was not an issue, since the module-name was used in the purl (which actually would imo generate an invalid purl, since it had a ':' in the name) and there would be no duplicates.

Now, in the above list of modules, I think on all 'qa'-modules, it doesn't make much of a difference (except maybe on the correctness of the tree), since they (or at least those I manually checked) don't actually have any dependencies. The modules 'test-fixtures' however, do have their own dependencies and those would now just get bunched up together as if it was a single module. Might make finding a (vulnerable) dependency a bit harder...

Still working on some fine-tuning, but I would like some input on how to best handle this (output a warning/add a switch to disable deep-scanning/add a switch to use the module name for duplicates/...) before committing and actually 'destroying' newer SBOMs for some projects...
@prabhu @aryan-rajoria @heubeck -- would love to hear your thoughts!

@prabhu
Copy link
Contributor

prabhu commented Sep 19, 2024

@malice00 This is good observation. Any findings related to multiple third party dependencies appearing in the sbom? For example, multiple commons-compress in non-multithreaded mode

@malice00
Copy link
Contributor Author

@prabhu I still have those, but they are in both single- and multithreaded mode, so I guess this is actually what Gradle is reporting. There's still differences in both versions, but I already explained my findings on that -- although I'm not going to check that manually again in elastic, it's just too big for that!
Maybe one of the elastic devs can shine a light on the correctness of some of those dependencies?

Here's 2 SBOMs I just generated. Singlethreaded took around 150 minutes, multithreaded only 60 seconds! It can be even faster, but that's a change that isn't stable on my side yet. Anyway, bom.json is singlethreaded, bom2.json is multithreaded and I included the output of cjd as well.

proof.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants