Bug in the computeQ function - v2 classifier #41

@bharat-biradar

Description

Describe the issue
In computeQ, when the threshold is set to 1.0 the granularity is calculated as 10, but when the threshold is set to 0.95, 0.99, or 0.999 the granularity is calculated as 19, 99, and 999, respectively. The granularity grows rapidly as the threshold approaches 1.0, and it ends up greater than the granularity of 10 that is used at the maximum threshold (1.0).

Is this intentional?
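
For reference, here is a minimal sketch (not the actual source of computeQ) that reproduces the reported values, assuming the granularity is derived as 1/(1 - threshold) with a special case of 10 at threshold 1.0; the function name and driver loop are illustrative only.

```go
package main

import "fmt"

// computeQSketch models the suspected current behaviour: the granularity
// is taken as int(1 / (1 - threshold)), with a special case returning 10
// when the threshold is exactly 1.0.
func computeQSketch(threshold float64) int {
	if threshold == 1.0 {
		return 10
	}
	return int(1.0 / (1.0 - threshold))
}

func main() {
	for _, t := range []float64{0.9, 0.95, 0.99, 0.999, 1.0} {
		fmt.Printf("threshold %v -> granularity %d\n", t, computeQSketch(t))
	}
	// Expected output: granularities 10, 19, 99, 999, 10
	// (exact values depend on floating-point truncation),
	// matching the behaviour described above.
}
```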

A problem caused by this is that when the threshold is set to 0.95 or greater, many licenses are not detected that are easily detected when the threshold is set to 0.9.

I ran the program on around 17,300 license files. Around 2,950 BSD-3-Clause files, 850 BSD-2-Clause files, and some other licenses were not detected at all, even though they are detected at a granularity of 10, because at those thresholds the granularity is greater than 20 and nearly reaches 100.

A possible solution would be to set the granularity to 10 for any threshold greater than 0.9; this would also handle the divide-by-zero case, as sketched below.
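
A rough sketch of that change, again with illustrative names rather than the project's actual code:

```go
package main

import "fmt"

// computeQCapped sketches the proposed fix: any threshold above 0.9 uses
// the same granularity as the maximum threshold (10), which also avoids
// the division by zero that 1/(1-threshold) would hit at threshold == 1.0.
func computeQCapped(threshold float64) int {
	const maxQ = 10
	if threshold > 0.9 {
		return maxQ
	}
	return int(1.0 / (1.0 - threshold))
}

func main() {
	fmt.Println(computeQCapped(0.95), computeQCapped(0.999), computeQCapped(1.0)) // 10 10 10
}
```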
