add script to get corpus stats #9

Merged
merged 16 commits into from
Aug 29, 2024

@khieta khieta commented Aug 28, 2024

This PR adds a small Python script to get stats about the corpus. The more complete solution would be to make a Rust crate that does proper parsing/analysis of policies/schemas, but I think this script will do for now.

Analysis of the current corpus in main

| Statistic | Value |
| --- | --- |
| Total test cases | 1982 |
| Unique policies | 1048 (52.9%) |
| Unique schemas | 631 (31.8%) |
| Unique entity stores | 918 (46.3%) |
| Entities per entity store | mean: 1 / median: 1 / p90: 2 |
| Schema entity types | mean: 1 / median: 1 / p90: 1 |
| Schema actions | mean: 1 / median: 1 / p90: 1 |
| Trivial policies | 834 (42.1%) |
| Pure RBAC (non-trivial) policies | 436 (22.0%) |
| Pure ABAC (non-trivial) policies | 60 (3.0%) |
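
For readers who want to see how rows like these can be derived, here is a minimal sketch. It is not the code in `scripts/get_corpus_stats.py`; the helper names `unique_summary` and `distribution_summary` are hypothetical, and the nearest-rank definition of p90 is an assumption. The percentages in the table are consistent with being taken over the total number of test cases (for example, 1048 / 1982 ≈ 52.9%), so the sketch does the same.

```python
import math
import statistics


def unique_summary(items, total):
    """Return (unique_count, percent_of_total) for a list of hashable items.

    `total` is the number of test cases, which is the denominator the
    percentages in the table appear to use.
    """
    unique = len(set(items))
    pct = 100.0 * unique / total if total else 0.0
    return unique, round(pct, 1)


def distribution_summary(values):
    """Return mean, median, and 90th percentile of a non-empty list of numbers.

    The percentile uses the nearest-rank definition; this is an assumption,
    and the real script may interpolate or round differently.
    """
    ordered = sorted(values)
    rank = math.ceil(0.9 * len(ordered))  # 1-based nearest rank
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p90": ordered[rank - 1],
    }


# Four test cases that share three distinct policy texts.
print(unique_summary(["p1", "p2", "p1", "p3"], total=4))  # (3, 75.0)

# Entity counts for ten hypothetical entity stores.
print(distribution_summary([1, 1, 1, 1, 1, 1, 1, 1, 2, 5]))
```

Deduplicating by exact text is the simplest choice here; a parser-based comparison, as the PR description suggests for a future Rust crate, would also treat differently formatted but equivalent policies as the same.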

Comparison of the new vs. old corpus from #7

| Statistic | Original | New | Difference |
| --- | --- | --- | --- |
| Total test cases | 6065 | 6969 | 904 |
| Unique policies | 3486 (57.5%) | 3918 (56.2%) | 432 |
| Unique schemas | 2298 (37.9%) | 2818 (40.4%) | 520 |
| Unique entity stores | 2826 (46.6%) | 3207 (46.0%) | 381 |
| Entities per entity store | mean: 1 / median: 1 / p90: 4 | mean: 1 / median: 1 / p90: 7 | n/a |
| Schema entity types | mean: 1 / median: 1 / p90: 2 | mean: 1 / median: 1 / p90: 4 | n/a |
| Schema actions | mean: 1 / median: 1 / p90: 2 | mean: 1 / median: 1 / p90: 2 | n/a |
| Trivial policies | 2191 (36.1%) | 2690 (38.6%) | 499 |
| Pure RBAC (non-trivial) policies | 1148 (18.9%) | 1299 (18.6%) | 151 |
| Pure ABAC (non-trivial) policies | 156 (2.6%) | 119 (1.7%) | -37 |

khieta commented Aug 28, 2024

Ready for review! The scripts look like they're working as expected -- you can find the output in the Action details. I'll update the main branch corpus and submodules in a future PR.

@khieta khieta marked this pull request as ready for review August 28, 2024 20:00
Review comment on scripts/get_corpus_stats.py (outdated, resolved)
Co-authored-by: Craig Disselkoen <[email protected]>
@khieta khieta merged commit 1340414 into main Aug 29, 2024
5 checks passed
@khieta khieta deleted the khieta/update-ci branch August 29, 2024 19:29

3 participants