@ppkarwasz
Contributor

Since in practice most Package URLs contain no percent-encoded characters, this PR optimizes the `percentEncode`/`percentDecode` methods by:

  • Always checking first whether encoding or decoding is needed. If it is not, the methods return the argument unmodified.
  • Speeding up `isUnreserved`, since profiling shows that it is the method that matters most for the performance of `percentEncode`.
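The two points above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the class name `PercentEncodeSketch`, the lookup-table approach, and the use of the RFC 3986 unreserved set (letters, digits, `-`, `.`, `_`, `~`) are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names and the unreserved set are assumptions, not the PR's code.
final class PercentEncodeSketch {
    // Lookup table indexed by ASCII code: true for characters that never need encoding.
    private static final boolean[] UNRESERVED = new boolean[128];

    static {
        for (char c = 'a'; c <= 'z'; c++) UNRESERVED[c] = true;
        for (char c = 'A'; c <= 'Z'; c++) UNRESERVED[c] = true;
        for (char c = '0'; c <= '9'; c++) UNRESERVED[c] = true;
        for (char c : new char[] {'-', '.', '_', '~'}) UNRESERVED[c] = true;
    }

    // A single bounds check plus an array read, instead of a chain of range comparisons.
    static boolean isUnreserved(int c) {
        return c >= 0 && c < 128 && UNRESERVED[c];
    }

    static String percentEncode(String source) {
        // Fast path: scan for the first character that needs encoding.
        int length = source.length();
        int i = 0;
        while (i < length && isUnreserved(source.charAt(i))) {
            i++;
        }
        if (i == length) {
            return source; // nothing to encode: return the argument unmodified
        }

        // Slow path: encode the UTF-8 bytes one by one.
        byte[] bytes = source.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(bytes.length + 16);
        for (byte b : bytes) {
            int c = b & 0xFF;
            if (isUnreserved(c)) {
                sb.append((char) c);
            } else {
                sb.append('%');
                sb.append(Character.toUpperCase(Character.forDigit(c >> 4, 16)));
                sb.append(Character.toUpperCase(Character.forDigit(c & 0xF, 16)));
            }
        }
        return sb.toString();
    }
}
```

In this sketch, a string made up only of unreserved characters costs one scan and no allocation, which is the case the PR description says dominates in practice.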

This PR also fixes a bug in the benchmark initialization and adds a `toLowerCase` benchmark.
The benchmark data **must** be initialized in a `@Setup` method; otherwise `nonAsciiProb` will always be `0.0`.
Because strings that require no percent encoding at all are the rule in practice, the encoding/decoding code should be optimized for that case.
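The same "check first, then do the work" idea applies to decoding. The following is a minimal sketch under stated assumptions, not the merged implementation: the class name `PercentDecodeSketch` and the error handling for malformed escapes are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the code merged in this PR.
final class PercentDecodeSketch {
    static String percentDecode(String source) {
        // Fast path: without a '%' there is nothing to decode.
        if (source.indexOf('%') < 0) {
            return source; // return the argument unmodified
        }

        // Slow path: copy bytes, replacing each %XX escape with the byte it encodes.
        byte[] bytes = source.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream(bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '%') {
                if (i + 2 >= bytes.length) {
                    throw new IllegalArgumentException("Incomplete percent escape");
                }
                int hi = Character.digit((char) (bytes[i + 1] & 0xFF), 16);
                int lo = Character.digit((char) (bytes[i + 2] & 0xFF), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Invalid percent escape");
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(bytes[i]);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With this shape, the common case is a single `indexOf` call and no allocation, while inputs that do contain escapes still go through the full byte-wise decoding.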
@ppkarwasz changed the title from "Improve encoding/decoding performance for ASCII strings" to "feat: Improve encoding/decoding performance for ASCII strings" on Mar 23, 2025
@ppkarwasz
Contributor Author

Benchmark before

| Benchmark | (nonAsciiProb) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|
| StringUtilBenchmark.baseline | 0 | avgt | 5 | 35.683 | ± 0.134 | us/op |
| StringUtilBenchmark.baseline | 0.1 | avgt | 5 | 658.929 | ± 3.496 | us/op |
| StringUtilBenchmark.baseline | 0.5 | avgt | 5 | 1965.363 | ± 47.918 | us/op |
| StringUtilBenchmark.percentDecode | 0 | avgt | 5 | 142.311 | ± 37.302 | us/op |
| StringUtilBenchmark.percentDecode | 0.1 | avgt | 5 | 930.824 | ± 23.453 | us/op |
| StringUtilBenchmark.percentDecode | 0.5 | avgt | 5 | 2614.854 | ± 147.601 | us/op |
| StringUtilBenchmark.percentEncode | 0 | avgt | 5 | 870.956 | ± 29.092 | us/op |
| StringUtilBenchmark.percentEncode | 0.1 | avgt | 5 | 1844.124 | ± 62.144 | us/op |
| StringUtilBenchmark.percentEncode | 0.5 | avgt | 5 | 3684.842 | ± 61.788 | us/op |
| StringUtilBenchmark.toLowerCase | 0 | avgt | 5 | 103.010 | ± 3.429 | us/op |
| StringUtilBenchmark.toLowerCase | 0.1 | avgt | 5 | 119.274 | ± 1.037 | us/op |
| StringUtilBenchmark.toLowerCase | 0.5 | avgt | 5 | 111.013 | ± 3.908 | us/op |
| StringUtilBenchmark.toLowerCaseJre | 0 | avgt | 5 | 729.618 | ± 9.626 | us/op |
| StringUtilBenchmark.toLowerCaseJre | 0.1 | avgt | 5 | 981.569 | ± 44.689 | us/op |
| StringUtilBenchmark.toLowerCaseJre | 0.5 | avgt | 5 | 1472.047 | ± 32.888 | us/op |

Benchmark after

| Benchmark | (nonAsciiProb) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|
| StringUtilBenchmark.baseline | 0 | avgt | 5 | 35.997 | ± 0.989 | us/op |
| StringUtilBenchmark.baseline | 0.1 | avgt | 5 | 649.834 | ± 13.729 | us/op |
| StringUtilBenchmark.baseline | 0.5 | avgt | 5 | 2112.914 | ± 37.607 | us/op |
| StringUtilBenchmark.percentDecode | 0 | avgt | 5 | 4.110 | ± 0.091 | us/op |
| StringUtilBenchmark.percentDecode | 0.1 | avgt | 5 | 875.127 | ± 30.614 | us/op |
| StringUtilBenchmark.percentDecode | 0.5 | avgt | 5 | 2255.278 | ± 32.528 | us/op |
| StringUtilBenchmark.percentEncode | 0 | avgt | 5 | 80.203 | ± 1.481 | us/op |
| StringUtilBenchmark.percentEncode | 0.1 | avgt | 5 | 939.128 | ± 45.859 | us/op |
| StringUtilBenchmark.percentEncode | 0.5 | avgt | 5 | 3351.013 | ± 109.419 | us/op |
| StringUtilBenchmark.toLowerCase | 0 | avgt | 5 | 104.420 | ± 3.729 | us/op |
| StringUtilBenchmark.toLowerCase | 0.1 | avgt | 5 | 119.050 | ± 1.828 | us/op |
| StringUtilBenchmark.toLowerCase | 0.5 | avgt | 5 | 110.732 | ± 0.130 | us/op |
| StringUtilBenchmark.toLowerCaseJre | 0 | avgt | 5 | 717.085 | ± 45.253 | us/op |
| StringUtilBenchmark.toLowerCaseJre | 0.1 | avgt | 5 | 963.733 | ± 9.452 | us/op |
| StringUtilBenchmark.toLowerCaseJre | 0.5 | avgt | 5 | 1436.714 | ± 42.871 | us/op |

Collaborator

@jeremylong left a comment

LGTM

@jeremylong merged commit ae666d9 into package-url:scratch/refactor-stringutils on Mar 23, 2025
3 checks passed
@ppkarwasz deleted the feat/percent-performance branch on March 23, 2025 21:36