Port alternation switch optimization from source generator to RegexCompiler #122959
…mpiler
Add switch-based optimization for alternations in RegexCompiler that matches the source generator's behavior. The optimization applies when:
1. The alternation is atomic or no branch can backtrack
2. Not right-to-left matching
3. Every branch begins with unique character(s)
4. Roslyn heuristic is satisfied: count >= 7 AND density >= 0.5
This uses an IL switch instruction for efficient branch selection based on the first character of each alternation branch.
Co-authored-by: stephentoub <[email protected]>
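For readers following the eligibility conditions above, here is a rough Python model of conditions 1 to 3. The Roslyn heuristic (condition 4) is deliberately left out. Branch, switch_candidate_chars and their fields are hypothetical stand-ins, not RegexCompiler's API. Treating a branch with no known starting character as ineligible is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set


@dataclass(frozen=True)
class Branch:
    # Hypothetical stand-in for one alternation branch; not the real RegexNode API.
    first_chars: frozenset  # characters this branch can begin with
    can_backtrack: bool     # whether the branch may backtrack internally


def switch_candidate_chars(
    branches: Sequence[Branch], is_atomic: bool, right_to_left: bool
) -> Optional[List[Set[str]]]:
    """Return each branch's starting characters if conditions 1-3 hold, else None."""
    # Condition 2: right-to-left matching never uses the switch.
    if right_to_left:
        return None
    # Condition 1: the alternation is atomic, or no branch can backtrack.
    if not is_atomic and any(b.can_backtrack for b in branches):
        return None
    # Condition 3: no character may start more than one branch.
    seen: Set[str] = set()
    per_branch: List[Set[str]] = []
    for b in branches:
        # Assumption: a branch without a known first character is ineligible.
        if not b.first_chars or seen & b.first_chars:
            return None
        seen |= b.first_chars
        per_branch.append(set(b.first_chars))
    return per_branch
```

A branch that begins with a set such as [bB] contributes several characters, which is why each branch maps to a set rather than a single character.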
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
Pull request overview
This PR ports the switch-based alternation optimization from RegexGenerator.Emitter.cs to RegexCompiler.cs. The source generator has been emitting C# switch statements for alternations where every branch begins with unique characters, relying on Roslyn to lower it to an IL switch when beneficial. This change adds the same optimization directly to the compiler, using Roslyn's heuristic: emit an IL switch if count >= 7 and density >= 0.5 (where density = count / range).
Key changes:
- Adds TryEmitAlternationAsSwitch method to check eligibility and apply the Roslyn heuristic
- Adds EmitSwitchedBranches method to emit the IL switch instruction with proper bounds checking
- Handles Multi/Set/Concatenate nodes by correctly slicing off the first matched character
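To illustrate what "slicing off the first matched character" means for each node kind, here is a simplified Python sketch. The node classes and slice_first_char are hypothetical. Turning a two-character Multi into a One after slicing is a choice made here, not something the PR states.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical, simplified node types; the real RegexNode is richer.
@dataclass(frozen=True)
class Empty:
    pass


@dataclass(frozen=True)
class One:
    ch: str


@dataclass(frozen=True)
class Multi:
    text: str  # a literal run of at least two characters


@dataclass(frozen=True)
class SetNode:
    chars: frozenset


@dataclass(frozen=True)
class Concat:
    children: Tuple["Node", ...]


Node = Union[Empty, One, Multi, SetNode, Concat]


def slice_first_char(node: Node) -> Node:
    """Return what remains of a branch once the switch has matched its first char."""
    if isinstance(node, (One, SetNode)):
        # A single-character node is fully consumed by the switch.
        return Empty()
    if isinstance(node, Multi):
        rest = node.text[1:]
        return One(rest) if len(rest) == 1 else Multi(rest)
    if isinstance(node, Concat):
        # Slice the first child, then drop it if nothing of it remains.
        head = slice_first_char(node.children[0])
        tail = node.children[1:]
        children = tail if isinstance(head, Empty) else (head,) + tail
        if not children:
            return Empty()
        return children[0] if len(children) == 1 else Concat(children)
    raise ValueError(f"branch cannot be switched on: {node!r}")
```

Because the switch has already consumed the first character, the remaining node is what gets emitted for each case.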
- Refactor TryEmitAlternationAsSwitch to return false instead of using canUseSwitchedBranches local
- Port TryEmitAlternationAsSwitch refactoring to source generator for consistency
- Simplify switch table building with ternary operator
- Change threshold from >= 7 to >= 3 to match actual Roslyn behavior
- Remove explicit bounds check, let switch fall through to originalDoneLabel
- Remove noMatchLabel, jump directly to originalDoneLabel
- Move doneLabel = originalDoneLabel to after emitting the child (matching source generator)
- Add test cases for alternation switch optimization
Co-authored-by: stephentoub <[email protected]>
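The fall-through these commits rely on comes from the IL switch instruction itself. It reads its operand as an unsigned integer and continues with the next instruction when the value is outside the jump table. The Python sketch below emulates that behavior. build_switch_table, dispatch and NO_MATCH are hypothetical names. Routing gaps in the table to the done label is an assumption based on the noMatchLabel removal.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

NO_MATCH = -1  # hypothetical marker for "jump to the done label"


def build_switch_table(branch_chars: Sequence[Set[str]]) -> Tuple[int, List[int]]:
    """Build a dense jump table indexed by (char - lowest char)."""
    targets: Dict[int, int] = {
        ord(c): i for i, chars in enumerate(branch_chars) for c in chars
    }
    lo, hi = min(targets), max(targets)
    # One slot per value in [lo, hi]; the conditional picks a branch or NO_MATCH.
    table = [targets[v] if v in targets else NO_MATCH for v in range(lo, hi + 1)]
    return lo, table


def dispatch(ch: str, lo: int, table: List[int]) -> Optional[int]:
    """Emulate IL 'switch': return the branch index, or None for the done label."""
    # The operand is treated as unsigned 32-bit, so ch < lo wraps to a huge index
    # and is rejected by the same range test as ch > hi.
    index = (ord(ch) - lo) & 0xFFFFFFFF
    if index >= len(table):
        return None  # IL 'switch' falls through to the next instruction
    target = table[index]
    return None if target == NO_MATCH else target
```

The single unsigned comparison covers inputs both below and above the case range. This is why no separate bounds check needs to be emitted before the switch.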
Description
Ports the switch-based alternation optimization from RegexGenerator.Emitter.cs to RegexCompiler.cs. The source generator emits a C# switch statement for alternations where every branch begins with unique characters, relying on Roslyn to lower it to an IL switch when beneficial. This change adds the same optimization directly to the compiler using the Roslyn heuristic: count >= 3 AND density >= 0.5 (where density = count / range).
Implementation:
- TryEmitAlternationAsSwitch: Checks eligibility (atomic or no backtracking branches, unique starting chars, RightToLeft disabled) and applies the Roslyn heuristic
- EmitSwitchedBranches: Emits the IL switch instruction and handles Multi/Set/Concatenate nodes by slicing off the first matched character
The optimization provides O(1) branch selection instead of sequential checking when the heuristic is satisfied.
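As a worked illustration of the heuristic, this Python sketch applies count >= 3 and density >= 0.5 to the distinct characters that label the switch cases. should_emit_il_switch is hypothetical. Counting distinct case characters and measuring range inclusively (highest minus lowest plus one) are assumptions, not details confirmed by the PR. min_count is a parameter because an earlier commit in this PR used 7.

```python
from typing import Iterable


def should_emit_il_switch(
    case_chars: Iterable[str], min_count: int = 3, min_density: float = 0.5
) -> bool:
    """Apply a count/density test to the distinct characters that label switch cases."""
    codes = {ord(c) for c in case_chars}
    count = len(codes)
    if count == 0 or count < min_count:
        return False
    # Assumed definition: range covers every value from lowest to highest, inclusive.
    value_range = max(codes) - min(codes) + 1
    return count / value_range >= min_density
```

For example, first characters "abcdefgh" give density 8 / 8 = 1.0 and qualify. "abd" gives 3 / 4 = 0.75 and qualifies. "amz" gives 3 / 26, about 0.12, and falls back to sequential checks.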
Synchronization with Source Generator:
- Ported the TryEmitAlternationAsSwitch refactoring back to the source generator to keep both implementations synchronized
Customer Impact
Performance improvement for compiled regexes with alternations meeting the criteria. No functional change.
Regression
No, this is a new optimization bringing parity with the source generator.
Testing
All 30,496 functional tests pass. Added new test cases for alternation switch optimization covering 8-branch atomic alternations with unique starting characters, testing match, no-match, and partial input scenarios.
Risk
Low. The optimization only triggers under strict conditions matching the source generator's behavior, and falls back to the existing code path otherwise.