Copilot AI commented Jan 7, 2026

Description

Ports the switch-based alternation optimization from RegexGenerator.Emitter.cs to RegexCompiler.cs. The source generator emits a C# switch statement for alternations where every branch begins with unique characters, relying on Roslyn to lower it to an IL switch when beneficial. This change adds the same optimization directly to the compiler using the Roslyn heuristic:

  • count >= 3 AND density >= 0.5 (where density = count / range)
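As a rough illustration of the heuristic above, the check could be written along these lines. This is a sketch, not the code from the PR: the `SwitchHeuristic` class and `ShouldUseSwitch` method are hypothetical names, and computing the range as `max - min + 1` is an assumption, since the description only says density = count / range.

```csharp
using System;
using System.Linq;

public static class SwitchHeuristic
{
    // Hypothetical helper: decides whether a set of distinct first characters
    // is dense enough to justify a jump table.
    // Assumption: range is the inclusive span of character values (max - min + 1).
    public static bool ShouldUseSwitch(char[] firstChars)
    {
        int count = firstChars.Length;
        if (count < 3)
        {
            return false; // too few branches for a switch to pay off
        }

        int range = firstChars.Max() - firstChars.Min() + 1;
        double density = (double)count / range;
        return density >= 0.5;
    }
}
```

Under these assumptions, branches starting with 'a', 'b', 'c' qualify (density 1.0), while branches starting with 'a', 'm', 'z' do not (3 values spread over a range of 26).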

Implementation:

  • TryEmitAlternationAsSwitch: Checks eligibility (the alternation is atomic or no branch can backtrack, every branch has unique starting chars, RightToLeft is disabled) and applies the Roslyn heuristic
  • EmitSwitchedBranches: Emits an IL switch instruction and handles Multi/Set/Concatenate nodes by slicing off the first matched character

The optimization provides O(1) branch selection instead of sequential checking when the heuristic is satisfied.
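To make the idea of switch-based branch selection concrete, here is a hand-written C# analogue for the pattern (?>apple|banana|cherry). It is only a sketch of the technique: the PR emits IL directly, and the `AlternationDispatch` class, the `MatchSwitched` method, and the example pattern are all hypothetical.

```csharp
using System;

public static class AlternationDispatch
{
    // Hand-written analogue of matching (?>apple|banana|cherry) at the start of input.
    // Returns the number of characters matched, or -1 when no branch matches.
    public static int MatchSwitched(ReadOnlySpan<char> input)
    {
        if (input.IsEmpty)
        {
            return -1;
        }

        // The first character selects at most one branch. The remainder of that
        // branch is compared against the input with the first character sliced off.
        ReadOnlySpan<char> rest = input.Slice(1);
        switch (input[0])
        {
            case 'a': return rest.StartsWith("pple", StringComparison.Ordinal) ? 5 : -1;
            case 'b': return rest.StartsWith("anana", StringComparison.Ordinal) ? 6 : -1;
            case 'c': return rest.StartsWith("herry", StringComparison.Ordinal) ? 6 : -1;
            default: return -1;
        }
    }
}
```

Because each branch starts with a different character, a failed comparison after the switch means no other branch could have matched either, so there is no need to try the remaining branches in sequence.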

Synchronization with Source Generator:

  • Ported the TryEmitAlternationAsSwitch refactoring back to the source generator to keep both implementations synchronized
  • Both implementations now use the same structure with early returns instead of local boolean flags

Customer Impact

Performance improvement for compiled regexes with alternations meeting the criteria. No functional change.

Regression

No, this is a new optimization bringing parity with the source generator.

Testing

All 30,496 functional tests pass. Added new test cases for alternation switch optimization covering 8-branch atomic alternations with unique starting characters, testing match, no-match, and partial input scenarios.
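The added test cases are not shown in this conversation. A check in the same spirit might compare the interpreter and RegexOptions.Compiled on an 8-branch atomic alternation, as sketched below. The pattern, the inputs, and the `AlternationSwitchChecks` helper are assumptions made for illustration, not the PR's actual test data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationSwitchChecks
{
    // Hypothetical pattern: 8 atomic branches whose first characters (a..h)
    // are distinct and contiguous, so count = 8 and density = 1.0.
    public const string Pattern = @"(?>apple|banana|cherry|date|elder|fig|grape|honey)";

    // Returns true when both the interpreter and the compiled engine
    // agree with the expected result for the given input.
    public static bool BothEnginesAgree(string input, bool expected)
    {
        bool interpreted = Regex.IsMatch(input, Pattern, RegexOptions.None);
        bool compiled = Regex.IsMatch(input, Pattern, RegexOptions.Compiled);
        return interpreted == expected && compiled == expected;
    }
}
```

For example, `BothEnginesAgree("I ate a banana", true)` exercises a match, `BothEnginesAgree("zebra", false)` a no-match, and `BothEnginesAgree("ban", false)` a partial input.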

Risk

Low. The optimization triggers only under strict conditions that match the source generator's behavior, and falls back to the existing code path otherwise.

Package authoring no longer needed in .NET 9

IMPORTANT: Starting with .NET 9, you no longer need to edit a NuGet package's csproj to enable building and bump the version.
Keep in mind that we still need package authoring in .NET 8 and older versions.

Original prompt

The regex source generator and the regex compiler are mostly in sync, but there are a few places where they've diverged. One in particular is

// Note: This optimization does not exist with RegexOptions.Compiled. Here we rely on the
// C# compiler to lower the C# switch statement with appropriate optimizations. In some
// cases there are enough branches that the compiler will emit a jump table. In others
// it'll optimize the order of checks in order to minimize the total number in the worst
// case. In any case, we get easier to read and reason about C#.
EmitSwitchedBranches();
where the source generator has a special optimization for alternations where every branch provably begins with a different character, in which case it can emit a C# switch statement. That doesn't exist in the RegexCompiler because the optimization relies on the C# compiler's lowering of a C# switch to an IL switch, but only when it's a perf win. Please port this optimization to the regex compiler, using the same heuristic the C# compiler (Roslyn) uses: if count_of_values / (max_value - min_value) >= .5 and count_of_values >= 7, it'll emit a switch. In this case, use the same heuristic to determine whether to do the optimization at all. No additional tests are needed, but all existing functional tests must pass.


Copilot AI and others added 2 commits January 7, 2026 03:59
…mpiler

Add switch-based optimization for alternations in RegexCompiler that matches
the source generator's behavior. The optimization applies when:
1. The alternation is atomic or no branch can backtrack
2. Not right-to-left matching
3. Every branch begins with unique character(s)
4. Roslyn heuristic is satisfied: count >= 7 AND density >= 0.5

This uses an IL switch instruction for efficient branch selection based on
the first character of each alternation branch.

Co-authored-by: stephentoub <[email protected]>
Copilot AI changed the title from "[WIP] Port optimization for regex alternations to regex compiler" to "Port alternation switch optimization from source generator to RegexCompiler" Jan 7, 2026
Copilot AI requested a review from stephentoub January 7, 2026 04:09
@dotnet-policy-service

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

@stephentoub stephentoub marked this pull request as ready for review January 8, 2026 05:05
Copilot AI review requested due to automatic review settings January 8, 2026 05:05
Copilot AI left a comment

Pull request overview

This PR ports the switch-based alternation optimization from RegexGenerator.Emitter.cs to RegexCompiler.cs. The source generator has been emitting C# switch statements for alternations where every branch begins with unique characters, relying on Roslyn to lower it to an IL switch when beneficial. This change adds the same optimization directly to the compiler, using Roslyn's heuristic: emit an IL switch if count >= 7 and density >= 0.5 (where density = count / range).

Key changes:

  • Adds TryEmitAlternationAsSwitch method to check eligibility and apply the Roslyn heuristic
  • Adds EmitSwitchedBranches method to emit the IL switch instruction with proper bounds checking
  • Handles Multi/Set/Concatenate nodes by correctly slicing off the first matched character

Copilot AI and others added 2 commits January 8, 2026 05:44
- Refactor TryEmitAlternationAsSwitch to return false instead of using canUseSwitchedBranches local
- Port TryEmitAlternationAsSwitch refactoring to source generator for consistency
- Simplify switch table building with ternary operator
- Change threshold from >= 7 to >= 3 to match actual Roslyn behavior
- Remove explicit bounds check, let switch fall through to originalDoneLabel
- Remove noMatchLabel, jump directly to originalDoneLabel
- Move doneLabel = originalDoneLabel to after emitting the child (matching source generator)
- Add test cases for alternation switch optimization

Co-authored-by: stephentoub <[email protected]>