codeflash-ai bot commented Oct 21, 2025

📄 48% (1.48x) speedup for sanitize_project_name in framework/py/flwr/cli/utils.py

⏱️ Runtime: 1.58 milliseconds → 1.07 milliseconds (best of 350 runs)
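The headline percentage is a throughput ratio computed from the two best-of-350 runtimes. Expressed as time saved per call, the same measurement corresponds to roughly a third less time:

```math
\frac{t_{\text{orig}}}{t_{\text{opt}}} - 1 = \frac{1.58\,\text{ms}}{1.07\,\text{ms}} - 1 \approx 0.477,
\qquad
1 - \frac{t_{\text{opt}}}{t_{\text{orig}}} = 1 - \frac{1.07\,\text{ms}}{1.58\,\text{ms}} \approx 0.323
```

So the optimized function runs about 1.48× as fast, and each call takes roughly 32% less time.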

📝 Explanation and details

The optimized code achieves a roughly 48% speedup (1.58 ms → 1.07 ms) through two key optimizations:

**1. Replaced Set-Based Character Filtering with Regex**
The original code filtered characters with a generator expression and set membership checks (`"".join(c for c in sanitized_name if c in allowed_chars)`), which accounted for 61.6% of the function's execution time. The optimized version replaces this with `re.sub(r"[^a-z0-9-]", "", sanitized_name)`. The generator pays for one Python-level iteration and one set lookup per character, whereas the regex engine (implemented in C in CPython) scans the whole string and builds the result in a single call, which removes the per-character interpreter overhead.
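A minimal sketch of the two filtering strategies, assuming (as the regex quoted above implies) that the allowed set is lowercase ASCII letters, digits, and `-`. The names `ALLOWED_CHARS`, `DISALLOWED`, `filter_with_set`, and `filter_with_regex` are hypothetical, and the sample input has the same shape as `test_large_unicode`. Timings depend on the machine and Python version, so no numbers are claimed here.

```python
import re
import timeit
from functools import partial

# Hypothetical names. The allowed set is assumed to be lowercase ASCII letters,
# digits and "-", i.e. the complement of the negated class r"[^a-z0-9-]".
ALLOWED_CHARS = set("abcdefghijklmnopqrstuvwxyz0123456789-")
DISALLOWED = re.compile(r"[^a-z0-9-]")


def filter_with_set(s: str) -> str:
    """Original-style filter: one generator step and one set lookup per character."""
    return "".join(c for c in s if c in ALLOWED_CHARS)


def filter_with_regex(s: str) -> str:
    """Optimized-style filter: one call that removes every disallowed character."""
    return DISALLOWED.sub("", s)


if __name__ == "__main__":
    sample = "flower" + "ö" * 500 + "labs"  # same shape as test_large_unicode
    assert filter_with_set(sample) == filter_with_regex(sample) == "flowerlabs"
    for fn in (filter_with_set, filter_with_regex):
        # Results vary by machine and interpreter; compare them side by side.
        seconds = timeit.timeit(partial(fn, sample), number=2_000)
        print(f"{fn.__name__}: {seconds:.4f} s for 2000 calls")
```

Precompiling the pattern is a small extra choice in this sketch; `re.sub` with a string pattern also caches compiled patterns internally.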

**2. Optimized Leading Character Removal**
The original code used a `while` loop that inspected `sanitized_name[0]` and rebuilt the string with `sanitized_name[1:]` for every invalid leading character (16.4% of execution time combined). Each slice copies the rest of the string, so stripping k leading characters from an n-character name costs on the order of k·n character copies. The optimized version advances an index to the first valid character and then slices once with `sanitized_name[i:]`, which is linear in the string length and creates at most one new string.
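A sketch of both leading-character strategies. The set of characters that may not start a name is passed in as a hypothetical `invalid` parameter, because only the stripping of leading digits is unambiguous from the generated tests; the example therefore strips digits. `DIGITS`, `strip_leading_loop`, and `strip_leading_index` are illustrative names, not the flwr source.

```python
# Hypothetical sketch: `invalid` holds the characters that may not start a name.
# Leading digits are the case every generated test agrees on, so DIGITS is used
# in the example; pass a larger set if more leading characters should be dropped.
DIGITS = frozenset("0123456789")


def strip_leading_loop(s: str, invalid: frozenset) -> str:
    """Original-style: each pass slices off one character, copying the rest."""
    while s and s[0] in invalid:
        s = s[1:]  # builds a new string of len(s) - 1 characters every time
    return s


def strip_leading_index(s: str, invalid: frozenset) -> str:
    """Optimized-style: advance an index past invalid characters, then slice once."""
    i = 0
    while i < len(s) and s[i] in invalid:
        i += 1
    return s[i:]  # one slice, also correct when i == len(s) (returns "")


if __name__ == "__main__":
    name = "9" * 500 + "a" * 500  # same shape as test_large_scale_starting_with_digits
    assert strip_leading_loop(name, DIGITS) == strip_leading_index(name, DIGITS) == "a" * 500
```

Both functions return the same string; they differ only in how many intermediate strings are built along the way.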

**Performance Benefits by Test Case:**
- **Large-scale tests**: The regex filter helps most on inputs with many invalid characters, such as the Unicode and mixed valid/invalid test cases.
- **Leading digit/separator tests**: The index-based scan helps most when long runs of invalid leading characters must be stripped, because the original loop's cost grew with the length of that run.
- **Basic cases**: Even short inputs benefit, because fewer intermediate strings are created.

The optimized version produced the same outputs as the original on all 118 generated regression tests and the concolic coverage test (100% coverage), while doing its work through the regex engine and a single string slice.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 118 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 1 Passed |
| 📊 Tests Coverage | 100.0% |
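The generated tests store each result in `codeflash_output` so that the original and optimized outputs can be compared on the same inputs. The same idea can be sketched as a small randomized differential test. Both functions below are hypothetical reconstructions from the snippets quoted in the description, not the actual `flwr` source: the separator mapping (space, `.`, `_`, `/` → `-`) is inferred from the generated tests, and the assumed set of stripped leading characters (`_LEADING_STRIP`, digits only) is shared by both versions.

```python
import random
import re

# Hypothetical reconstructions for illustration; not the actual flwr source.
# Separator mapping inferred from the generated tests: " ", ".", "_", "/" -> "-".
_SEPARATORS = str.maketrans({" ": "-", ".": "-", "_": "-", "/": "-"})
_ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789-")
_DISALLOWED = re.compile(r"[^a-z0-9-]")
# Assumption: leading characters to strip (digits; add "-" if dashes should go too).
_LEADING_STRIP = frozenset("0123456789")


def original_style(name: str) -> str:
    s = name.translate(_SEPARATORS).lower()
    s = "".join(c for c in s if c in _ALLOWED)  # per-character set filter
    while s and s[0] in _LEADING_STRIP:  # drop one leading char per pass
        s = s[1:]
    return s


def optimized_style(name: str) -> str:
    s = _DISALLOWED.sub("", name.translate(_SEPARATORS).lower())  # one regex pass
    i = 0
    while i < len(s) and s[i] in _LEADING_STRIP:  # locate first kept character
        i += 1
    return s[i:]  # single slice


def differential_check(trials: int = 5_000, seed: int = 0) -> int:
    """Run both versions on random inputs and return the number of mismatches."""
    rng = random.Random(seed)
    # Lowercase, uppercase, digit, separators, invalid ASCII, and non-ASCII.
    alphabet = "aZ9-_. /@$ö项"
    mismatches = 0
    for _ in range(trials):
        text = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 30)))
        if original_style(text) != optimized_style(text):
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    print("mismatches:", differential_check())
```

A check like this only shows that the two versions agree on the sampled inputs; it does not show that their shared behavior is the intended one.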
🌀 Generated Regression Tests and Runtime
import re

# imports
import pytest  # used for our unit tests
from cli.utils import sanitize_project_name

# unit tests

# ---- Basic Test Cases ----

def test_basic_alphanumeric():
    # Simple alphanumeric string should remain unchanged except for lowercasing
    codeflash_output = sanitize_project_name("Project123")
    codeflash_output = sanitize_project_name("FlowerLabs")

def test_basic_spaces_and_underscores():
    # Spaces and underscores should become dashes
    codeflash_output = sanitize_project_name("Flower Labs")
    codeflash_output = sanitize_project_name("flower_labs")
    codeflash_output = sanitize_project_name("flower labs_project")

def test_basic_dots_and_slashes():
    # Dots and slashes should become dashes
    codeflash_output = sanitize_project_name("flower.labs")
    codeflash_output = sanitize_project_name("flower/labs")
    codeflash_output = sanitize_project_name("flower.labs/project")

def test_basic_mixed_separators():
    # Mixed separators should all become dashes
    codeflash_output = sanitize_project_name("flower_labs.project/2025")

def test_basic_case_insensitive():
    # Should be lowercased
    codeflash_output = sanitize_project_name("FlOwEr_LaBs")

def test_basic_dash_preserved():
    # Dashes should be preserved
    codeflash_output = sanitize_project_name("flower-labs")
    codeflash_output = sanitize_project_name("flower--labs")

# ---- Edge Test Cases ----

def test_edge_empty_string():
    # Empty string should return empty string
    codeflash_output = sanitize_project_name("")

def test_edge_only_separators():
    # Only separators should result in empty string after stripping
    codeflash_output = sanitize_project_name("   ")
    codeflash_output = sanitize_project_name("...")
    codeflash_output = sanitize_project_name("///")
    codeflash_output = sanitize_project_name("___")
    codeflash_output = sanitize_project_name(" ._/ ")

def test_edge_leading_digits():
    # Leading digits should be stripped
    codeflash_output = sanitize_project_name("123flower")
    codeflash_output = sanitize_project_name("123_flower")
    codeflash_output = sanitize_project_name("123_flower_456")

def test_edge_leading_separators():
    # Leading separators should be stripped
    codeflash_output = sanitize_project_name("_flower")
    codeflash_output = sanitize_project_name("-flower")
    codeflash_output = sanitize_project_name(".flower")
    codeflash_output = sanitize_project_name("/flower")

def test_edge_trailing_separators():
    # Trailing separators become a trailing dash, which is kept
    codeflash_output = sanitize_project_name("flower_")
    codeflash_output = sanitize_project_name("flower-")
    codeflash_output = sanitize_project_name("flower.")
    codeflash_output = sanitize_project_name("flower/")

def test_edge_invalid_characters():
    # Invalid characters should be removed
    codeflash_output = sanitize_project_name("flower@labs!")
    codeflash_output = sanitize_project_name("flower$%^labs")
    codeflash_output = sanitize_project_name("flower#labs*")

def test_edge_all_invalid():
    # All invalid characters should return empty string
    codeflash_output = sanitize_project_name("@#$%^&*()")

def test_edge_unicode_characters():
    # Unicode letters should be stripped (not in allowed set)
    codeflash_output = sanitize_project_name("flöwer")
    codeflash_output = sanitize_project_name("fløwer")
    codeflash_output = sanitize_project_name("花")
    codeflash_output = sanitize_project_name("flower花labs")

def test_edge_multiple_consecutive_separators():
    # Multiple consecutive separators should become multiple dashes
    codeflash_output = sanitize_project_name("flower__labs")
    codeflash_output = sanitize_project_name("flower..labs")
    codeflash_output = sanitize_project_name("flower//labs")
    codeflash_output = sanitize_project_name("flower _./ labs")

def test_edge_leading_invalid_and_digits():
    # Leading invalid chars and digits should be stripped
    codeflash_output = sanitize_project_name("@123_flower")
    codeflash_output = sanitize_project_name("!_123flower")

def test_edge_dash_only():
    # Only dashes should be preserved
    codeflash_output = sanitize_project_name("-")
    codeflash_output = sanitize_project_name("--")
    codeflash_output = sanitize_project_name("---")

def test_edge_dash_and_digits():
    # Leading digits and dash should result in dash only
    codeflash_output = sanitize_project_name("123-")
    codeflash_output = sanitize_project_name("123--")

def test_edge_dash_and_invalid():
    # Invalid characters are removed before leading characters are stripped
    codeflash_output = sanitize_project_name("@-123")
    codeflash_output = sanitize_project_name("!@#-456")

def test_edge_dash_and_letters():
    # Leading dash and letters should result in letters
    codeflash_output = sanitize_project_name("-flower")

def test_edge_max_length():
    # Very long valid project name should be preserved
    long_name = "flower" * 100
    codeflash_output = sanitize_project_name(long_name)

def test_edge_only_digits():
    # Only digits should be stripped
    codeflash_output = sanitize_project_name("1234567890")

def test_edge_only_invalid_and_digits():
    codeflash_output = sanitize_project_name("@123456")

# ---- Large Scale Test Cases ----

def test_large_many_separators():
    # Large input with many separators
    name = "flower" + ("_" * 500) + "labs"
    expected = "flower" + ("-" * 500) + "labs"
    codeflash_output = sanitize_project_name(name)

def test_large_long_mixed_string():
    # Large input with mixed valid and invalid chars
    name = "FlOwEr" + ("_" * 250) + "LaBs" + ("$" * 250) + "2025"
    expected = "flower" + ("-" * 250) + "labs2025"
    codeflash_output = sanitize_project_name(name)

def test_large_all_invalid():
    # Large input with only invalid characters
    name = "@" * 1000
    codeflash_output = sanitize_project_name(name)

def test_large_leading_digits_and_separators():
    # Large input with many leading digits and separators
    name = "9" * 500 + "_" * 250 + "flowerlabs"
    expected = "flowerlabs"
    codeflash_output = sanitize_project_name(name)

def test_large_unicode():
    # Large input with many unicode characters
    name = "flower" + "ö" * 500 + "labs"
    expected = "flowerlabs"
    codeflash_output = sanitize_project_name(name)

def test_large_random_mixed():
    # Large input with random valid and invalid characters
    name = "Proj" + ("_" * 100) + "ect" + ("@" * 100) + "Name" + (" " * 100) + "2025"
    expected = "proj" + ("-" * 100) + "ectname" + ("-" * 100) + "2025"
    codeflash_output = sanitize_project_name(name)

def test_large_all_separators():
    # Large input with only separators
    name = "." * 1000
    codeflash_output = sanitize_project_name(name)

def test_large_all_digits_and_separators():
    # Large input with digits and separators only
    name = "9" * 500 + "_" * 500
    codeflash_output = sanitize_project_name(name)

# ---- Additional determinism and mutation safety ----

def test_mutation_safety():
    # Changing allowed chars should break tests
    # This test is to ensure mutation testing would fail if allowed chars change
    codeflash_output = sanitize_project_name("flower-labs")
    codeflash_output = sanitize_project_name("flower_labs")
    codeflash_output = sanitize_project_name("flower.labs")
    codeflash_output = sanitize_project_name("flower/labs")
    codeflash_output = sanitize_project_name("flower@labs")
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import re

# imports
import pytest  # used for our unit tests
from cli.utils import sanitize_project_name

# unit tests

# ------------------- Basic Test Cases -------------------

def test_basic_alphanumeric():
    # Project name is already valid
    codeflash_output = sanitize_project_name("myproject")
    # Mixed case, should be lowercased
    codeflash_output = sanitize_project_name("MyProject")
    # Contains spaces
    codeflash_output = sanitize_project_name("my project")
    # Contains dots
    codeflash_output = sanitize_project_name("my.project")
    # Contains underscores
    codeflash_output = sanitize_project_name("my_project")
    # Contains slashes
    codeflash_output = sanitize_project_name("my/project")
    # Combination of separators
    codeflash_output = sanitize_project_name("my project.name/test")

def test_basic_mixed_separators_and_case():
    # Multiple separators and mixed case
    codeflash_output = sanitize_project_name("My_Project.Name/2025")
    # Spaces at start/end
    codeflash_output = sanitize_project_name("  myproject  ")
    # Dashes are preserved
    codeflash_output = sanitize_project_name("my-project-name")

# ------------------- Edge Test Cases -------------------

def test_edge_empty_string():
    # Empty string should return empty string
    codeflash_output = sanitize_project_name("")

def test_edge_only_separators():
    # Only separators should result in empty string
    codeflash_output = sanitize_project_name("   ")
    codeflash_output = sanitize_project_name("...")
    codeflash_output = sanitize_project_name("///")
    codeflash_output = sanitize_project_name("___")
    codeflash_output = sanitize_project_name(" ._/ ")

def test_edge_special_characters():
    # Should remove all special characters except dash
    codeflash_output = sanitize_project_name("my@project!")
    codeflash_output = sanitize_project_name("my#project$name")
    # Mix of valid and invalid
    codeflash_output = sanitize_project_name("my%project^&*()")
    # Unicode characters are removed
    codeflash_output = sanitize_project_name("prøject")
    codeflash_output = sanitize_project_name("项目")

def test_edge_starting_with_digit():
    # Should remove leading digits
    codeflash_output = sanitize_project_name("123project")
    # Digits in middle/end are preserved
    codeflash_output = sanitize_project_name("project123")
    # Digits and separators at start
    codeflash_output = sanitize_project_name("123.45_myproject")
    # All digits should result in empty string
    codeflash_output = sanitize_project_name("123456")

def test_edge_starting_with_separator_or_invalid():
    # Leading separators are removed
    codeflash_output = sanitize_project_name("  ._/myproject")
    # Leading invalid chars and digits
    codeflash_output = sanitize_project_name("!@#1myproject")

def test_edge_multiple_consecutive_separators():
    # Each separator in a run becomes its own dash (runs are not collapsed)
    codeflash_output = sanitize_project_name("my..__//project")
    # Only separators
    codeflash_output = sanitize_project_name("...___///")

def test_edge_dash_handling():
    # Dashes are preserved, not replaced
    codeflash_output = sanitize_project_name("my--project")
    # Leading dash is removed if not allowed
    codeflash_output = sanitize_project_name("-myproject")
    # Trailing dash is preserved
    codeflash_output = sanitize_project_name("myproject-")

def test_edge_all_invalid_characters():
    # Only invalid chars
    codeflash_output = sanitize_project_name("@#$%^&*()")
    # Mix of valid and invalid
    codeflash_output = sanitize_project_name("p@r#o$j%e^c&t")

def test_edge_long_invalid_prefix():
    # Long invalid prefix should be stripped
    codeflash_output = sanitize_project_name("!!!123...myproject")

def test_edge_leading_and_trailing_spaces_and_invalids():
    # Leading/trailing spaces and invalids
    codeflash_output = sanitize_project_name("   @myproject!   ")

def test_edge_dash_and_digit_start():
    # Leading dash and digit should be removed
    codeflash_output = sanitize_project_name("-1myproject")

def test_edge_dash_only():
    # Only dash is preserved
    codeflash_output = sanitize_project_name("-")

def test_edge_dash_and_separators():
    # Dash mixed with separators
    codeflash_output = sanitize_project_name("-._/")

# ------------------- Large Scale Test Cases -------------------

def test_large_scale_long_name():
    # Very long project name (1000 chars)
    long_name = "project_" * 125  # 8 chars * 125 = 1000
    expected = "project-" * 125  # the trailing dash is kept, like any trailing separator
    codeflash_output = sanitize_project_name(long_name)

def test_large_scale_many_separators():
    # 500 separators followed by valid name
    name = "." * 500 + "myproject"
    codeflash_output = sanitize_project_name(name)
    # 500 separators interleaved with valid chars
    name = ".".join(["project"] * 501)
    expected = "-".join(["project"] * 501)
    codeflash_output = sanitize_project_name(name)

def test_large_scale_mixed_invalid_and_valid():
    # 1000 chars alternating valid/invalid
    name = "".join(["a" if i % 2 == 0 else "@" for i in range(1000)])
    expected = "a" * 500
    codeflash_output = sanitize_project_name(name)

def test_large_scale_digits_and_separators():
    # 500 digits, then valid name
    name = "9" * 500 + "myproject"
    codeflash_output = sanitize_project_name(name)
    # 250 digits interleaved with 250 dots (500 chars), then valid name
    name = "".join(["9." for _ in range(250)]) + "myproject"
    codeflash_output = sanitize_project_name(name)

def test_large_scale_unicode():
    # 1000 unicode chars, should be removed
    name = "项目" * 500
    codeflash_output = sanitize_project_name(name)

def test_large_scale_all_valid():
    # 1000 valid chars
    name = "a" * 1000
    codeflash_output = sanitize_project_name(name)

def test_large_scale_all_invalid():
    # 1000 invalid chars
    name = "@" * 1000
    codeflash_output = sanitize_project_name(name)

def test_large_scale_valid_and_separators():
    # 500 valid chars and 500 separators
    name = "a" * 500 + "." * 500
    expected = "a" * 500 + "-" * 500
    codeflash_output = sanitize_project_name(name)

def test_large_scale_dash_handling():
    # 1000 dashes
    name = "-" * 1000
    codeflash_output = sanitize_project_name(name)

def test_large_scale_starting_with_digits():
    # 500 digits, 500 valid chars
    name = "9" * 500 + "a" * 500
    expected = "a" * 500
    codeflash_output = sanitize_project_name(name)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from cli.utils import sanitize_project_name

def test_sanitize_project_name():
    sanitize_project_name('i')
🔎 Concolic Coverage Tests and Runtime

To edit these changes, `git checkout codeflash/optimize-sanitize_project_name-mh182q8n` and push.

codeflash-ai bot requested a review from mashraf-222 October 21, 2025 23:55
codeflash-ai bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) Oct 21, 2025