@codeflash-ai codeflash-ai bot commented Oct 30, 2025

📄 47% (0.47x) speedup for full_path_exists in wandb/integration/kfp/kfp_patch.py

⏱️ Runtime : 3.97 milliseconds → 2.70 milliseconds (best of 46 runs)
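
The headline percentage is consistent with measuring the time saved relative to the optimized runtime. The report does not state its formula, so the `speedup_percent` helper below is an assumption that happens to reproduce the published figures:

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Relative speedup, assuming (original - optimized) / optimized."""
    return (original - optimized) / optimized * 100.0


# Headline runtimes from the report, in milliseconds.
headline = speedup_percent(3.97, 2.70)
print(f"{headline:.0f}%")  # 47%

# Broken long-chain test from the report, in microseconds.
long_chain = speedup_percent(62.5, 3.28)
print(f"{long_chain:.0f}%")  # 1805%
```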

📝 Explanation and details

The optimization achieves a 47% speedup through two key improvements:

1. LRU Caching in get_module():

  • Added @lru_cache(maxsize=128) decorators to separate cached functions for lazy and non-lazy module imports
  • Prevents repeated expensive import_module_lazy() calls for the same module names
  • From profiler: get_module time reduced from 20.1ms to 18.9ms, with the cached import line dropping from 92.7% to 92.4% of function time
  • Particularly effective when full_path_exists repeatedly checks the same parent modules (e.g., "os" then "os.path")

2. Streamlined Logic in full_path_exists():

  • Eliminated the nested get_parent_child_pairs() function and its list construction overhead
  • Replaced with direct iteration using range(1, len(split)) and slice operations
  • Pre-cached wandb.util.get_module lookup in a local variable to avoid repeated attribute lookups
  • Simplified the attribute existence check: the combined hasattr(module, child) or getattr(module, child) is None test became a single getattr(module, child, None) call followed by an if attr is None check
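
The loop shape described in these bullets might look roughly like the sketch below. It is illustrative only: `full_path_exists_sketch` is a hypothetical name, `importlib.import_module` replaces `wandb.util.get_module`, and the handling of unimportable parents is an assumption rather than the PR's actual behavior.

```python
import importlib


def full_path_exists_sketch(full_path: str) -> bool:
    """Check each parent/child pair of a dotted path (illustrative only)."""
    split = full_path.split(".")
    import_module = importlib.import_module  # local alias, looked up once
    for i in range(1, len(split)):
        parent = ".".join(split[:i])  # slice instead of a list of pairs
        child = split[i]
        try:
            module = import_module(parent)
        except (ImportError, ValueError, TypeError):
            # Unknown or malformed parent path (e.g. empty or relative name).
            return False
        attr = getattr(module, child, None)
        if attr is None:
            return False
    return True


print(full_path_exists_sketch("os.path.join"))            # True
print(full_path_exists_sketch("os.path.not_a_function"))  # False
```

Using `getattr` with a default folds the "missing" and "present but None" cases into one lookup, which is where the saving over a separate `hasattr` call comes from.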

Performance Benefits by Test Type:

  • Basic cases (single module.attribute): 35-47% faster due to reduced function call overhead
  • Long attribute chains: Up to 1805% faster due to module caching eliminating redundant imports of parent modules
  • Large-scale repeated checks: 28-59% faster as the LRU cache prevents re-importing the same modules hundreds of times

The caching is most effective when the same modules are accessed repeatedly, which is common in path validation scenarios where parent modules like "os", "sys", etc. are checked multiple times.
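
To see why repeated parents matter, the sketch below counts cache hits over a workload shaped like the generated tests. `cached_import` is a hypothetical helper built on `importlib.import_module`, not the function changed in this PR.

```python
import importlib
from functools import lru_cache


@lru_cache(maxsize=128)
def cached_import(name: str):
    """Hypothetical cached import; returns None when the import fails."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# A workload in which two parent modules recur many times.
paths = [f"math.nonexistent_{i}" for i in range(100)] + ["os.path.join"] * 50

for path in paths:
    parent = path.rsplit(".", 1)[0]  # "math" or "os.path"
    cached_import(parent)

info = cached_import.cache_info()
print(f"hits={info.hits} misses={info.misses}")  # hits=148 misses=2
```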

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 3316 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
import pytest
from wandb.integration.kfp.kfp_patch import full_path_exists

# unit tests

# ----------- BASIC TEST CASES -----------

def test_basic_module_function_exists():
    # Standard library module, function exists
    codeflash_output = full_path_exists("os.path.join") # 6.75μs -> 4.66μs (45.1% faster)

def test_basic_module_class_exists():
    # Standard library module, class exists
    codeflash_output = full_path_exists("datetime.datetime") # 5.78μs -> 4.08μs (41.7% faster)

def test_basic_module_attribute_exists():
    # Standard library module, attribute exists
    codeflash_output = full_path_exists("sys.version") # 5.37μs -> 3.76μs (43.0% faster)

def test_basic_module_submodule_exists():
    # Standard library module, submodule exists
    codeflash_output = full_path_exists("os.path") # 5.43μs -> 3.78μs (43.7% faster)

def test_basic_module_function_not_exists():
    # Standard library module, function does not exist
    codeflash_output = full_path_exists("os.path.not_a_function") # 7.60μs -> 5.58μs (36.2% faster)

def test_basic_module_not_exists():
    # Non-existent module
    codeflash_output = full_path_exists("notamodule.something") # 4.47μs -> 3.29μs (35.9% faster)


def test_edge_empty_string():
    # Empty string should return False
    codeflash_output = full_path_exists("") # 2.37μs -> 1.81μs (31.1% faster)

def test_edge_single_component_module_exists():
    # Single module, should return True
    codeflash_output = full_path_exists("os") # 2.42μs -> 1.76μs (37.4% faster)

def test_edge_single_component_module_not_exists():
    # Single module, does not exist
    codeflash_output = full_path_exists("notamodule") # 2.42μs -> 1.75μs (39.0% faster)






def test_edge_module_is_builtin():
    # Built-in module
    codeflash_output = full_path_exists("sys") # 2.35μs -> 1.76μs (33.2% faster)

def test_edge_module_is_package():
    # Package module
    codeflash_output = full_path_exists("pytest") # 2.39μs -> 1.77μs (34.4% faster)

def test_edge_attribute_is_property(monkeypatch):
    # Attribute value is read from a property (stored on the module as 42)
    class Dummy:
        @property
        def prop(self):
            return 42
    import types
    dummy_mod = types.ModuleType("dummy_mod")
    dummy_mod.prop = Dummy().prop
    import sys
    sys.modules["dummy_mod"] = dummy_mod
    codeflash_output = full_path_exists("dummy_mod.prop") # 5.55μs -> 3.99μs (39.2% faster)

def test_edge_attribute_is_classmethod(monkeypatch):
    # Attribute is a classmethod
    class Dummy:
        @classmethod
        def cm(cls):
            return 42
    import types
    dummy_mod = types.ModuleType("dummy_mod2")
    dummy_mod.cm = Dummy.cm
    import sys
    sys.modules["dummy_mod2"] = dummy_mod
    codeflash_output = full_path_exists("dummy_mod2.cm") # 5.43μs -> 4.00μs (35.8% faster)

def test_edge_attribute_is_staticmethod(monkeypatch):
    # Attribute is a staticmethod
    class Dummy:
        @staticmethod
        def sm():
            return 42
    import types
    dummy_mod = types.ModuleType("dummy_mod3")
    dummy_mod.sm = Dummy.sm
    import sys
    sys.modules["dummy_mod3"] = dummy_mod
    codeflash_output = full_path_exists("dummy_mod3.sm") # 5.38μs -> 4.00μs (34.5% faster)

def test_edge_attribute_is_module(monkeypatch):
    # Attribute is another module
    import sys
    import types
    dummy_mod = types.ModuleType("dummy_mod4")
    dummy_child = types.ModuleType("dummy_child")
    dummy_mod.child = dummy_child
    sys.modules["dummy_mod4"] = dummy_mod
    sys.modules["dummy_child"] = dummy_child
    codeflash_output = full_path_exists("dummy_mod4.child") # 5.44μs -> 3.84μs (41.8% faster)

def test_edge_partial_path_missing():
    # First part exists, second does not
    codeflash_output = full_path_exists("os.notamodule") # 6.32μs -> 4.63μs (36.4% faster)

def test_edge_path_with_leading_dot():
    # Leading dot (invalid import path)
    codeflash_output = full_path_exists(".os.path.join") # 5.32μs -> 3.11μs (71.1% faster)

def test_edge_path_with_trailing_dot():
    # Trailing dot (invalid import path)
    codeflash_output = full_path_exists("os.path.join.") # 7.82μs -> 5.38μs (45.4% faster)

def test_edge_path_with_double_dot():
    # Double dot (invalid import path)
    codeflash_output = full_path_exists("os..path.join") # 7.19μs -> 4.53μs (58.6% faster)

def test_edge_path_with_spaces():
    # Path with spaces (invalid import path)
    codeflash_output = full_path_exists("os. path.join") # 6.81μs -> 4.52μs (50.8% faster)

def test_edge_path_with_unicode():
    # Path with a non-ASCII attribute name that does not exist
    codeflash_output = full_path_exists("os.path.jöin") # 8.72μs -> 6.40μs (36.1% faster)

def test_edge_path_with_numbers_in_module():
    # Module name with a digit suffix that does not exist
    codeflash_output = full_path_exists("os1.path.join") # 5.13μs -> 3.17μs (61.8% faster)

# ----------- LARGE SCALE TEST CASES -----------

def test_large_scale_many_attributes(monkeypatch):
    # Create a module with many attributes and test existence
    import sys
    import types
    dummy_mod = types.ModuleType("large_mod")
    for i in range(1000):
        setattr(dummy_mod, f"attr{i}", i)
    sys.modules["large_mod"] = dummy_mod
    # All attributes should exist
    for i in range(1000):
        codeflash_output = full_path_exists(f"large_mod.attr{i}") # 1.12ms -> 718μs (56.0% faster)
    # Attribute that does not exist
    codeflash_output = full_path_exists("large_mod.attr1000") # 3.40μs -> 3.08μs (10.6% faster)


def test_large_scale_many_nonexistent(monkeypatch):
    # Many non-existent attributes should all return False
    import sys
    import types
    dummy_mod = types.ModuleType("large_mod2")
    sys.modules["large_mod2"] = dummy_mod
    for i in range(1000):
        codeflash_output = full_path_exists(f"large_mod2.notattr{i}") # 1.62ms -> 1.26ms (28.3% faster)

def test_large_scale_many_modules(monkeypatch):
    # Create many modules and check their existence
    import sys
    import types
    for i in range(1000):
        mod_name = f"mod_{i}"
        mod = types.ModuleType(mod_name)
        sys.modules[mod_name] = mod
        codeflash_output = full_path_exists(mod_name) # 489μs -> 307μs (59.3% faster)

def test_large_scale_long_attribute_chain(monkeypatch):
    # Create a module with a long chain of attributes
    import sys
    import types
    mod = types.ModuleType("chain_mod")
    sys.modules["chain_mod"] = mod
    current = mod
    chain_length = 100
    for i in range(chain_length):
        next_mod = types.SimpleNamespace()
        setattr(current, f"attr{i}", next_mod)
        current = next_mod
    # Build the full path
    path = "chain_mod" + "".join([f".attr{i}" for i in range(chain_length)])
    codeflash_output = full_path_exists(path) # 73.9μs -> 6.88μs (975% faster)
    # Break the chain: attr100 was never set on the last namespace
    last_name = f"attr{chain_length - 1}"
    if hasattr(current, last_name):
        delattr(current, last_name)
    broken_path = path + ".attr100"
    codeflash_output = full_path_exists(broken_path) # 62.5μs -> 3.28μs (1805% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pytest
from wandb.integration.kfp.kfp_patch import full_path_exists

# Basic Test Cases

def test_basic_existing_module_function():
    # Test with a standard library function
    codeflash_output = full_path_exists("os.path.join") # 6.72μs -> 4.76μs (41.1% faster)

def test_basic_existing_module_class():
    # Test with a standard library class
    codeflash_output = full_path_exists("collections.Counter") # 5.66μs -> 3.86μs (46.7% faster)

def test_basic_existing_module_attribute():
    # Test with a standard library attribute
    codeflash_output = full_path_exists("sys.version") # 5.52μs -> 3.86μs (43.1% faster)

def test_basic_nonexistent_module():
    # Test with a non-existent module
    codeflash_output = full_path_exists("nonexistent.module") # 4.54μs -> 3.22μs (41.2% faster)

def test_basic_nonexistent_attribute():
    # Test with a non-existent attribute in an existing module
    codeflash_output = full_path_exists("os.path.nonexistent_function") # 7.68μs -> 5.54μs (38.7% faster)

def test_basic_single_module():
    # Test with a single module (should not have parent-child pairs, so returns True)
    codeflash_output = full_path_exists("os") # 2.38μs -> 1.77μs (34.7% faster)

def test_basic_single_module_nonexistent():
    # Test with a single non-existent module
    codeflash_output = full_path_exists("nonexistentmodule") # 2.37μs -> 1.68μs (40.6% faster)

# Edge Test Cases

def test_edge_empty_string():
    # Empty string should return False
    codeflash_output = full_path_exists("") # 2.32μs -> 1.69μs (37.1% faster)

def test_edge_trailing_dot():
    # Trailing dot should be handled gracefully
    codeflash_output = full_path_exists("os.path.") # 7.75μs -> 5.56μs (39.3% faster)

def test_edge_leading_dot():
    # Leading dot should be handled gracefully
    codeflash_output = full_path_exists(".os.path") # 4.95μs -> 3.00μs (64.7% faster)

def test_edge_double_dot():
    # Double dot in the path should be handled gracefully
    codeflash_output = full_path_exists("os..path") # 6.67μs -> 4.60μs (45.0% faster)



def test_edge_module_is_none(monkeypatch):
    # Patch an attribute to be None and test
    import sys
    monkeypatch.setattr(sys, "version", None)
    codeflash_output = full_path_exists("sys.version") # 5.39μs -> 3.75μs (43.6% faster)
    # No manual restore needed: monkeypatch undoes setattr at test teardown

def test_edge_module_is_missing(monkeypatch):
    # Remove a module attribute and test
    import sys
    # monkeypatch.delattr restores sys.version at teardown, even if the call raises
    monkeypatch.delattr(sys, "version")
    codeflash_output = full_path_exists("sys.version") # 6.34μs -> 4.88μs (29.9% faster)

def test_edge_path_with_builtin(monkeypatch):
    # Builtins should work
    codeflash_output = full_path_exists("builtins.str") # 5.61μs -> 3.76μs (49.2% faster)
    codeflash_output = full_path_exists("builtins.nonexistent") # 3.30μs -> 2.69μs (22.6% faster)



def test_edge_path_with_uppercase_module():
    # Uppercase module names (should fail if not present)
    codeflash_output = full_path_exists("OS.path") # 4.58μs -> 3.22μs (42.3% faster)

def test_edge_path_with_unicode_characters():
    # Unicode in path should fail
    codeflash_output = full_path_exists("os.päth.join") # 7.96μs -> 5.54μs (43.6% faster)

def test_edge_path_with_numbers():
    # Numbers in attribute names (should fail if not present)
    codeflash_output = full_path_exists("os.path.join123") # 7.69μs -> 5.65μs (36.1% faster)

# Large Scale Test Cases

def test_large_scale_many_existing_attributes():
    # Test with many valid attributes in a real module
    import math
    all_math_funcs = [f"math.{name}" for name in dir(math) if not name.startswith("_")]
    # Limit to 100 attributes for performance
    for func_path in all_math_funcs[:100]:
        codeflash_output = full_path_exists(func_path) # 72.1μs -> 47.0μs (53.4% faster)

def test_large_scale_many_nonexistent_attributes():
    # Test with many invalid attributes in a real module
    import math
    invalid_names = [f"math.nonexistent_{i}" for i in range(100)]
    for func_path in invalid_names:
        codeflash_output = full_path_exists(func_path) # 148μs -> 113μs (31.0% faster)

def test_large_scale_long_path_chain(monkeypatch):
    # Create a long chain of attributes dynamically
    class Dummy:
        pass
    root = Dummy()
    current = root
    names = [f"attr{i}" for i in range(50)]
    for name in names:
        next_obj = Dummy()
        setattr(current, name, next_obj)
        current = next_obj
    # Inject into sys.modules; monkeypatch removes the entry at teardown
    import sys
    monkeypatch.setitem(sys.modules, "dummy", root)
    # Build full path
    full_path = "dummy." + ".".join(names)
    codeflash_output = full_path_exists(full_path) # 28.2μs -> 5.92μs (377% faster)

def test_large_scale_long_path_chain_missing(monkeypatch):
    # Create a long chain, but break at the last attribute
    class Dummy:
        pass
    root = Dummy()
    current = root
    names = [f"attr{i}" for i in range(50)]
    for name in names[:-1]:
        next_obj = Dummy()
        setattr(current, name, next_obj)
        current = next_obj
    # Do not set the last attribute
    import sys
    # monkeypatch removes the sys.modules entry at teardown
    monkeypatch.setitem(sys.modules, "dummy", root)
    full_path = "dummy." + ".".join(names)
    codeflash_output = full_path_exists(full_path) # 27.4μs -> 5.78μs (374% faster)

def test_large_scale_many_modules(monkeypatch):
    # Test with many different modules and attributes
    modules = ["os", "sys", "math", "collections", "itertools"]
    attributes = {
        "os": ["path", "getcwd", "environ"],
        "sys": ["version", "platform", "executable"],
        "math": ["sin", "cos", "tan"],
        "collections": ["Counter", "defaultdict"],
        "itertools": ["count", "cycle"],
    }
    for mod in modules:
        for attr in attributes[mod]:
            codeflash_output = full_path_exists(f"{mod}.{attr}")

def test_large_scale_nonexistent_modules():
    # Test with many non-existent modules
    for i in range(100):
        codeflash_output = full_path_exists(f"nonexistentmodule{i}.attr") # 90.4μs -> 58.1μs (55.5% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-full_path_exists-mhdlsmha`, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 30, 2025 15:53
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 30, 2025