Skip to content

Conversation

@codeflash-ai
Copy link

@codeflash-ai codeflash-ai bot commented Nov 5, 2025

📄 16% (0.16x) speedup for GenerateContentHelper.mock_generate_content_response in litellm/google_genai/main.py

⏱️ Runtime : 454 microseconds 392 microseconds (best of 250 runs)

📝 Explanation and details

The optimization achieves a 15% speedup by extracting static dictionary data to module level and reusing it across function calls instead of reconstructing it every time.

Key optimization:

  • Moved usageMetadata dictionary to module-level constant _USAGE_METADATA that's created once at import time rather than being reconstructed on every function call
  • Eliminated repeated dictionary construction of the same static values (promptTokenCount: 10, candidatesTokenCount: 20, totalTokenCount: 30)

Why this improves performance:

  • Reduces object allocation overhead: Python no longer needs to allocate memory and construct the same 3-key dictionary 904 times per profiling run
  • Eliminates redundant dictionary literal parsing: The Python interpreter processes {"key": value, ...} syntax once at import instead of every function call
  • Leverages Python's object reference efficiency: Returning a reference to an existing dictionary is faster than creating a new one

Impact on workloads:
This optimization is particularly beneficial for:

  • High-frequency testing scenarios where the mock function is called repeatedly (as shown in the test_large_number_of_calls test achieving 16.6% improvement)
  • Batch processing workloads that generate many mock responses
  • Performance-sensitive paths where even small per-call overhead reduction matters

Test case benefits:
The optimization shows consistent 12-31% improvements across all test cases, with the largest gains (20-31%) on tests with longer strings or non-string inputs, likely because the static dictionary reuse provides proportionally more benefit when other operations (string conversion) remain constant.

The optimization preserves all functionality while eliminating unnecessary repeated work, making it a pure performance win with no behavioral changes.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 903 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 1 Passed
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime
from typing import Any, Dict

# imports
import pytest  # used for our unit tests
from litellm.google_genai.main import GenerateContentHelper

# unit tests

# BASIC TEST CASES

def test_default_response():
    """Test default behavior with no argument."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response(); result = codeflash_output # 991ns -> 843ns (17.6% faster)
    candidate = result["candidates"][0]
    # Check usageMetadata
    usage = result["usageMetadata"]

def test_custom_response():
    """Test with a custom mock_response string."""
    custom_text = "Hello, world!"
    codeflash_output = GenerateContentHelper.mock_generate_content_response(custom_text); result = codeflash_output # 971ns -> 826ns (17.6% faster)

def test_empty_string_response():
    """Test with an empty string as mock_response."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response(""); result = codeflash_output # 1.11μs -> 956ns (16.4% faster)

def test_numeric_string_response():
    """Test with a numeric string as mock_response."""
    num_str = "123456"
    codeflash_output = GenerateContentHelper.mock_generate_content_response(num_str); result = codeflash_output # 1.06μs -> 897ns (17.9% faster)

def test_special_characters_response():
    """Test with special characters in mock_response."""
    special = "!@#$%^&*()_+-=[]{}|;':,.<>/?"
    codeflash_output = GenerateContentHelper.mock_generate_content_response(special); result = codeflash_output # 1.02μs -> 843ns (20.5% faster)

# EDGE TEST CASES

def test_long_string_response():
    """Test with a very long string as mock_response."""
    long_str = "a" * 1000
    codeflash_output = GenerateContentHelper.mock_generate_content_response(long_str); result = codeflash_output # 1.06μs -> 805ns (31.4% faster)

def test_unicode_response():
    """Test with unicode characters in mock_response."""
    unicode_str = "こんにちは世界🌏"
    codeflash_output = GenerateContentHelper.mock_generate_content_response(unicode_str); result = codeflash_output # 982ns -> 873ns (12.5% faster)

def test_whitespace_response():
    """Test with whitespace as mock_response."""
    ws = "   \t\n"
    codeflash_output = GenerateContentHelper.mock_generate_content_response(ws); result = codeflash_output # 1.00μs -> 887ns (13.1% faster)

def test_none_as_argument():
    """Test with None as mock_response (should coerce to 'None' as string)."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response(None); result = codeflash_output # 955ns -> 843ns (13.3% faster)

def test_non_string_argument():
    """Test with an integer as mock_response (should coerce to string)."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response(1234); result = codeflash_output # 1.05μs -> 852ns (23.0% faster)

def test_bool_argument():
    """Test with a boolean as mock_response (should coerce to string)."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response(True); result = codeflash_output # 983ns -> 820ns (19.9% faster)

def test_list_argument():
    """Test with a list as mock_response (should coerce to string)."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response([1, 2, 3]); result = codeflash_output # 1.10μs -> 885ns (24.7% faster)

def test_dict_argument():
    """Test with a dict as mock_response (should coerce to string)."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response({"a": 1}); result = codeflash_output # 1.09μs -> 866ns (25.5% faster)

def test_bytes_argument():
    """Test with bytes as mock_response (should coerce to string)."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response(b"bytes"); result = codeflash_output # 1.03μs -> 841ns (22.2% faster)

# LARGE SCALE TEST CASES

def test_large_text_response():
    """Test with a very large mock_response string (near 1000 chars)."""
    large_text = "X" * 999
    codeflash_output = GenerateContentHelper.mock_generate_content_response(large_text); result = codeflash_output # 1.06μs -> 890ns (19.0% faster)

def test_large_number_of_calls():
    """Test calling the function many times to check for statelessness and performance."""
    texts = [f"response_{i}" for i in range(500)]
    for i, text in enumerate(texts):
        codeflash_output = GenerateContentHelper.mock_generate_content_response(text); result = codeflash_output # 221μs -> 189μs (16.6% faster)

def test_all_candidates_are_independent():
    """Test that modifying the returned dict does not affect subsequent calls."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response("first"); first = codeflash_output # 1.11μs -> 880ns (26.1% faster)
    codeflash_output = GenerateContentHelper.mock_generate_content_response("second"); second = codeflash_output # 635ns -> 545ns (16.5% faster)
    # Mutate first
    first["text"] = "mutated"
    first["candidates"][0]["content"]["parts"][0]["text"] = "mutated"

def test_token_counts_are_consistent():
    """Test that token counts are always the same regardless of input."""
    for arg in ["short", "long" * 100, 123, None, [1, 2]]:
        codeflash_output = GenerateContentHelper.mock_generate_content_response(arg); result = codeflash_output # 3.09μs -> 2.58μs (19.5% faster)
        usage = result["usageMetadata"]

def test_structure_is_consistent():
    """Test that the output structure is always the same."""
    for arg in ["a", "", "X" * 500, 123, None, [1, 2]]:
        codeflash_output = GenerateContentHelper.mock_generate_content_response(arg); result = codeflash_output # 3.44μs -> 2.99μs (15.0% faster)
        candidate = result["candidates"][0]
        # Usage metadata
        usage = result["usageMetadata"]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import TYPE_CHECKING, Any, Dict

# imports
import pytest  # used for our unit tests
from litellm.google_genai.main import GenerateContentHelper

# unit tests

# ----- BASIC TEST CASES -----

def test_default_response_structure():
    """Test that the default response has the expected structure and values."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response(); resp = codeflash_output # 965ns -> 846ns (14.1% faster)
    candidate = resp["candidates"][0]
    # Check candidate content
    content = candidate["content"]
    # Check usageMetadata
    usage = resp["usageMetadata"]

def test_custom_mock_response():
    """Test that custom mock_response is reflected everywhere in the response."""
    custom_text = "Hello, world!"
    codeflash_output = GenerateContentHelper.mock_generate_content_response(mock_response=custom_text); resp = codeflash_output # 1.29μs -> 1.12μs (14.9% faster)

def test_empty_string_mock_response():
    """Test that an empty string as mock_response is handled correctly."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response(mock_response=""); resp = codeflash_output # 1.26μs -> 1.03μs (22.0% faster)

def test_numeric_mock_response():
    """Test that a numeric string as mock_response is handled correctly."""
    num_str = "123456"
    codeflash_output = GenerateContentHelper.mock_generate_content_response(mock_response=num_str); resp = codeflash_output # 1.40μs -> 1.16μs (21.0% faster)

def test_special_characters_mock_response():
    """Test that special characters in mock_response are handled correctly."""
    special = "!@#$%^&*()_+-=[]{}|;':,.<>/?"
    codeflash_output = GenerateContentHelper.mock_generate_content_response(mock_response=special); resp = codeflash_output # 1.28μs -> 1.15μs (11.2% faster)

# ----- EDGE TEST CASES -----

def test_none_mock_response():
    """Test that None as mock_response is handled (should be string 'None')."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response(mock_response=None); resp = codeflash_output # 1.27μs -> 1.12μs (13.6% faster)

def test_long_string_mock_response():
    """Test that a very long string as mock_response is handled correctly."""
    long_str = "A" * 1000  # 1000 characters
    codeflash_output = GenerateContentHelper.mock_generate_content_response(mock_response=long_str); resp = codeflash_output # 1.23μs -> 1.02μs (20.6% faster)

def test_unicode_mock_response():
    """Test that Unicode characters in mock_response are handled correctly."""
    unicode_str = "你好,世界🌏"
    codeflash_output = GenerateContentHelper.mock_generate_content_response(mock_response=unicode_str); resp = codeflash_output # 1.28μs -> 1.07μs (19.0% faster)

def test_list_as_mock_response():
    """Test passing a list as mock_response (should be stringified)."""
    mock_list = [1, 2, 3]
    codeflash_output = GenerateContentHelper.mock_generate_content_response(mock_response=mock_list); resp = codeflash_output # 1.34μs -> 1.10μs (22.1% faster)

def test_dict_as_mock_response():
    """Test passing a dict as mock_response (should be stringified)."""
    mock_dict = {"a": 1}
    codeflash_output = GenerateContentHelper.mock_generate_content_response(mock_response=mock_dict); resp = codeflash_output # 1.33μs -> 1.06μs (24.9% faster)

def test_boolean_as_mock_response():
    """Test passing a boolean as mock_response (should be stringified)."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response(mock_response=True); resp_true = codeflash_output # 1.19μs -> 1.09μs (9.12% faster)
    codeflash_output = GenerateContentHelper.mock_generate_content_response(mock_response=False); resp_false = codeflash_output # 719ns -> 638ns (12.7% faster)

def test_candidate_content_parts_is_list_and_has_one_element():
    """Test that candidates[0]['content']['parts'] is always a list of length 1."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response(); resp = codeflash_output # 1.13μs -> 950ns (19.1% faster)
    parts = resp["candidates"][0]["content"]["parts"]

def test_usage_metadata_values_are_int():
    """Test that usageMetadata values are integers and correct."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response(); resp = codeflash_output # 1.04μs -> 901ns (15.1% faster)
    usage = resp["usageMetadata"]

def test_candidates_index_is_zero():
    """Test that candidates[0]['index'] is always zero."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response(); resp = codeflash_output # 1.04μs -> 911ns (13.8% faster)

def test_candidates_finish_reason_is_STOP():
    """Test that candidates[0]['finishReason'] is always 'STOP'."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response(); resp = codeflash_output # 1.05μs -> 864ns (21.4% faster)

def test_candidates_safety_ratings_is_empty_list():
    """Test that candidates[0]['safetyRatings'] is always an empty list."""
    codeflash_output = GenerateContentHelper.mock_generate_content_response(); resp = codeflash_output # 1.06μs -> 818ns (30.2% faster)

# ----- LARGE SCALE TEST CASES -----

def test_many_unique_mock_responses():
    """Test scalability by calling the function with many different mock_response values."""
    for i in range(100):  # Keep under 1000 for performance
        text = f"Response {i}"
        codeflash_output = GenerateContentHelper.mock_generate_content_response(mock_response=text); resp = codeflash_output # 50.7μs -> 44.5μs (13.9% faster)

def test_large_mock_response():
    """Test with a very large mock_response string (close to 1000 characters)."""
    large_text = "X" * 999  # 999 characters
    codeflash_output = GenerateContentHelper.mock_generate_content_response(mock_response=large_text); resp = codeflash_output # 1.15μs -> 967ns (18.4% faster)

def test_performance_under_multiple_calls():
    """Test function performance and determinism under repeated calls."""
    # We expect the function to always return the same structure and values for the same input
    sample_text = "Performance test"
    results = []
    for _ in range(200):  # Keep under 1000 for performance
        codeflash_output = GenerateContentHelper.mock_generate_content_response(mock_response=sample_text); resp = codeflash_output # 106μs -> 93.3μs (14.6% faster)
        results.append(resp)
    # Check all results are identical
    for r in results:
        pass

def test_large_scale_unicode_responses():
    """Test scalability with many unique unicode mock_response values."""
    for i in range(50):  # Keep under 1000 for performance
        text = f"测试{i}🌏"
        codeflash_output = GenerateContentHelper.mock_generate_content_response(mock_response=text); resp = codeflash_output # 25.6μs -> 22.6μs (13.5% faster)

def test_large_scale_special_char_responses():
    """Test scalability with many unique special character mock_response values."""
    specials = ["!@#", "$%^", "&*()", "_+-", "=[]{}", "|;':,", ".<>/?"]
    for i, s in enumerate(specials):
        text = f"{s}{i}"
        codeflash_output = GenerateContentHelper.mock_generate_content_response(mock_response=text); resp = codeflash_output # 4.07μs -> 3.52μs (15.4% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from litellm.google_genai.main import GenerateContentHelper

def test_GenerateContentHelper_mock_generate_content_response():
    GenerateContentHelper.mock_generate_content_response(mock_response='')
🔎 Concolic Coverage Tests and Runtime
Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup
codeflash_concolic_9_j_v9ot/tmpcnga4ax_/test_concolic_coverage.py::test_GenerateContentHelper_mock_generate_content_response 1.26μs 1.07μs 18.1%✅

To edit these changes git checkout codeflash/optimize-GenerateContentHelper.mock_generate_content_response-mhmm1rbz and push.

Codeflash Static Badge

The optimization achieves a **15% speedup** by **extracting static dictionary data to module level** and **reusing it across function calls** instead of reconstructing it every time.

**Key optimization:**
- **Moved `usageMetadata` dictionary to module-level constant** `_USAGE_METADATA` that's created once at import time rather than being reconstructed on every function call
- **Eliminated repeated dictionary construction** of the same static values (`promptTokenCount: 10`, `candidatesTokenCount: 20`, `totalTokenCount: 30`)

**Why this improves performance:**
- **Reduces object allocation overhead**: Python no longer needs to allocate memory and construct the same 3-key dictionary 904 times per profiling run
- **Eliminates redundant dictionary literal parsing**: The Python interpreter processes `{"key": value, ...}` syntax once at import instead of every function call
- **Leverages Python's object reference efficiency**: Returning a reference to an existing dictionary is faster than creating a new one

**Impact on workloads:**
This optimization is particularly beneficial for:
- **High-frequency testing scenarios** where the mock function is called repeatedly (as shown in the `test_large_number_of_calls` test achieving 16.6% improvement)
- **Batch processing workloads** that generate many mock responses
- **Performance-sensitive paths** where even small per-call overhead reduction matters

**Test case benefits:**
The optimization shows consistent 12-31% improvements across all test cases, with the largest gains (20-31%) on tests with longer strings or non-string inputs, likely because the static dictionary reuse provides proportionally more benefit when other operations (string conversion) remain constant.

The optimization preserves all functionality while eliminating unnecessary repeated work, making it a pure performance win with no behavioral changes.
@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 5, 2025 23:10
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant