@ajcasagrande ajcasagrande commented Oct 28, 2025

Adds support for fully custom payload templates based on Jinja2. Responses are parsed either by automatic type detection or by custom extraction logic written in JMESPath, a query language for JSON.

Based on GenAI-Perf's customizable payloads, but with full multimodal and multi-turn support.

Demo:

Screencast.From.2025-10-27.22-35-23.mp4

Summary by CodeRabbit

  • New Features

    • Added Template Endpoint for benchmarking custom APIs with flexible Jinja2 request templates and response extraction.
    • Supports multimodal and multi-turn template configurations.
  • Documentation

    • Added comprehensive tutorial with examples and troubleshooting guidance for custom API benchmarking.
  • Chores

    • Added required dependencies for template processing functionality.

@github-actions

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ajc/template

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ajc/template

@github-actions github-actions bot added the feat label Oct 28, 2025
@ajcasagrande ajcasagrande self-assigned this Oct 28, 2025
codecov bot commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 70.98765% with 47 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/aiperf/endpoints/base_endpoint.py | 57.53% | 24 Missing and 7 partials ⚠️ |
| src/aiperf/endpoints/template_endpoint.py | 81.60% | 12 Missing and 4 partials ⚠️ |

coderabbitai bot commented Oct 28, 2025

Walkthrough

This PR introduces a Template Endpoint feature for AIPerf, enabling custom API benchmarking via Jinja2 request templates. It adds two new dependencies, a new endpoint type, automatic response extraction utilities in BaseEndpoint, comprehensive documentation, unit tests, integration tests, and a mock server endpoint.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation & Configuration: README.md, docs/genai-perf-feature-comparison.md, docs/tutorials/template-endpoint.md, mkdocs.yml, pyproject.toml | Added Template Endpoint feature documentation to README and feature matrix; created comprehensive tutorial covering usage, configuration, templating, and response extraction; updated mkdocs navigation; added jinja2~=3.1.0 and jmespath~=1.0.1 dependencies |
| Enums & Public API: src/aiperf/common/enums/plugin_enums.py, src/aiperf/endpoints/__init__.py | Added TEMPLATE member to EndpointType enum; imported and exposed TemplateEndpoint in public API |
| Base Endpoint Utilities: src/aiperf/endpoints/base_endpoint.py | Added six extraction methods (auto_detect_and_extract, try_extract_embeddings, try_extract_rankings, try_extract_text, convert_to_response_data, extract_named_contents); updated get_endpoint_headers and get_endpoint_params initialization logic; expanded type imports |
| Template Endpoint Implementation: src/aiperf/endpoints/template_endpoint.py | New class supporting Jinja2 templating, named template registry, JMESPath response extraction, multimodal content handling, automatic type detection parsing, and metadata declaration |
| Mock Server & Test Infrastructure: tests/aiperf_mock_server/app.py, tests/endpoints/conftest.py | Added /v1/custom-multimodal endpoint to mock server; introduced create_mock_response helper function in conftest |
| Unit Tests: tests/common/enums/test_endpoints_enums.py, tests/endpoints/test_chat_endpoint_parse_response.py, tests/endpoints/test_solido_rag.py, tests/endpoints/test_template_endpoint.py | Removed endpoint_path validation checks; refactored test files to use shared create_mock_response fixture; added comprehensive new test module for TemplateEndpoint covering payload formatting, template variables, named templates, error cases, response parsing, and metadata |
| Integration Tests: tests/integration/test_custom_multimodal_template.py | New integration test class with two test cases: simple template execution and complex multimodal template with images and audio |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Template rendering and Jinja2 integration logic in TemplateEndpoint requires careful review for correctness and edge cases
  • JMESPath extraction and automatic type detection fallback chain in template_endpoint.py needs verification
  • Six new extraction utility methods in base_endpoint.py should be validated for robustness across response formats
  • Mock server endpoint contains noted duplicate function declaration that may indicate merge issues
  • Comprehensive test coverage provides good validation surface but increases review scope

Poem

🐰 A template arrives, so flexible and fine,
With Jinja2 weaving and JMESPath design,
Custom APIs now benchmark with grace,
Extraction utilities quicken the pace,
Tests and docs lead us into the space! ✨

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title "feat: fully custom template support for endpoint payloads" directly and clearly summarizes the primary change in this changeset. The main contribution is the introduction of a TemplateEndpoint class that enables custom payload formatting using Jinja2 templates, which the title accurately captures. The title is concise, specific, and uses conventional commit formatting without unnecessary noise. A teammate scanning the git history would immediately understand that this PR adds template-based customization capabilities for API endpoint payloads. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 97.87% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (13)
tests/endpoints/conftest.py (1)

70-80: LGTM! Excellent test utility refactoring.

Centralizing mock response creation in a shared helper reduces duplication and ensures consistency across endpoint tests. The use of spec=InferenceServerResponse provides type safety.

docs/genai-perf-feature-comparison.md (1)

51-51: Clarify “multi‑turn” note to avoid contradiction with session disclaimer.

Row note suggests full multi‑turn support; section at Lines 130‑134 says multi‑turn sessions aren’t supported. Reword note to “multi‑turn variables inside templates,” not session benchmarking.

-| **template** | Template-based inference endpoints | ✅ | ✅ | AIPerf supports multimodal and multi-turn templates |
+| **template** | Template-based inference endpoints | ✅ | ✅ | AIPerf supports multimodal templates and multi‑turn variables inside templates (not multi‑turn session benchmarking). |

Also applies to: 130-134

docs/tutorials/template-endpoint.md (2)

147-154: Use |tojson for booleans in JSON templates.

stream|lower works but |tojson is safer and consistent for JSON emission.

   "prompt": {{ text|tojson }},
   "max_new_tokens": {{ max_tokens|tojson }},
-  "stream": {{ stream|lower }}
+  "stream": {{ stream|tojson }}
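
To make the difference concrete, here is a minimal sketch comparing the two filters. The template strings and variable values are illustrative assumptions, not taken from the tutorial; only the `lower` and `tojson` filters themselves are standard Jinja2.

```python
import json

import jinja2

env = jinja2.Environment()

# Hypothetical templates for illustration; the tutorial's real template differs.
with_lower = env.from_string('{"stream": {{ stream|lower }} }')
with_tojson = env.from_string(
    '{"stream": {{ stream|tojson }}, "prompt": {{ text|tojson }} }'
)

# `lower` stringifies the value and lowercases it, which happens to be
# valid JSON for booleans but not for None.
lower_true = with_lower.render(stream=True)
lower_none = with_lower.render(stream=None)

# `tojson` emits a JSON literal for any supported value, including None
# and strings that contain quotes.
payload = json.loads(with_tojson.render(stream=None, text='say "hi"'))

try:
    json.loads(lower_none)
    lower_none_is_valid = True
except json.JSONDecodeError:
    lower_none_is_valid = False
```

With `stream=True` both filters produce parseable output, so the difference only shows up for values such as `None` or strings that need escaping.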

196-205: Fix headings per markdownlint (MD036).

Change emphasized lines to proper headings.

-**Template didn't render valid JSON**
+### Template didn't render valid JSON
-**Response not parsed correctly**
+### Response not parsed correctly
-**Variables not available**
+### Variables not available
tests/endpoints/test_template_endpoint.py (3)

80-85: Prefer splat over list concatenation (RUF005).

-    extra=[("payload_template", template)] + extra_vars,
+    extra=[("payload_template", template), *extra_vars],
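
Beyond style, the two forms behave differently when the right-hand side is not a list. A small sketch, assuming `extra_vars` could arrive as a tuple (in the test file it may well always be a list):

```python
template = "{{ text }}"
extra_vars = (("max_tokens", 32),)  # assumption: a tuple rather than a list

# Unpacking accepts any iterable.
splat = [("payload_template", template), *extra_vars]

# Concatenation requires both operands to be lists.
concat_error = None
try:
    concatenated = [("payload_template", template)] + extra_vars
except TypeError as exc:
    concat_error = exc
```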

139-146: Rename test to reflect behavior (inline string, not unknown name).

The test uses an inline template, not an unknown named template.

-def test_named_template_not_found_uses_as_inline(self):
-    """Test that unknown named template is treated as inline template."""
+def test_inline_template_string_renders(self):
+    """Test that an inline template string renders correctly."""

331-348: Add a JMESPath extraction test (coverage for response_field).

Consider adding a test that sets response_field:'data[0].vector' and asserts embeddings are extracted.

I can open a follow‑up PR adding a parametrized test for response_field with embeddings/rankings/text.

tests/integration/test_custom_multimodal_template.py (2)

34-36: Write files with explicit UTF‑8 encoding for portability.

-template_file.write_text(template)
+template_file.write_text(template, encoding="utf-8")

Also applies to: 74-76
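
A short sketch of why the explicit encoding matters: without it, `write_text` falls back to the platform's locale encoding, so a template containing non-ASCII characters may be written differently on different machines. The template content and file name below are made up for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical template containing non-ASCII characters.
template = '{"prompt": "héllo ✨ {{ text }}"}'

with tempfile.TemporaryDirectory() as tmp:
    template_file = Path(tmp) / "template.json"
    # Pin the encoding on both write and read so the bytes on disk
    # do not depend on the platform default.
    template_file.write_text(template, encoding="utf-8")
    round_tripped = template_file.read_text(encoding="utf-8")
    raw_bytes = template_file.read_bytes()
```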


50-51: Assert outputs in the simple test as well for symmetry.

-assert result.request_count == defaults.request_count
+assert result.request_count == defaults.request_count
+assert result.has_all_outputs
src/aiperf/endpoints/template_endpoint.py (4)

61-62: Use StrictUndefined to fail fast on missing template vars.

Prevents silent None/empty strings when a template references a non-existent variable.

-self._template = jinja2.Environment().from_string(template_source)
+self._template = jinja2.Environment(
+    undefined=jinja2.StrictUndefined,
+    trim_blocks=True,
+    lstrip_blocks=True,
+).from_string(template_source)
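
The behavioral difference can be sketched as follows. The template source and the variable names `text` and `model` are assumptions for illustration; `jinja2.StrictUndefined` and `jinja2.UndefinedError` are part of Jinja2's public API.

```python
import jinja2

# Hypothetical template that references a variable the caller forgets to pass.
source = '{"prompt": "{{ text }}", "model": "{{ model }}"}'

lenient = jinja2.Environment().from_string(source)
strict = jinja2.Environment(undefined=jinja2.StrictUndefined).from_string(source)

# Default behavior: the missing `model` silently renders as an empty string.
lenient_output = lenient.render(text="hello")

# StrictUndefined: rendering raises as soon as `model` is evaluated.
strict_error = None
try:
    strict.render(text="hello")
except jinja2.UndefinedError as exc:
    strict_error = exc
```

The lenient render still produces a syntactically valid payload, which is exactly why the mistake is easy to miss in a benchmark run.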

170-176: Don’t treat empty dicts as “no JSON”.

Use explicit None check so {} still goes through auto‑detection.

-json_obj = response.get_json()
-if not json_obj:
+json_obj = response.get_json()
+if json_obj is None:
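
The two checks only disagree on falsy-but-present values. A minimal sketch with hypothetical helper names:

```python
def skips_on_truthiness(json_obj):
    """Mirrors `if not json_obj:` and treats {} the same as a missing body."""
    return not json_obj


def skips_on_none(json_obj):
    """Mirrors `if json_obj is None:` and only skips when there is no JSON."""
    return json_obj is None


empty_body = {}
missing_body = None
```

An empty object is skipped by the first check but passed on to auto-detection by the second; a missing body is skipped by both.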

179-185: Preserve falsy but valid JMESPath results.

Use is not None to allow values like 0, false, or [] to be considered.

-        if self._compiled_jmespath:
+        if self._compiled_jmespath:
             try:
-                if value := self._compiled_jmespath.search(json_obj):
+                value = self._compiled_jmespath.search(json_obj)
+                if value is not None:
                     response_data = self.convert_to_response_data(value)
             except (jmespath.exceptions.JMESPathError, TypeError) as e:
                 self.warning(f"JMESPath search failed: {e!r}. Trying auto-detection.")

153-155: Confirm merge intent for extra fields overwriting template keys.

Current payload.update(self._extra_fields) lets --extra-inputs override any rendered field (e.g., text). If that’s unintended, use a non‑destructive merge.

-if self._extra_fields:
-    payload.update(self._extra_fields)
+if self._extra_fields:
+    for k, v in self._extra_fields.items():
+        payload.setdefault(k, v)  # keep rendered values unless missing
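
The two merge strategies differ only for keys present on both sides. A sketch with made-up payload values:

```python
# Hypothetical rendered payload and extra inputs that share the key "text".
rendered = {"text": "rendered prompt", "max_tokens": 64}
extra_fields = {"text": "from --extra-inputs", "temperature": 0.2}

# update(): extra fields win on conflict.
overwritten = dict(rendered)
overwritten.update(extra_fields)

# setdefault(): rendered values win; extra fields only fill gaps.
preserved = dict(rendered)
for key, value in extra_fields.items():
    preserved.setdefault(key, value)
```

Either behavior can be correct; what matters is that the choice is deliberate and documented.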
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 141c244 and 4e8fb9b.

📒 Files selected for processing (16)
  • README.md (1 hunks)
  • docs/genai-perf-feature-comparison.md (1 hunks)
  • docs/tutorials/template-endpoint.md (1 hunks)
  • mkdocs.yml (1 hunks)
  • pyproject.toml (1 hunks)
  • src/aiperf/common/enums/plugin_enums.py (1 hunks)
  • src/aiperf/endpoints/__init__.py (2 hunks)
  • src/aiperf/endpoints/base_endpoint.py (3 hunks)
  • src/aiperf/endpoints/template_endpoint.py (1 hunks)
  • tests/aiperf_mock_server/app.py (1 hunks)
  • tests/common/enums/test_endpoints_enums.py (0 hunks)
  • tests/endpoints/conftest.py (3 hunks)
  • tests/endpoints/test_chat_endpoint_parse_response.py (1 hunks)
  • tests/endpoints/test_solido_rag.py (1 hunks)
  • tests/endpoints/test_template_endpoint.py (1 hunks)
  • tests/integration/test_custom_multimodal_template.py (1 hunks)
💤 Files with no reviewable changes (1)
  • tests/common/enums/test_endpoints_enums.py
🧰 Additional context used
🧬 Code graph analysis (9)
tests/integration/test_custom_multimodal_template.py (2)
tests/integration/conftest.py (5)
  • AIPerfCLI (47-110)
  • IntegrationTestDefaults (30-44)
  • cli (267-272)
  • aiperf_mock_server (151-210)
  • run (56-82)
tests/integration/models.py (3)
  • AIPerfMockServer (30-41)
  • request_count (161-165)
  • has_all_outputs (125-134)
src/aiperf/endpoints/__init__.py (1)
src/aiperf/endpoints/template_endpoint.py (1)
  • TemplateEndpoint (30-193)
tests/endpoints/test_solido_rag.py (1)
tests/endpoints/conftest.py (1)
  • create_mock_response (70-80)
tests/endpoints/conftest.py (1)
src/aiperf/common/protocols.py (3)
  • InferenceServerResponse (365-406)
  • get_json (398-406)
  • get_text (390-396)
tests/endpoints/test_template_endpoint.py (6)
src/aiperf/common/enums/plugin_enums.py (1)
  • EndpointType (19-33)
src/aiperf/common/exceptions.py (1)
  • InvalidStateError (130-131)
src/aiperf/common/models/record_models.py (2)
  • RequestInfo (699-752)
  • TextResponseData (554-561)
src/aiperf/endpoints/template_endpoint.py (4)
  • TemplateEndpoint (30-193)
  • format_payload (95-157)
  • parse_response (159-193)
  • metadata (82-93)
tests/endpoints/conftest.py (3)
  • create_endpoint_with_mock_transport (44-50)
  • create_mock_response (70-80)
  • create_model_endpoint (22-41)
src/aiperf/common/models/model_endpoint_info.py (1)
  • primary_model_name (148-150)
tests/aiperf_mock_server/app.py (3)
tests/aiperf_mock_server/utils.py (3)
  • with_error_injection (39-51)
  • RequestContext (84-95)
  • wait_until_completion (93-95)
tests/aiperf_mock_server/models.py (2)
  • ChatCompletionRequest (47-57)
  • total_tokens (108-110)
tests/aiperf_mock_server/tokens.py (1)
  • create_usage (45-56)
src/aiperf/endpoints/template_endpoint.py (6)
src/aiperf/common/enums/plugin_enums.py (1)
  • EndpointType (19-33)
src/aiperf/common/exceptions.py (1)
  • InvalidStateError (130-131)
src/aiperf/common/factories.py (1)
  • EndpointFactory (474-492)
src/aiperf/common/models/record_models.py (2)
  • ParsedResponse (599-614)
  • RequestInfo (699-752)
src/aiperf/endpoints/base_endpoint.py (8)
  • metadata (43-44)
  • BaseEndpoint (31-257)
  • format_payload (60-64)
  • extract_named_contents (235-257)
  • parse_response (67-70)
  • make_text_response_data (88-90)
  • convert_to_response_data (204-233)
  • auto_detect_and_extract (92-113)
src/aiperf/common/protocols.py (2)
  • EndpointProtocol (350-361)
  • InferenceServerResponse (365-406)
src/aiperf/endpoints/base_endpoint.py (3)
src/aiperf/common/models/record_models.py (8)
  • BaseResponseData (546-551)
  • EmbeddingResponseData (583-588)
  • ParsedResponse (599-614)
  • RankingsResponseData (591-596)
  • RequestInfo (699-752)
  • get (156-161)
  • get (173-175)
  • TextResponseData (554-561)
src/aiperf/common/models/metadata.py (1)
  • EndpointMetadata (11-41)
src/aiperf/common/models/model_endpoint_info.py (1)
  • ModelEndpointInfo (117-150)
tests/endpoints/test_chat_endpoint_parse_response.py (1)
tests/endpoints/conftest.py (1)
  • create_mock_response (70-80)
🪛 markdownlint-cli2 (0.18.1)
docs/tutorials/template-endpoint.md

196-196: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


199-199: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


203-203: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🪛 Ruff (0.14.1)
tests/endpoints/test_template_endpoint.py

82-82: Consider [("payload_template", template), *extra_vars] instead of concatenation

Replace with [("payload_template", template), *extra_vars]

(RUF005)

src/aiperf/endpoints/template_endpoint.py

45-47: Avoid specifying long messages outside the exception class

(TRY003)


61-61: By default, jinja2 sets autoescape to False. Consider using autoescape=True or the select_autoescape function to mitigate XSS vulnerabilities.

(S701)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


149-151: Avoid specifying long messages outside the exception class

(TRY003)

src/aiperf/endpoints/base_endpoint.py

54-54: Unused method argument: request_info

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: integration-tests (ubuntu-latest, 3.10)
  • GitHub Check: build (ubuntu-latest, 3.12)
  • GitHub Check: build (macos-latest, 3.11)
  • GitHub Check: build (macos-latest, 3.10)
  • GitHub Check: build (ubuntu-latest, 3.11)
  • GitHub Check: build (ubuntu-latest, 3.10)
  • GitHub Check: integration-tests (ubuntu-latest, 3.12)
  • GitHub Check: integration-tests (ubuntu-latest, 3.11)
🔇 Additional comments (13)
tests/aiperf_mock_server/app.py (1)

276-355: LGTM! Mock endpoint implementation is appropriate.

The custom multimodal endpoint correctly extracts the multimodal bundle, creates a mock request for timing simulation, and returns a response in the expected custom format. The inclusion of both a top-level text field and nested completion.generated_text enables auto-detection testing as intended.

src/aiperf/common/enums/plugin_enums.py (1)

33-33: LGTM! Clean enum addition.

The new TEMPLATE enum value follows the existing convention and properly supports the new TemplateEndpoint feature.

mkdocs.yml (1)

20-20: LGTM! Documentation navigation updated.

The Template Endpoint is appropriately placed in the Advanced Features section alongside other similar features.

src/aiperf/endpoints/__init__.py (1)

31-33: LGTM! Clean public API addition.

The TemplateEndpoint is properly imported and exported following the established pattern for other endpoint types.

Also applies to: 45-45

tests/endpoints/test_chat_endpoint_parse_response.py (1)

12-12: LGTM! Good refactoring to use shared test utility.

Using the centralized create_mock_response helper from conftest reduces duplication and improves maintainability across the test suite.

README.md (1)

60-60: LGTM! Clear feature documentation.

The Template Endpoint feature is well-documented with a clear description and relevant use cases that align with the PR objectives.

tests/endpoints/test_solido_rag.py (1)

13-18: Good refactor to shared fixture.

Centralizing create_mock_response via tests.endpoints.conftest reduces duplication.

src/aiperf/endpoints/base_endpoint.py (6)

7-7: LGTM!

Import additions are appropriate for the new extraction utility methods.

Also applies to: 11-22


46-57: LGTM!

The changes improve flexibility by respecting user-provided headers and URL parameters from the configuration. The request_info parameter, while unused in the base implementation, is appropriate for subclass overrides.


92-113: LGTM!

The auto-detection logic follows a sensible order from most specific (embeddings) to most generic (text), providing a useful fallback for template endpoints with flexible response formats.
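
For readers unfamiliar with this kind of fallback chain, here is a rough sketch of the idea. The function names are borrowed from the change summary, but the bodies and the response shapes they look for are assumptions made for illustration and are not AIPerf's actual implementation.

```python
from typing import Optional


def try_extract_embeddings(obj: dict) -> Optional[list]:
    # Assumed shape: {"data": [{"embedding": [0.1, ...]}, ...]}
    data = obj.get("data")
    if isinstance(data, list) and data and all(
        isinstance(item, dict) and "embedding" in item for item in data
    ):
        return [item["embedding"] for item in data]
    return None


def try_extract_rankings(obj: dict) -> Optional[list]:
    # Assumed shape: {"rankings": [{"index": 0, "score": 0.9}, ...]}
    rankings = obj.get("rankings")
    return rankings if isinstance(rankings, list) and rankings else None


def try_extract_text(obj: dict) -> Optional[str]:
    # Assumed keys; a real endpoint would likely check more locations.
    for key in ("text", "content", "response"):
        value = obj.get(key)
        if isinstance(value, str):
            return value
    return None


# Most specific structure first, most generic last.
EXTRACTORS = (
    ("embeddings", try_extract_embeddings),
    ("rankings", try_extract_rankings),
    ("text", try_extract_text),
)


def auto_detect_and_extract(obj: dict) -> Optional[tuple]:
    for kind, extractor in EXTRACTORS:
        result = extractor(obj)
        if result is not None:
            return kind, result
    return None


# A response that carries both a text field and embedding data.
ambiguous = {"text": "ok", "data": [{"embedding": [0.1, 0.2]}]}
```

Under these assumptions, the ambiguous response is classified as embeddings because the structured check runs first; if text were tried first, almost any response with a string field would match before the more specific types were considered.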


204-233: LGTM!

The conversion logic appropriately infers response data types from value structure, providing a clean API for template endpoints.


235-257: LGTM!

The method correctly extracts and organizes media contents, with appropriate handling for items sharing the same name.


115-202: Code is compatible; project requires Python 3.10+.

The extraction methods are well-designed and handle multiple common response formats effectively. The union type syntax int | float on line 143 is valid, as the project's pyproject.toml explicitly requires requires-python = ">=3.10". No compatibility issues.

Signed-off-by: Anthony Casagrande <[email protected]>
def auto_detect_and_extract(self, json_obj: dict) -> BaseResponseData | None:
    """Optional utility: Auto-detect response type and extract relevant data.
    Tries to extract data in this order: embeddings, rankings, text.

Just curious, why this order?
