feat: fully custom template support for endpoint payloads #406
Conversation
Try out this PR

Quick install:
`pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ajc/template`

Recommended with a virtual environment (using uv):
`uv venv --python 3.12 && source .venv/bin/activate`
`uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@ajc/template`
Codecov Report
❌ Patch coverage is
Walkthrough

This PR introduces a Template Endpoint feature for AIPerf, enabling custom API benchmarking via Jinja2 request templates. It adds two new dependencies, a new endpoint type, automatic response extraction utilities in BaseEndpoint, comprehensive documentation, unit tests, integration tests, and a mock server endpoint.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (13)
tests/endpoints/conftest.py (1)
70-80: LGTM! Excellent test utility refactoring.

Centralizing mock response creation in a shared helper reduces duplication and ensures consistency across endpoint tests. The use of `spec=InferenceServerResponse` provides type safety.
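For readers who want the general shape of such a helper, here is a standalone sketch; the signature and defaults are assumptions, not a copy of the conftest code:

```python
from unittest.mock import MagicMock

from aiperf.common.protocols import InferenceServerResponse


def create_mock_response(json_obj=None, text=None):
    """Sketch of a shared mock-response factory (signature is assumed)."""
    response = MagicMock(spec=InferenceServerResponse)
    response.get_json.return_value = json_obj  # protocol methods per protocols.py
    response.get_text.return_value = text
    return response
```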
docs/genai-perf-feature-comparison.md (1)
51-51: Clarify “multi‑turn” note to avoid contradiction with session disclaimer.

The row note suggests full multi‑turn support; the section at lines 130‑134 says multi‑turn sessions aren’t supported. Reword the note to “multi‑turn variables inside templates,” not session benchmarking.
-| **template** | Template-based inference endpoints | ✅ | ✅ | AIPerf supports multimodal and multi-turn templates |
+| **template** | Template-based inference endpoints | ✅ | ✅ | AIPerf supports multimodal templates and multi‑turn variables inside templates (not multi‑turn session benchmarking). |

Also applies to: 130-134
docs/tutorials/template-endpoint.md (2)
147-154: Use `|tojson` for booleans in JSON templates.

`stream|lower` works, but `|tojson` is safer and consistent for JSON emission.

  "prompt": {{ text|tojson }},
  "max_new_tokens": {{ max_tokens|tojson }},
- "stream": {{ stream|lower }}
+ "stream": {{ stream|tojson }}
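A quick standalone way to sanity-check the suggestion (not repo code, just plain Jinja2):

```python
import json

import jinja2

env = jinja2.Environment()
# |lower happens to work for Python booleans ("False" -> "false"),
# but |tojson is the JSON-aware filter and also quotes/escapes strings.
lowered = env.from_string('{"stream": {{ stream|lower }}}').render(stream=False)
quoted = env.from_string('{"stream": {{ stream|tojson }}, "prompt": {{ text|tojson }}}').render(
    stream=False, text='he said "hi"'
)
print(json.loads(lowered))  # {'stream': False}
print(json.loads(quoted))   # {'stream': False, 'prompt': 'he said "hi"'}
```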
196-205: Fix headings per markdownlint (MD036).

Change emphasized lines to proper headings.

-**Template didn't render valid JSON**
+### Template didn't render valid JSON
-**Response not parsed correctly**
+### Response not parsed correctly
-**Variables not available**
+### Variables not available

tests/endpoints/test_template_endpoint.py (3)
80-85: Prefer splat over list concatenation (RUF005).

- extra=[("payload_template", template)] + extra_vars,
+ extra=[("payload_template", template), *extra_vars],
139-146: Rename test to reflect behavior (inline string, not unknown name).

The test uses an inline template, not an unknown named template.

-def test_named_template_not_found_uses_as_inline(self):
-    """Test that unknown named template is treated as inline template."""
+def test_inline_template_string_renders(self):
+    """Test that an inline template string renders correctly."""
331-348: Add a JMESPath extraction test (coverage for `response_field`).

Consider adding a test that sets `response_field: 'data[0].vector'` and asserts embeddings are extracted. I can open a follow‑up PR adding a parametrized test for `response_field` with embeddings/rankings/text.
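Rough shape of what such a parametrized test could look like; it exercises the JMESPath expressions directly rather than the endpoint fixtures (whose exact helper signatures aren't shown here), so treat it as a sketch:

```python
import jmespath
import pytest


@pytest.mark.parametrize(
    "response_field,response_json,expected",
    [
        ("data[0].vector", {"data": [{"vector": [0.1, 0.2, 0.3]}]}, [0.1, 0.2, 0.3]),
        ("results[0].scores", {"results": [{"scores": [0.9, 0.1]}]}, [0.9, 0.1]),
        ("completion.generated_text", {"completion": {"generated_text": "hi"}}, "hi"),
    ],
)
def test_response_field_extracts_expected_value(response_field, response_json, expected):
    # The template endpoint compiles response_field with jmespath; this checks the expressions.
    assert jmespath.search(response_field, response_json) == expected
```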
tests/integration/test_custom_multimodal_template.py (2)
34-36: Write files with explicit UTF‑8 encoding for portability.

-template_file.write_text(template)
+template_file.write_text(template, encoding="utf-8")

Also applies to: 74-76
50-51: Assert outputs in the simple test as well for symmetry.

-assert result.request_count == defaults.request_count
+assert result.request_count == defaults.request_count
+assert result.has_all_outputs

src/aiperf/endpoints/template_endpoint.py (4)
61-62: Use StrictUndefined to fail fast on missing template vars.

Prevents silent None/empty strings when a template references a non-existent variable.

-self._template = jinja2.Environment().from_string(template_source)
+self._template = jinja2.Environment(
+    undefined=jinja2.StrictUndefined,
+    trim_blocks=True,
+    lstrip_blocks=True,
+).from_string(template_source)
170-176: Don’t treat empty dicts as “no JSON”.

Use an explicit None check so `{}` still goes through auto‑detection.

-json_obj = response.get_json()
-if not json_obj:
+json_obj = response.get_json()
+if json_obj is None:
179-185: Preserve falsy but valid JMESPath results.

Use `is not None` to allow values like `0`, `false`, or `[]` to be considered.

-        if self._compiled_jmespath:
+        if self._compiled_jmespath:
             try:
-                if value := self._compiled_jmespath.search(json_obj):
+                value = self._compiled_jmespath.search(json_obj)
+                if value is not None:
                     response_data = self.convert_to_response_data(value)
             except (jmespath.exceptions.JMESPathError, TypeError) as e:
                 self.warning(f"JMESPath search failed: {e!r}. Trying auto-detection.")
153-155: Confirm merge intent for `extra` fields overwriting template keys.

Currently `payload.update(self._extra_fields)` lets `--extra-inputs` override any rendered field (e.g., `text`). If that’s unintended, use a non‑destructive merge.

-if self._extra_fields:
-    payload.update(self._extra_fields)
+if self._extra_fields:
+    for k, v in self._extra_fields.items():
+        payload.setdefault(k, v)  # keep rendered values unless missing
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- README.md (1 hunks)
- docs/genai-perf-feature-comparison.md (1 hunks)
- docs/tutorials/template-endpoint.md (1 hunks)
- mkdocs.yml (1 hunks)
- pyproject.toml (1 hunks)
- src/aiperf/common/enums/plugin_enums.py (1 hunks)
- src/aiperf/endpoints/__init__.py (2 hunks)
- src/aiperf/endpoints/base_endpoint.py (3 hunks)
- src/aiperf/endpoints/template_endpoint.py (1 hunks)
- tests/aiperf_mock_server/app.py (1 hunks)
- tests/common/enums/test_endpoints_enums.py (0 hunks)
- tests/endpoints/conftest.py (3 hunks)
- tests/endpoints/test_chat_endpoint_parse_response.py (1 hunks)
- tests/endpoints/test_solido_rag.py (1 hunks)
- tests/endpoints/test_template_endpoint.py (1 hunks)
- tests/integration/test_custom_multimodal_template.py (1 hunks)
💤 Files with no reviewable changes (1)
- tests/common/enums/test_endpoints_enums.py
🧰 Additional context used
🧬 Code graph analysis (9)
tests/integration/test_custom_multimodal_template.py (2)
tests/integration/conftest.py (5)
- AIPerfCLI (47-110)
- IntegrationTestDefaults (30-44)
- cli (267-272)
- aiperf_mock_server (151-210)
- run (56-82)

tests/integration/models.py (3)

- AIPerfMockServer (30-41)
- request_count (161-165)
- has_all_outputs (125-134)
src/aiperf/endpoints/__init__.py (1)
src/aiperf/endpoints/template_endpoint.py (1)
TemplateEndpoint(30-193)
tests/endpoints/test_solido_rag.py (1)
tests/endpoints/conftest.py (1)
create_mock_response(70-80)
tests/endpoints/conftest.py (1)
src/aiperf/common/protocols.py (3)
- InferenceServerResponse (365-406)
- get_json (398-406)
- get_text (390-396)
tests/endpoints/test_template_endpoint.py (6)
src/aiperf/common/enums/plugin_enums.py (1)
- EndpointType (19-33)

src/aiperf/common/exceptions.py (1)

- InvalidStateError (130-131)

src/aiperf/common/models/record_models.py (2)

- RequestInfo (699-752)
- TextResponseData (554-561)

src/aiperf/endpoints/template_endpoint.py (4)

- TemplateEndpoint (30-193)
- format_payload (95-157)
- parse_response (159-193)
- metadata (82-93)

tests/endpoints/conftest.py (3)

- create_endpoint_with_mock_transport (44-50)
- create_mock_response (70-80)
- create_model_endpoint (22-41)

src/aiperf/common/models/model_endpoint_info.py (1)
primary_model_name(148-150)
tests/aiperf_mock_server/app.py (3)
tests/aiperf_mock_server/utils.py (3)
- with_error_injection (39-51)
- RequestContext (84-95)
- wait_until_completion (93-95)

tests/aiperf_mock_server/models.py (2)

- ChatCompletionRequest (47-57)
- total_tokens (108-110)

tests/aiperf_mock_server/tokens.py (1)
create_usage(45-56)
src/aiperf/endpoints/template_endpoint.py (6)
src/aiperf/common/enums/plugin_enums.py (1)
- EndpointType (19-33)

src/aiperf/common/exceptions.py (1)

- InvalidStateError (130-131)

src/aiperf/common/factories.py (1)

- EndpointFactory (474-492)

src/aiperf/common/models/record_models.py (2)

- ParsedResponse (599-614)
- RequestInfo (699-752)

src/aiperf/endpoints/base_endpoint.py (8)

- metadata (43-44)
- BaseEndpoint (31-257)
- format_payload (60-64)
- extract_named_contents (235-257)
- parse_response (67-70)
- make_text_response_data (88-90)
- convert_to_response_data (204-233)
- auto_detect_and_extract (92-113)

src/aiperf/common/protocols.py (2)

- EndpointProtocol (350-361)
- InferenceServerResponse (365-406)
src/aiperf/endpoints/base_endpoint.py (3)
src/aiperf/common/models/record_models.py (8)
- BaseResponseData (546-551)
- EmbeddingResponseData (583-588)
- ParsedResponse (599-614)
- RankingsResponseData (591-596)
- RequestInfo (699-752)
- get (156-161)
- get (173-175)
- TextResponseData (554-561)

src/aiperf/common/models/metadata.py (1)

- EndpointMetadata (11-41)

src/aiperf/common/models/model_endpoint_info.py (1)
ModelEndpointInfo(117-150)
tests/endpoints/test_chat_endpoint_parse_response.py (1)
tests/endpoints/conftest.py (1)
create_mock_response(70-80)
🪛 markdownlint-cli2 (0.18.1)
docs/tutorials/template-endpoint.md
196-196: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
199-199: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
203-203: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.14.1)
tests/endpoints/test_template_endpoint.py
82-82: Consider [("payload_template", template), *extra_vars] instead of concatenation
Replace with [("payload_template", template), *extra_vars]
(RUF005)
src/aiperf/endpoints/template_endpoint.py
45-47: Avoid specifying long messages outside the exception class
(TRY003)
61-61: By default, jinja2 sets autoescape to False. Consider using autoescape=True or the select_autoescape function to mitigate XSS vulnerabilities.
(S701)
105-105: Avoid specifying long messages outside the exception class
(TRY003)
149-151: Avoid specifying long messages outside the exception class
(TRY003)
src/aiperf/endpoints/base_endpoint.py
54-54: Unused method argument: request_info
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: integration-tests (ubuntu-latest, 3.10)
- GitHub Check: build (ubuntu-latest, 3.12)
- GitHub Check: build (macos-latest, 3.11)
- GitHub Check: build (macos-latest, 3.10)
- GitHub Check: build (ubuntu-latest, 3.11)
- GitHub Check: build (ubuntu-latest, 3.10)
- GitHub Check: integration-tests (ubuntu-latest, 3.12)
- GitHub Check: integration-tests (ubuntu-latest, 3.11)
🔇 Additional comments (13)
tests/aiperf_mock_server/app.py (1)
276-355: LGTM! Mock endpoint implementation is appropriate.

The custom multimodal endpoint correctly extracts the multimodal bundle, creates a mock request for timing simulation, and returns a response in the expected custom format. The inclusion of both a top-level `text` field and a nested `completion.generated_text` enables auto-detection testing as intended.
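For readers skimming the review, the response shape in question looks roughly like this (field values are illustrative, not copied from app.py):

```python
# Approximate shape of the custom mock response: a top-level "text" field for
# auto-detection plus a nested "completion.generated_text" for explicit extraction.
mock_response = {
    "text": "generated output",
    "completion": {"generated_text": "generated output"},
}
```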
src/aiperf/common/enums/plugin_enums.py (1)
33-33: LGTM! Clean enum addition.

The new `TEMPLATE` enum value follows the existing convention and properly supports the new TemplateEndpoint feature.

mkdocs.yml (1)
20-20: LGTM! Documentation navigation updated.

The Template Endpoint is appropriately placed in the Advanced Features section alongside other similar features.
src/aiperf/endpoints/__init__.py (1)
31-33: LGTM! Clean public API addition.

The TemplateEndpoint is properly imported and exported following the established pattern for other endpoint types.
Also applies to: 45-45
tests/endpoints/test_chat_endpoint_parse_response.py (1)
12-12: LGTM! Good refactoring to use shared test utility.

Using the centralized `create_mock_response` helper from conftest reduces duplication and improves maintainability across the test suite.

README.md (1)
60-60: LGTM! Clear feature documentation.

The Template Endpoint feature is well-documented with a clear description and relevant use cases that align with the PR objectives.
tests/endpoints/test_solido_rag.py (1)
13-18: Good refactor to shared fixture.

Centralizing `create_mock_response` via tests.endpoints.conftest reduces duplication.
src/aiperf/endpoints/base_endpoint.py (6)
7-7: LGTM!

Import additions are appropriate for the new extraction utility methods.
Also applies to: 11-22
46-57: LGTM!

The changes improve flexibility by respecting user-provided headers and URL parameters from the configuration. The `request_info` parameter, while unused in the base implementation, is appropriate for subclass overrides.
92-113: LGTM!

The auto-detection logic follows a sensible order, from most specific (embeddings) to most generic (text), providing a useful fallback for template endpoints with flexible response formats.
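For context, the ordering amounts to something like the following standalone sketch; the extractor callables here are placeholders, not the actual aiperf method names:

```python
from typing import Any, Callable, Optional

# Illustrative sketch only; the real implementation lives in BaseEndpoint.
def auto_detect_and_extract(
    json_obj: dict,
    extractors: list[Callable[[dict], Optional[Any]]],
) -> Optional[Any]:
    # Try the most specific format first (embeddings), then rankings, then text;
    # the first extractor that returns a non-None value wins.
    for extract in extractors:
        data = extract(json_obj)
        if data is not None:
            return data
    return None


result = auto_detect_and_extract(
    {"text": "hello"},
    [
        lambda obj: obj.get("embedding"),  # stand-in for embeddings extraction
        lambda obj: obj.get("rankings"),   # stand-in for rankings extraction
        lambda obj: obj.get("text"),       # stand-in for text extraction
    ],
)
print(result)  # "hello" -- falls through to the generic text extractor
```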
204-233: LGTM!

The conversion logic appropriately infers response data types from value structure, providing a clean API for template endpoints.
235-257: LGTM!

The method correctly extracts and organizes media contents, with appropriate handling for items sharing the same name.
115-202: Code is compatible; project requires Python 3.10+.

The extraction methods are well-designed and handle multiple common response formats effectively. The union type syntax `int | float` on line 143 is valid, as the project's pyproject.toml explicitly requires `requires-python = ">=3.10"`. No compatibility issues.
Signed-off-by: Anthony Casagrande <[email protected]>
def auto_detect_and_extract(self, json_obj: dict) -> BaseResponseData | None:
    """Optional utility: Auto-detect response type and extract relevant data.

    Tries to extract data in this order: embeddings, rankings, text.
Just curious, why this order?
Support for fully custom templates for payload formatting, based on Jinja2. Responses can be parsed automatically via auto-detection, or with custom parsing logic using JMESPath's JSON query syntax.
Based on GenAi-Perf Customizable Payloads, but has full multi-modal and multi-turn support.
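A minimal standalone illustration of the two pieces described above (template rendering plus JMESPath extraction); the variable names follow the tutorial examples and the response shape mirrors the mock server, so treat it as a sketch rather than aiperf's actual code path:

```python
import json

import jinja2
import jmespath

# Render a Jinja2 payload template into a request body.
template = jinja2.Environment().from_string(
    '{"prompt": {{ text|tojson }}, "max_new_tokens": {{ max_tokens|tojson }}, "stream": {{ stream|tojson }}}'
)
payload = json.loads(template.render(text="Hello", max_tokens=32, stream=False))

# Extract the generated text from a custom response using a JMESPath expression.
response = {"completion": {"generated_text": "Hi there!"}}
generated = jmespath.search("completion.generated_text", response)

print(payload)    # {'prompt': 'Hello', 'max_new_tokens': 32, 'stream': False}
print(generated)  # Hi there!
```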
Demo:
Screencast.From.2025-10-27.22-35-23.mp4
Summary by CodeRabbit
New Features
Documentation
Chores