This guide describes how to run, extend, and understand the Lightspeed Core Stack end-to-end tests. The suite uses Behave (BDD) with Gherkin feature files and runs against a live service (Docker Compose or Prow/OpenShift).
- Overview
- Directory Layout
- How to Run E2E Tests
- Environment Variables
- Deployment Modes: Server vs Library
- Tags and Hooks
- Configuration Files
- Feature Files and Steps
- Gherkin Keywords in Feature Files
- Writing New Scenarios
- Troubleshooting
## Overview

- Framework: Behave (Python BDD).
- Scope: REST API of the Lightspeed Core Stack (query, streaming_query, models, info, health, feedback, conversations, RBAC, MCP, etc.).
- Execution: Tests run in a separate process from the app. They send HTTP requests to the service and (in server mode) optionally talk to the Llama Stack service for shield setup.
- Environments: Local (Docker Compose) or Prow/OpenShift (containers/pods). Mode is detected via `E2E_DEPLOYMENT_MODE` and `RUNNING_PROW`.
## Directory Layout

E2E tests (Behave, feature files, steps):

```text
tests/e2e/
├── README.md                 # Short pointer to this guide (docs/e2e_testing.md)
├── test_list.txt             # List of feature files (run order)
├── features/
│   ├── environment.py        # Hooks: before_all, before_feature, before_scenario, after_scenario, after_feature
│   ├── *.feature             # Gherkin feature files
│   └── steps/                # Step definitions
│       ├── common.py         # Service started, default state, host/port from env
│       ├── common_http.py    # HTTP helpers (status, body, headers)
│       ├── auth.py           # Authorization header steps
│       ├── llm_query_response.py  # query / streaming_query steps
│       ├── feedback.py       # Feedback API steps
│       ├── conversation.py   # Conversations / cache steps
│       ├── health.py         # Health and llama-stack disruption
│       ├── info.py, models.py     # Info and models endpoints
│       ├── rbac.py           # RBAC steps
│       └── ...
├── configuration/            # Lightspeed-stack configs used by E2E (local Docker)
│   ├── server-mode/          # When Llama Stack runs in a separate process
│   └── library-mode/         # When Llama Stack is in-process
├── configs/                  # Llama Stack run configs (run-ci.yaml, etc.)
├── utils/
│   ├── utils.py              # restart_container, switch_config, wait_for_container_health, etc.
│   ├── prow_utils.py         # Prow/OpenShift helpers (restore_llama_stack_pod, etc.)
│   └── llama_stack_shields.py     # Shield unregister/register (server mode, optional)
├── mock_mcp_server/          # Mock MCP server for MCP tests
└── rag/                      # RAG test data (e.g. for FAISS)
```
Prow/OpenShift E2E (pipelines, manifests, and configs used when `RUNNING_PROW` is set):

```text
tests/e2e-prow/
└── rhoai/                          # RHOAI / OpenShift E2E
    ├── run-tests.sh                # Entry point to run E2E in Prow
    ├── pipeline.sh                 # Main pipeline definition
    ├── pipeline-services.sh        # Services pipeline
    ├── pipeline-vllm.sh            # vLLM pipeline
    ├── pipeline-test-pod.sh        # Test pod pipeline
    ├── configs/                    # Lightspeed-stack configs for Prow (used by environment.py when is_prow)
    │   ├── lightspeed-stack.yaml
    │   ├── lightspeed-stack-auth-noop-token.yaml
    │   ├── lightspeed-stack-rbac.yaml
    │   ├── lightspeed-stack-auth-rh-identity.yaml
    │   ├── lightspeed-stack-no-cache.yaml
    │   ├── lightspeed-stack-invalid-feedback-storage.yaml
    │   └── run.yaml                # Llama Stack run config for Prow
    ├── scripts/
    │   ├── e2e-ops.sh              # E2E ops (e.g. disrupt/restore llama-stack); called from prow_utils
    │   ├── bootstrap.sh
    │   ├── deploy-vllm.sh
    │   ├── fetch-vllm-image.sh
    │   ├── get-vllm-pod-info.sh
    │   └── gpu-setup.sh
    └── manifests/                  # OpenShift/Kubernetes manifests
        ├── lightspeed/             # Lightspeed stack, llama-stack, mock-jwks, mcp-mock-server
        ├── vllm/                   # vLLM runtime and inference services (CPU/GPU)
        ├── operators/              # Operator install (operatorgroup, operators, ds-cluster)
        ├── namespaces/             # NFD, nvidia-operator
        └── gpu/                    # NFD and cluster policy for GPU
```
## How to Run E2E Tests

- Local: Bring the Docker Compose stack up (e.g. `docker compose up -d`). The app and Llama Stack must be reachable at the host/ports you configure (see Environment Variables).
- Prow: The pipeline runs in OpenShift; `RUNNING_PROW` is set and Prow-specific paths/configs are used.
From the project root:
```shell
# Run all E2E tests (excluding @skip)
uv run make test-e2e
# or
uv run make test-e2e-local
```

Both targets use:

```shell
uv run behave --color --format pretty --tags=-skip -D dump_errors=true @tests/e2e/test_list.txt
```

- Feature set: The list of feature files is in `tests/e2e/test_list.txt`. Order matters for execution.
- Excluding scenarios: `--tags=-skip` excludes scenarios tagged with `@skip`.
```shell
# Single feature file
uv run behave tests/e2e/features/query.feature --tags=-skip

# Scenarios with a given tag (e.g. Authorized)
uv run behave tests/e2e/features/query.feature --tags=Authorized

# Exclude a tag
uv run behave tests/e2e/features/health.feature --tags=-skip-in-library-mode
```

## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `E2E_DEPLOYMENT_MODE` | `server` | `server` or `library`. Drives config paths and which scenarios run (e.g. `@skip-in-library-mode`). |
| `E2E_LSC_HOSTNAME` | `localhost` | Host of the Lightspeed Core Stack API. |
| `E2E_LSC_PORT` | `8080` | Port of the Lightspeed Core Stack API. |
| `E2E_LLAMA_HOSTNAME` | `localhost` | Host of the Llama Stack service (server mode). |
| `E2E_LLAMA_PORT` | `8321` | Port of the Llama Stack service. |
| `E2E_LLAMA_STACK_URL` | — | Full base URL for Llama Stack (overrides host/port if set). Used by shield helpers. |
| `E2E_LLAMA_STACK_API_KEY` | `xyzzy` | API key for the Llama Stack client (e.g. shield API). |
| `E2E_DEFAULT_MODEL_OVERRIDE` | — | Override the default LLM model id (e.g. `gpt-4o-mini`). |
| `E2E_DEFAULT_PROVIDER_OVERRIDE` | — | Override the default provider id (e.g. `openai`). |
| `FAISS_VECTOR_STORE_ID` | — | Vector store id for FAISS-related scenarios. |
| `RUNNING_PROW` | — | Set in Prow/OpenShift; enables Prow config paths and pod/container ops. |
| `OPENAI_API_KEY` | — | Required. Used by the app and Llama Stack for LLM calls (e.g. OpenAI). The E2E tests and the stack will not run correctly without it. |
For local Docker runs, the defaults are usually enough. Override them when the stack is on different host/ports or when using library mode. You must set `OPENAI_API_KEY` for the tests (and the services) to run.
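As a quick sanity check of how these variables combine, here is a sketch of resolving the service URLs. The helper names are illustrative, not taken from the suite; the defaults match the table above.

```python
import os

# Illustrative helpers (not from the suite) showing how the E2E_*
# variables combine into base URLs; defaults match the table above.
def lsc_base_url() -> str:
    host = os.getenv("E2E_LSC_HOSTNAME", "localhost")
    port = os.getenv("E2E_LSC_PORT", "8080")
    return f"http://{host}:{port}"

def llama_stack_base_url() -> str:
    # E2E_LLAMA_STACK_URL, when set, overrides host/port entirely.
    explicit = os.getenv("E2E_LLAMA_STACK_URL")
    if explicit:
        return explicit
    host = os.getenv("E2E_LLAMA_HOSTNAME", "localhost")
    port = os.getenv("E2E_LLAMA_PORT", "8321")
    return f"http://{host}:{port}"
```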
## Deployment Modes: Server vs Library

- Server mode (`E2E_DEPLOYMENT_MODE=server`): Lightspeed Core Stack talks to a separate Llama Stack service (e.g. the `llama-stack` container). Configs under `configuration/server-mode/` are used. Scenarios that need a dedicated Llama Stack container (e.g. "llama-stack unreachable") run; those tagged `@skip-in-library-mode` run as well.
- Library mode (`E2E_DEPLOYMENT_MODE=library`): Llama Stack runs in-process with the app. Configs under `configuration/library-mode/` are used. Scenarios tagged `@skip-in-library-mode` are skipped (there is no separate llama-stack to disrupt or query for shields).

The mode is set in `before_all` from `E2E_DEPLOYMENT_MODE` and stored as `context.is_library_mode`.
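In sketch form (the attribute names `deployment_mode` and `is_library_mode` come from this guide; the rest is an assumption about what `before_all` does):

```python
import os

def set_deployment_mode(context) -> None:
    # Sketch of the mode detection in before_all;
    # 'server' is the documented default.
    context.deployment_mode = os.getenv("E2E_DEPLOYMENT_MODE", "server")
    context.is_library_mode = context.deployment_mode == "library"
```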
## Tags and Hooks

All tag behaviour is implemented in `features/environment.py`: the hooks (`before_all`, `before_feature`, `before_scenario`, `after_scenario`, `after_feature`) read `scenario.effective_tags` or `feature.tags` and run the corresponding setup or teardown. You can add new tags by extending these hooks (and, if the tag switches config, by adding a Lightspeed Stack config and wiring it as described in Writing New Scenarios).
| Tag | Effect |
|---|---|
| `@skip` | Scenario is skipped (reason: "Marked with @skip"). Use for broken or WIP scenarios. |
| `@skip-in-library-mode` | Scenario is skipped when `E2E_DEPLOYMENT_MODE=library`. Used for tests that require a separate Llama Stack (e.g. connection disruption). |
| `@local` | Skipped unless running in "local" mode (context flag). |
| `@InvalidFeedbackStorageConfig` | Before scenario: switch to the invalid-feedback-storage config and restart the container. After: restore the feature config and restart. |
| `@NoCacheConfig` | Before scenario: switch to the no-cache config and restart. After: restore and restart. |
| `@disable-shields` | (If used) Before scenario: unregister the shield (e.g. llama-guard) via the Llama Stack API; after: re-register it. Server mode only; skipped in library mode. |
| `@Authorized` | Feature-level: use the auth-noop-token config for the whole feature; restore in `after_feature`. |
| `@RBAC` | Feature-level: use the RBAC config; restore in `after_feature`. |
| `@RHIdentity` | Feature-level: use the RH identity config; restore in `after_feature`. |
| `@Feedback` | Feature-level: set the feedback conversation list; `after_feature` deletes those conversations. |
| `@MCP` | Feature-level: use the MCP config; restore in `after_feature`. |
You can put several tags on one scenario. To document why a scenario is skipped, add a Gherkin comment above the tags:

```gherkin
# Only in server mode; llama-stack is in-process in library mode
@skip-in-library-mode
@skip
Scenario: Check if service report proper readiness when llama stack is not available
```

The hooks in `features/environment.py` do the following:

- `before_all`: Sets `deployment_mode` and `is_library_mode`, detects or overrides `default_model`/`default_provider`, sets `faiss_vector_store_id`.
- `before_feature`: Applies the feature-level config and restarts the container for `Authorized`, `RBAC`, `RHIdentity`, `Feedback`, `MCP`.
- `before_scenario`: Skips scenarios for `@skip`, `@local`, `@skip-in-library-mode`; applies the scenario config for `InvalidFeedbackStorageConfig`/`NoCacheConfig`; for `@disable-shields` (server mode) unregisters the shield.
- `after_scenario`: Restores Llama Stack if it was disrupted; restores the config and restarts for scenario config tags; for `@disable-shields` re-registers the shield.
- `after_feature`: Restores the config and restarts for `Authorized`, `RBAC`, `RHIdentity`, `MCP`; deletes feedback conversations for `Feedback`.
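For example, the skip handling in `before_scenario` can be approximated as follows. This is a hedged sketch, not the actual hook (`scenario.skip()` and `effective_tags` are real Behave APIs; the real hook also switches configs and handles `@disable-shields`):

```python
def skip_if_tagged(context, scenario) -> bool:
    """Sketch of the skip portion of before_scenario (environment.py).

    Returns True if the scenario was skipped.
    """
    tags = set(scenario.effective_tags)
    if "skip" in tags:
        scenario.skip(reason="Marked with @skip")
        return True
    if "skip-in-library-mode" in tags and getattr(context, "is_library_mode", False):
        scenario.skip(reason="Requires a separate Llama Stack service")
        return True
    return False
```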
## Configuration Files

- Lightspeed-stack: Under `tests/e2e/configuration/server-mode/` and `library-mode/`. Switched via `switch_config()` and copied into the container's config path (or applied via ConfigMap in Prow). Names like `lightspeed-stack.yaml`, `lightspeed-stack-auth-noop-token.yaml`, `lightspeed-stack-rbac.yaml`, etc.
- Llama Stack: Under `tests/e2e/configs/` (e.g. `run-ci.yaml`). Used by the Llama Stack container; not switched step-by-step by Behave, but the stack is started with the appropriate run config.

See `tests/e2e/configuration/README.md` for a short description of each config.
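The switch itself is conceptually just a file copy plus a container restart. A rough sketch of the idea (the real `switch_config()` in `tests/e2e/utils/utils.py` also copies into the container or applies a ConfigMap in Prow; the parameters here are illustrative):

```python
import shutil

# Rough sketch of the config-switch idea only; paths and parameters
# are illustrative, not the real switch_config() signature.
def switch_config_sketch(config_name: str, mode: str,
                         base: str = "tests/e2e/configuration",
                         target: str = "lightspeed-stack.yaml") -> None:
    shutil.copy(f"{base}/{mode}-mode/{config_name}", target)
    # ...then restart the lightspeed-stack container so it picks up the file.
```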
## Feature Files and Steps

The feature files below are run in the order given in `tests/e2e/test_list.txt`:

| Feature file | What it tests |
|---|---|
| `faiss.feature` | FAISS support: vector store registration, RAGs endpoint, file_search tool. |
| `smoketests.feature` | Smoke tests: main endpoint reachability. |
| `authorized_noop.feature` | `/v1/authorized` endpoint with noop auth (no token required). |
| `authorized_noop_token.feature` | `/v1/authorized` endpoint with noop-with-token auth (user_id, token validation). |
| `authorized_rh_identity.feature` | `/v1/authorized` endpoint with RH identity auth (x-rh-identity header, entitlements). |
| `rbac.feature` | Role-Based Access Control: admin/user/viewer/query-only/no-role permissions on query, models, conversations, info. |
| `conversations.feature` | Conversations API: list, get by id, delete; auth and error cases. |
| `conversation_cache_v2.feature` | Conversation Cache V2 API: conversations CRUD, topic summary, cache-off and llama-stack-down behaviour. |
| `feedback.feature` | Feedback endpoint: enable/disable, status, submit feedback (sentiment, conversation id), invalid storage. |
| `health.feature` | Readiness and liveness endpoints; behaviour when llama-stack is unavailable. |
| `info.feature` | Info, OpenAPI, shields, tools, metrics, MCP client auth options endpoints. |
| `query.feature` | Query endpoint: LLM responses, system prompt, auth errors, missing/invalid params, attachments, context length (413), llama-stack down. |
| `streaming_query.feature` | Streaming query endpoint: token stream, system prompt, auth, params, attachments, context length (413 / stream error). |
| `rest_api.feature` | REST API: OpenAPI endpoint. |
| `mcp.feature` | MCP (Model Context Protocol): tools, query, streaming_query with MCP auth (required, token, invalid token). |
| `models.feature` | Models endpoint: list models, filter, empty result; error when llama-stack unreachable. |

If you add a new feature file, add it to `tests/e2e/test_list.txt` so it is included when you run the full E2E suite (e.g. `make test-e2e`). The order in that file is the run order.
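Because the Makefile passes `@tests/e2e/test_list.txt` to Behave, the file is simply one feature path per line. A small sketch for reading the run order (the blank-line and comment handling is an assumption; check the actual file):

```python
from pathlib import Path

# Sketch: read the run order from a Behave @-file such as
# tests/e2e/test_list.txt. Assumes one feature path per line;
# blank lines and '#' comments (if any) are skipped.
def read_run_order(path: str) -> list[str]:
    lines = Path(path).read_text().splitlines()
    return [
        line.strip()
        for line in lines
        if line.strip() and not line.strip().startswith("#")
    ]
```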
- Feature files (`*.feature`): Gherkin `Feature`/`Scenario`/`Given`/`When`/`Then`. One file per area (query, streaming_query, health, models, info, feedback, conversations, rbac, etc.).
- Steps: Implemented in `features/steps/*.py`. Steps receive `context` and use it to store host/port, auth headers, the response, and shared data (e.g. `context.response`, `context.default_model`). Placeholders like `{MODEL}` and `{PROVIDER}` in feature tables or doc strings are replaced with `context.default_model` and `context.default_provider` via `replace_placeholders()`.

Key step modules:

- `common.py`: "The service is started locally" (set host/port from env), "The system is in default state".
- `common_http.py`: Status code, body content, headers.
- `auth.py`: Set the Authorization header.
- `llm_query_response.py`: Call query/streaming_query, send a too-long query, parse the streamed response, assert fragments and error messages.
- `health.py`: "The llama-stack connection is disrupted" (stops the container in server mode; sets `llama_stack_was_running` for restore in `after_scenario`).
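A step definition boils down to building a URL from `context` and issuing the request. A hedged sketch in the style of `features/steps/` (the real suite uses `requests`; the attribute names `hostname`/`port`/`api_prefix` and the `/v1` prefix are assumptions):

```python
from urllib import request as urlrequest

def endpoint_url(context, endpoint: str) -> str:
    # Attribute names and the "/v1" prefix are assumptions.
    prefix = getattr(context, "api_prefix", "/v1")
    return f"http://{context.hostname}:{context.port}{prefix}/{endpoint}"

# In the real suite this function would carry Behave's decorator:
# @when('I access endpoint "{endpoint}" using HTTP GET method')
def access_endpoint_get(context, endpoint: str) -> None:
    # Real steps use `requests`; urllib keeps this sketch stdlib-only.
    context.response = urlrequest.urlopen(endpoint_url(context, endpoint))
```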
## Gherkin Keywords in Feature Files

Feature files use Gherkin syntax. Below is what each keyword means and how this project uses it.
| Keyword | Meaning | Example |
|---|---|---|
| Feature | Title and optional description of a capability. One per `.feature` file. | `Feature: Query endpoint API tests` |
| Background | Steps run before every scenario in the feature. Use for common setup (e.g. "service started", "API prefix"). | `Background:` followed by `Given The service is started locally` |
| Scenario | One concrete test: a list of steps that set up, act, and assert. | `Scenario: Check if LLM responds properly...` |
| Scenario Outline | Template for multiple scenarios; steps use placeholders that are filled from an `Examples` table. (Used when the same flow is repeated with different data.) | `Scenario Outline:` with an `Examples:` table |
Each line in a scenario is a step. The keyword indicates the step's role; Behave matches the line to a step definition (e.g. `@given("The service is started locally")`).
| Keyword | Meaning | Typical use in this project |
|---|---|---|
| Given | Precondition or initial state. | Service is started, system in default state, auth header set, llama-stack disrupted. |
| When | The action under test. | Call an endpoint (query, streaming_query, GET readiness), send a request body. |
| Then | Expected outcome (assertion). | Status code is 200, body contains text or matches schema, response has certain fields. |
| And | Continuation of the previous keyword. Same role as the last Given/When/Then, but reads more naturally. | "Given X And Y" = two preconditions; "Then A And B" = two assertions. |
| But | Same as And, but used for contrast. | "Then status is 200 But body does not contain …" (rare in this suite). |
Convention for this suite: Each scenario should have exactly one When step (the single action under test). It can have one or more Given steps (optionally followed by And for more preconditions) and one or more Then steps (optionally followed by And for more assertions). So: 1–n Given (+ And), one When, 1–n Then (+ And).
Example:
```gherkin
Scenario: Check if service report proper readiness state
  Given The system is in default state
  When I access endpoint "readiness" using HTTP GET method
  Then The status code of the response is 200
  And The body of the response is the following
      """
      {"ready": true, "reason": "All providers are healthy", "providers": []}
      """
```

Here, Given sets the state, When performs the HTTP call, and Then and And state the assertions.
| Syntax | Meaning | Example |
|---|---|---|
| Doc string | Multi-line argument to the step (between `"""`). Often a JSON request or expected body. | `Request body:` followed by `""" … """` |
| Data table | Table of values (header row, then rows). The step receives it as `context.table`. | A table under the step, e.g. one headed `Fragments in LLM response` |
| Placeholders | `{MODEL}` and `{PROVIDER}` in doc strings are replaced with `context.default_model` and `context.default_provider` by the steps. | `"model": "{MODEL}", "provider": "{PROVIDER}"` |
| Step argument | Quoted or unquoted text in the step line. Matches capture groups in the step definition. | `I use "query" to ask question` → endpoint `"query"`; `I access endpoint "readiness"` → `"readiness"` |
| Syntax | Meaning | Example |
|---|---|---|
| `@tag` | Tag for filtering or hooks. Above `Feature` (applies to all scenarios) or above `Scenario` (that scenario only). | `@Authorized`, `@skip`, `@skip-in-library-mode` |
| `# comment` | Gherkin comment, ignored by Behave. Use to explain why a scenario is skipped or to document a scenario. | `# Only in server mode` above a scenario |
- Feature = what is under test (e.g. "Query endpoint API tests").
- Background = shared setup for every scenario in the file.
- Scenario = one test case.
- Given = preconditions; When = action; Then / And = expectations.
- Doc strings (`"""`) = multi-line JSON or text; tables (`|...|`) = structured data for the step.
- Placeholders `{MODEL}` and `{PROVIDER}` are filled from context by the step code.
- Tags (`@...`) drive skipping and hooks; comments (`#`) are for humans.
## Writing New Scenarios

- Choose or add a feature file under `tests/e2e/features/` and use existing steps where possible. If you add a new file, add it to `tests/e2e/test_list.txt` so the suite runs it.
- Use tags for mode-dependent or config-dependent behavior (`@skip-in-library-mode`, `@Authorized`, etc.). Adding a tag that switches configuration (e.g. a new feature-level or scenario-level config) usually means you must also add or change a Lightspeed Stack config file under `configuration/server-mode/` or `library-mode/` and wire the tag in `environment.py` (e.g. in `before_feature`/`after_feature` or `before_scenario`/`after_scenario`) so the config is applied and the container restarted while the tag is active.
- Use the placeholders `{MODEL}` and `{PROVIDER}` in request bodies so the same scenario works with different backends.
- Add step definitions in the appropriate `features/steps/*.py` file if you need new steps; reuse `context` for host, port, auth, and responses.
- Optional: If the scenario needs a dedicated config, add a new YAML under `configuration/server-mode/` (and optionally `library-mode/`), add an entry to `_CONFIG_PATHS` in `environment.py`, and handle the tag in the before/after hooks so the config is switched and the lightspeed-stack container is restarted.
- Run with `uv run make test-e2e` or a targeted `behave` command; exclude `@skip` with `--tags=-skip` if needed.
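The config-tag wiring can be sketched as follows. Everything here is hypothetical: the tag `MyNewConfig` does not exist, and the injected `switch_config`/`restart_container` callables stand in for the real helpers in `tests/e2e/utils/utils.py` (their actual signatures may differ):

```python
# Hypothetical: a new scenario-level config tag and the hook logic
# that applies it. Injected callables stand in for the real helpers.
_CONFIG_PATHS = {
    "MyNewConfig": "lightspeed-stack-my-new-config.yaml",
}

def apply_scenario_config(context, scenario, switch_config, restart_container) -> None:
    for tag in scenario.effective_tags:
        if tag in _CONFIG_PATHS:
            switch_config(_CONFIG_PATHS[tag])
            restart_container("lightspeed-stack")
            # Remember the active tag so after_scenario can restore.
            context.scenario_config_tag = tag
```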
## Troubleshooting

- 503 or "Unable to connect to Llama Stack": In server mode, ensure the Llama Stack container is running and healthy. After a scenario that disrupts Llama Stack, `after_scenario` restores it; if the restore fails, check the diagnostics (see `_print_llama_stack_diagnostics` in `environment.py` if present) and the container logs.
- "Container state improper" / restart fails: Usually the llama-stack container is in a bad state. Ensure it is started (or recreated) before restarting lightspeed-stack; see the Docker/Podman and compose usage in the project.
- Read-only database (SQLite) in Llama Stack: If the RAG KV DB is on a bind-mounted path that becomes read-only (e.g. after a restart), move it to a named volume (e.g. via `KV_RAG_PATH` in docker-compose) so writes succeed.
- ChunkedEncodingError on streaming_query: The step for streaming_query uses `stream=True` and consumes the stream; if you add new streaming steps, avoid reading the full response with `response.content` and use the same stream-reading pattern so a server close after an error event does not raise.
- "Event loop is closed" (httpx/AsyncClient): In E2E, any code that creates an `AsyncLlamaStackClient` (e.g. for shields) must close it (e.g. `await client.close()`) in a `finally` block before the event loop is torn down (i.e. before `asyncio.run()` returns).
- Scenarios skipped: Check the tags (`@skip`, `@skip-in-library-mode`, `@local`) and `E2E_DEPLOYMENT_MODE`; ensure the scenario is not excluded by `--tags=-skip` (or the opposite, if you intend to run only skipped scenarios for debugging).
For more on test structure and commands, see the main project guide (`CLAUDE.md`) and `tests/e2e/features/steps/README.md`.