feat: add OpenAI-compatible Bedrock provider #3748
Conversation
Force-pushed 59d4cfa to e4d71e7
Use this one as a reference #3707
Force-pushed 14919c1 to 3a9af0c
Please report the results of tests/integration/inference/test_openai_completion.py and the other OpenAI-related tests.
Also, why has uv.lock changed?
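For reference, the integration suite mentioned above can typically be invoked along these lines (the exact flags and any required environment variables are setup-dependent assumptions):

```
pytest tests/integration/inference/test_openai_completion.py -v
```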
- nothing from models.py is used, please remove it
- is the /v1/embeddings endpoint available? if not, add a NotImplementedError stub (see the sketch after this list)
- is the /v1/completions endpoint available? if not...
- great find wrt telemetry and stream usage; after this PR we should consider adding that nugget to the mixin for all providers
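A minimal sketch of the requested stubs, assuming method names in the style of llama-stack's OpenAI-compatible inference API (the signatures here are illustrative, not the project's actual ones):

```python
class BedrockInferenceAdapter:  # the real adapter mixes in OpenAIMixin
    """Stubs for endpoints the Bedrock OpenAI-compatible API does not expose."""

    async def openai_embeddings(self, *args, **kwargs):
        # /v1/embeddings is not available on this endpoint
        raise NotImplementedError(
            "The Bedrock OpenAI-compatible endpoint does not support /v1/embeddings"
        )

    async def openai_completion(self, *args, **kwargs):
        # the legacy /v1/completions API is likewise unavailable
        raise NotImplementedError(
            "The Bedrock OpenAI-compatible endpoint does not support /v1/completions"
        )
```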
Force-pushed 3a9af0c to 56bff11
Force-pushed 56bff11 to 7024e56
Force-pushed 7024e56 to 55aaa6e
Force-pushed 55aaa6e to dd2a1f6
@leseb I addressed the missing comments. I actually thought I had addressed these before; sorry, on a deeper look I found the logger category comment still needed to be addressed. Thanks.

No worries, please rebase.
Force-pushed dd2a1f6 to 296d7a9
- `BedrockConfig` is a `RemoteInferenceProviderConfig`; use `auth_credential` instead of a new `api_key` field (see https://github.com/llamastack/llama-stack/blob/main/src/llama_stack/providers/utils/inference/model_registry.py#L22), and you don't need to override `get_api_key`
- instead of overriding `register_model`, use `async def check_model_availability(self, model: str) -> bool: return True` (see the sketch after this list)
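A rough sketch of what the reviewer is suggesting; the base class below is a stand-in for the real one in model_registry.py, and the `region` field is an assumption:

```python
from pydantic import BaseModel, SecretStr

# stand-in for the real RemoteInferenceProviderConfig in model_registry.py
class RemoteInferenceProviderConfig(BaseModel):
    auth_credential: SecretStr | None = None

class BedrockConfig(RemoteInferenceProviderConfig):
    # no separate api_key field: auth_credential carries the secret,
    # so get_api_key does not need to be overridden
    region: str = "us-east-1"  # illustrative

class BedrockInferenceAdapter:
    async def check_model_availability(self, model: str) -> bool:
        # accept all pre-registered models rather than overriding register_model
        return True
```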
Force-pushed e4a5321 to 96c2793
Implements AWS Bedrock inference provider using the OpenAI-compatible endpoint for Llama models available through Bedrock.

Changes:
- Add BedrockInferenceAdapter using the OpenAIMixin base
- Configure region-specific endpoint URLs
- Add NotImplementedError stubs for unsupported endpoints
- Implement authentication error handling with helpful messages
- Remove unused models.py file
- Add comprehensive unit tests (12 total)
- Add provider registry configuration
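The region-specific URL construction plausibly reduces to something like the following; the endpoint pattern reflects AWS's OpenAI-compatible surface but is an assumption about this PR, not code copied from it:

```python
def get_base_url(region: str) -> str:
    # Bedrock's OpenAI-compatible API lives under the bedrock-runtime host;
    # the /openai/v1 path prefix is an assumption
    return f"https://bedrock-runtime.{region}.amazonaws.com/openai/v1"

print(get_base_url("us-east-1"))
# https://bedrock-runtime.us-east-1.amazonaws.com/openai/v1
```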
Refactor to use auth_credential for consistent credential management and improve error handling with defensive checks.

Changes:
- Use auth_credential instead of api_key for better credential handling
- Simplify model availability check to accept all pre-registered models
- Guard metrics collection when usage data is missing in responses
- Add debug logging for better troubleshooting of API issues
- Update unit tests for the auth_credential refactoring
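The usage guard likely amounts to a defensive check of this shape (names are illustrative; in OpenAI-style streaming, usage appears only on the final chunk, and only when requested):

```python
def maybe_record_usage(chunk, record) -> None:
    # most streaming chunks (and some error responses) carry no usage block,
    # so guard before reading token counts
    usage = getattr(chunk, "usage", None)
    if usage is None:
        return
    record(
        prompt_tokens=usage.prompt_tokens,
        completion_tokens=usage.completion_tokens,
        total_tokens=usage.total_tokens,
    )
```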
Force-pushed 96c2793 to a7a6191
@skamenan7 CI is failing on a few tests.
Revert "…amastack#3748" (this reverts commit 4dd8f25)
Revert "…lamastack#3748" (this reverts commit 6c793af)
Force-pushed 7f1acbd to a7a6191
- Fix test assertion to match the actual error message
- Add missing has_model method to the FakeModelStore mock
- Remove redundant comments and update docs
Force-pushed ba0c4e4 to e0f0edc
Implements AWS Bedrock inference provider using OpenAI-compatible endpoint for Llama models available through Bedrock.
Closes: #3410
What does this PR do?
Adds AWS Bedrock as an inference provider using the OpenAI-compatible endpoint. This lets us use Bedrock models (GPT-OSS, Llama) through the standard llama-stack inference API.
The implementation uses LiteLLM's OpenAI client under the hood, so it gets all the OpenAI compatibility features. The provider handles per-request API key overrides via headers.
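To make the per-request override concrete, a hypothetical client call is sketched below; the `X-LlamaStack-Provider-Data` header exists in llama-stack, but the base URL, the field name `aws_bedrock_api_key`, and the model ID are illustrative assumptions:

```python
import json

from openai import OpenAI

# hypothetical llama-stack server exposing the OpenAI-compatible API
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

resp = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "Hello from Bedrock"}],
    # per-request credential override via a provider-data header (illustrative)
    extra_headers={
        "X-LlamaStack-Provider-Data": json.dumps({"aws_bedrock_api_key": "sk-..."})
    },
)
print(resp.choices[0].message.content)
```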
Test Plan
Tested the following scenarios:
Bedrock OpenAI-Compatible Provider - Test Results
Model: `bedrock-inference/openai.gpt-oss-20b-1:0`

Test 1: Model Listing
Request:
Response:
Test 2: Non-Streaming Completion
Request:
Response:
Test 3: Streaming Completion
Request:
Response:
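The exported page dropped the recorded payloads, so here is a hypothetical streaming request of the general shape Tests 2-3 exercise (all values illustrative):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

# stream=True exercises the streaming path; include_usage asks the server to
# attach token usage to the final chunk (the case the metrics guard handles)
stream = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "Count to three"}],
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```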
Test 4: Error Handling - Invalid Model
Request:
Response:
Test 5: Multi-Turn Conversation
Request 1:
Response 1:
Request 2 (with history):
Response 2:
Context retained across turns
Test 6: System Messages
Request:
Response:
Test 7: Tool Calling
Request:
Response:
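Likewise hypothetical, a tool-calling request of the kind Test 7 covers (the `get_weather` tool is invented for illustration):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, not from the PR
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# a successful tool call surfaces as resp.choices[0].message.tool_calls
print(resp.choices[0].message.tool_calls)
```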
Test 8: Sampling Parameters
Request:
Response:
Test 9: Authentication Error Handling
Subtest A: Invalid API Key
Request:
Response:
Subtest B: Empty API Key (Fallback to Config)
Request:
Response:
Fell back to config key
Subtest C: Malformed Token
Request:
Response: