
Conversation


@dutchfarao dutchfarao commented Nov 3, 2025

This PR adds the ability to register pydantic-ai agents and expose them as an OpenAI compatible endpoint. The registry allows you to register agents that can be either exposed as /v1/chat/completions and/or /v1/responses endpoint. The /v1/models endpoint derives all the unique models by their names that exist in the AgentRegistry.

Streaming support for responses API has not been implemented yet, can be done in a follow up PR.
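As a concrete illustration of the described behavior (a minimal sketch; the class and method names are assumptions, not the PR's actual API), the registry tracks which API each agent is exposed on, and /v1/models serves the deduplicated names:

```python
from dataclasses import dataclass, field


@dataclass
class AgentEntry:
    # One registered agent: its model name plus the APIs it is exposed on.
    name: str
    apis: set[str] = field(default_factory=set)  # e.g. {'chat', 'responses'}


@dataclass
class AgentRegistry:
    # Simplified stand-in for the PR's AgentRegistry.
    entries: list[AgentEntry] = field(default_factory=list)

    def register(self, name: str, api: str) -> 'AgentRegistry':
        self.entries.append(AgentEntry(name, {api}))
        return self  # chainable, matching the fluent usage shown later in the thread

    def model_names(self) -> list[str]:
        # What /v1/models would serve: unique model names, first-seen order.
        seen: dict[str, None] = {}
        for entry in self.entries:
            seen.setdefault(entry.name, None)
        return list(seen)


registry = (
    AgentRegistry()
    .register('assistant', 'chat')
    .register('assistant', 'responses')  # same model on both APIs: listed once
)
print(registry.model_names())  # ['assistant']
```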

@dutchfarao dutchfarao force-pushed the feat/fastapi_openai_spec branch 12 times, most recently from b81d7ab to 4705ba1 Compare November 3, 2025 21:33
DouweM (Collaborator) commented Nov 3, 2025

@dutchfarao Nice work! @Kludex Please have a look

@dutchfarao dutchfarao force-pushed the feat/fastapi_openai_spec branch 8 times, most recently from 9c28ece to 50430ae Compare November 3, 2025 22:27
@dutchfarao dutchfarao force-pushed the feat/fastapi_openai_spec branch from 50430ae to c054723 Compare November 3, 2025 22:33
@ion-elgreco

@dutchfarao and I are sitting at a bar here, trying to fix the CI. We're frustrated and giving up haha, the CI should use markers instead of this try_import stuff😆

Would like your support here

@dutchfarao (Author)

@dutchfarao and I are sitting at a bar here, trying to fix the CI. We're frustrated and giving up haha, the CI should use markers instead of this try_import stuff😆

Would like your support here

The 20 force pushes were an indication

Signed-off-by: Ion Koutsouris <[email protected]>
@ion-elgreco ion-elgreco force-pushed the feat/fastapi_openai_spec branch from ddadb21 to 964fd4d Compare November 4, 2025 10:17
@ion-elgreco

Resolved the CI, having a clear mind helps ;)

Kludex (Member) left a comment


I'll stop my review.

This is not a good first step. Please never open a 6k-line PR on a repository without discussing it first.

I think the first step would be to implement what my PR (#2041) proposed first, and then expand the idea to Responses API, if needed.

except ImportError as _import_error:  # pragma: no cover
    raise ImportError(
        'Please install the `openai` package to enable the fastapi openai compatible endpoint, '
        'you can use the `openai` and `fastapi` optional group — `pip install "pydantic-ai-slim[openai,fastapi]"`'
    ) from _import_error
Member

I don't think the extra should be called fastapi. The idea is to expose the chat completions/responses endpoints - so maybe chat-completions.

Member

This module shouldn't be called fastapi. The underlying package doesn't matter.

ion-elgreco commented Nov 4, 2025

We are open to better name suggestions! Internally we called this something different

from pydantic_ai.fastapi.registry import AgentRegistry
from pydantic_ai.settings import ModelSettings

logger = logging.getLogger(__name__)
Member

There's no need for this.

Comment on lines 77 to 79
except Exception as e:
    logger.error(f'Error creating completion: {e}')
    raise
Member

The error is shown anyway, there's no need for this handling.

Also, in other code sources you may want to use logger.exception('Error when creating completion'), since the exception (with its traceback) is then logged automatically.


Suggested change
except Exception as e:
    logger.error(f'Error creating completion: {e}')
    raise
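For reference, a minimal sketch of that pattern (generic code, not from this PR): logger.exception logs at ERROR level and appends the active traceback automatically.

```python
import logging

logger = logging.getLogger(__name__)


def create_completion() -> None:
    # Hypothetical operation standing in for the PR's completion call.
    raise ValueError('boom')


try:
    create_completion()
except Exception:
    # logger.exception is logger.error(..., exc_info=True): the active
    # traceback is appended to the log record automatically, so there is
    # no need to interpolate the exception into the message by hand.
    logger.exception('Error when creating completion')
```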

Comment on lines 37 to 38
*args: tuple[Any],
**kwargs: tuple[Any],
Member

This is wrong. The annotation on *args should describe the type of each positional element, and the one on **kwargs the type of each value - that said, this is a properly typed library, so we can't include this as is.
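For context, a generic typing sketch (not this PR's code) showing what the annotations actually mean:

```python
from typing import Any


def loosely_typed(*args: Any, **kwargs: Any) -> int:
    # Inside the body, args is a tuple[Any, ...] and kwargs is a
    # dict[str, Any]: the annotation names the element/value type, not
    # the container type, so `*args: tuple[Any]` would claim that every
    # positional argument is itself a one-element tuple.
    return len(args) + len(kwargs)


def strictly_typed(*args: int, **kwargs: str) -> None:
    # Here every positional argument must be an int and every keyword
    # value a str.
    ...


print(loosely_typed(1, 'two', key=3))  # 3
```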

Comment on lines 121 to 131
except Exception as e:
    logger.error(f'Error in responses: {e}', exc_info=True)
    raise HTTPException(
        status_code=500,
        detail=ErrorResponse(
            error=ErrorObject(
                type='internal_server_error',
                message=str(e),
            ),
        ).model_dump(),
    )
Member

No need again.

Comment on lines 133 to 165
@self.get('/v1/models', response_model=ModelsResponse)
async def get_models() -> ModelsResponse:  # type: ignore
    try:
        return await self.models_api.list_models()
    except Exception as e:
        logger.error(f'Error listing models: {e}', exc_info=True)
        raise HTTPException(
            status_code=500,
            detail=ErrorResponse(
                error=ErrorObject(
                    type='internal_server_error',
                    message=f'Error retrieving models: {str(e)}',
                ),
            ).model_dump(),
        )

@self.get('/v1/models' + '/{model_id}', response_model=Model)
async def get_model(model_id: str) -> Model:  # type: ignore
    try:
        return await self.models_api.get_model(model_id)
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f'Error fetching model info: {e}', exc_info=True)
        raise HTTPException(
            status_code=500,
            detail=ErrorResponse(
                error=ErrorObject(
                    type='internal_server_error',
                    message=f'Error retrieving model: {str(e)}',
                ),
            ).model_dump(),
        )
Member

Is this necessary?

logger = logging.getLogger(__name__)


class AgentRegistry:
Member

I don't think we should work on a registry basis.

I think the Agent object should expose an ASGI application that you can serve.

If you need multiple agents to become apps/endpoints, you can mount them together.
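To illustrate the mounting idea, here is a minimal pure-ASGI sketch (the names and the hand-rolled dispatcher are illustrative assumptions; in practice something like Starlette's Mount does this): each agent's app lives under its own path prefix.

```python
from typing import Any, Awaitable, Callable

Scope = dict[str, Any]
Receive = Callable[[], Awaitable[dict[str, Any]]]
Send = Callable[[dict[str, Any]], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]


def plain_text_app(body: bytes) -> ASGIApp:
    # Stand-in for one agent's chat-completions app.
    async def app(scope: Scope, receive: Receive, send: Send) -> None:
        await send({'type': 'http.response.start', 'status': 200, 'headers': []})
        await send({'type': 'http.response.body', 'body': body})

    return app


def mount(routes: dict[str, ASGIApp]) -> ASGIApp:
    # Dispatch on path prefix and strip it before delegating - the essence of mounting.
    async def app(scope: Scope, receive: Receive, send: Send) -> None:
        for prefix, sub_app in routes.items():
            if scope['path'].startswith(prefix):
                await sub_app({**scope, 'path': scope['path'][len(prefix):]}, receive, send)
                return
        await send({'type': 'http.response.start', 'status': 404, 'headers': []})
        await send({'type': 'http.response.body', 'body': b'not found'})

    return app


# One sub-app per agent, so each agent gets its own /v1/chat/completions.
app = mount({
    '/agent-a': plain_text_app(b'agent a'),
    '/agent-b': plain_text_app(b'agent b'),
})
```

A request to /agent-a/v1/chat/completions reaches agent A's app with path /v1/chat/completions, i.e. a subpath per agent.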


Mounting means you create a subpath per model; doing it on a registry basis allows you to have a single endpoint serving N agents (models), with the OpenAI spec's model field selecting between them.

Member

I see your point.

I don't think we want to work with a registry design anyway. A factory that creates an ASGI application would be ideal.

def expose_agents(
    agents: dict[str, Agent],
    *,
    chat_completions_url: str | None = '/v1/chat/completions',
    responses_url: str | None = None,
) -> ASGIApp:

I'm not sure the above is the best name for the function, or that the parameters are the best design, but the idea is to simplify the logic here.


I am not sure it simplifies things. The registry has a property to return the full list of unique models, we also distinguish between chat-completions-only and responses-only models, and additionally we make it simple to derive the name from the agent:

agent_registry = (
    AgentRegistry()
    .register_completions_agent(completion_agent) # auto derives the name from Agent instance or you override the name
    .register_responses_agent(responses_agent)
)

app = FastAPI(
    title="LLM Agent API",
    description="OpenAI-compatible API with pydantic-ai backend",
    version="1.0.0",
    lifespan=lifespan,
)
app.include_router(AgentAPIRouter(agent_registry=agent_registry))

openai_client_for_chat = AsyncOpenAI(
    base_url=fake_openai_base,
    api_key='test-key',
    http_client=DefaultAioHttpClient(),
)
Member

Why use DefaultAioHttpClient?


It can indeed be removed for the test; we were simply using aiohttp as our default client normally.


Actually it is needed for the mocking

Member

We do not need to use aioresponses. We can use VCR.

"pip>=25.2",
"genai-prices>=0.0.28",
"mcp-run-python>=0.0.20",
"pytest-asyncio>=1.2.0",
Member

Why? The whole codebase uses anyio.

@ion-elgreco ion-elgreco force-pushed the feat/fastapi_openai_spec branch 3 times, most recently from ad71cc6 to 6d7a61e Compare November 4, 2025 12:27
Signed-off-by: Ion Koutsouris <[email protected]>
@ion-elgreco ion-elgreco force-pushed the feat/fastapi_openai_spec branch from 6d7a61e to 67d47cd Compare November 4, 2025 12:30
Kludex (Member) commented Nov 4, 2025

I think the first step would be to implement what my PR (#2041) proposed first, and then expand the idea to Responses API, if needed.

@ion-elgreco I think the idea of this PR is good, but I think we really need to strip it down first, as I mentioned in the comment above.

@ion-elgreco

I think the first step would be to implement what my PR (#2041) proposed first, and then expand the idea to Responses API, if needed.

@ion-elgreco I think the idea of this PR is good, but I think we really need to strip it down first, as I mentioned in the comment above.

I'm fine with splitting the responses and chat completions APIs and stacking them as separate PRs. But it won't make a huge difference, since it's a contained code change.



Development

Successfully merging this pull request may close these issues.

Add support for registering Pydantic-AI agents as OpenAI-compatible endpoints

4 participants