
Conversation

@eero-t (Collaborator) commented Jul 18, 2025

Description

vLLM already supports all ChatQnA inferencing sub-services: LLM, embedding, reranking, and guardrails. In the ChatQnA example value files, all except embedding are HW accelerated. Additionally, KubeAI already supports the first three, and OPEA has an Enterprise-Inferencing subproject: https://github.com/opea-project/Enterprise-Inference

Therefore it seems relevant to start a discussion on how the current external LLM support in the Helm charts could be changed into more generic external inferencing support, before the current support sees wider use.

To start that discussion, this PR includes a draft of such changes for ChatQnA, and TODOs for items currently missing from the ChatQnA code in GenAIExamples.
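The generalization discussed above could look roughly like the following values fragment. This is a hypothetical sketch only: the `externalInference` key and its sub-keys are illustrative assumptions, not the actual ChatQnA chart schema, and the endpoint/model values are placeholders.

```yaml
# Hypothetical sketch: replace the single external-LLM setting with a
# per-service external inferencing map (key names are illustrative only).
externalInference:
  llm:
    enabled: true
    endpoint: "http://kubeai.kubeai.svc.cluster.local/openai/v1"  # placeholder
    model: "llama-3.1-8b-instruct"                                # placeholder
  reranking:
    enabled: true
    endpoint: "http://kubeai.kubeai.svc.cluster.local/openai/v1"  # placeholder
  embedding:
    enabled: false   # per the description, embedding is not HW accelerated
  guardrails:
    enabled: false
```

The design point is that each sub-service gets its own enable flag and endpoint, so deployments can mix in-cluster and external inferencing per service instead of an all-or-nothing external LLM switch.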

Issues

n/a.

Type of change

  • New feature (non-breaking change which adds new functionality)

Dependencies

This is a dependency disconnection.

Tests

Not relevant yet.

@eero-t eero-t marked this pull request as draft July 18, 2025 20:14
@eero-t eero-t changed the title from "Generalize ChatQnA external LLM to external inferencing support (WIP)" to "Generalize ChatQnA external LLM to external inferencing support" Jul 18, 2025
@eero-t eero-t force-pushed the external-inferencing branch from 30ae64a to 4742ac4 August 11, 2025 09:42
@eero-t eero-t force-pushed the external-inferencing branch from 2a50e5d to 09c57c6 August 11, 2025 09:48
@CICD-at-OPEA (Collaborator)
This PR is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.

@CICD-at-OPEA (Collaborator)
This PR was closed because it has been stalled for 7 days with no activity.

@eero-t (Collaborator, Author) commented Oct 1, 2025

KubeAI just got support for the LLM /rerank API: kubeai-project/kubeai#565

It's not in any release yet though.
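For context, /rerank endpoints of this kind follow a Jina/Cohere-style request/response shape: the client POSTs a model, a query, and a list of documents, and gets back per-document relevance scores keyed by document index. The sketch below shows client-side handling of such a response; the field names (`results`, `index`, `relevance_score`) are assumptions based on that common shape and should be verified against the actual vLLM/KubeAI docs, and the response here is a mock rather than a real server reply.

```python
# Sketch: reorder documents using a Jina/Cohere-style rerank response.
# Field names ("results", "index", "relevance_score") are assumptions,
# not verified against the KubeAI release mentioned above.
def order_by_relevance(documents, response):
    """Return documents sorted by the rerank response's relevance scores, best first."""
    ranked = sorted(response["results"],
                    key=lambda r: r["relevance_score"], reverse=True)
    return [documents[r["index"]] for r in ranked]

docs = ["OPEA overview", "unrelated text", "ChatQnA pipeline"]
# Mock response in the assumed shape; a real call would POST /v1/rerank
# with {"model": ..., "query": ..., "documents": docs}.
mock = {"results": [
    {"index": 0, "relevance_score": 0.91},
    {"index": 1, "relevance_score": 0.05},
    {"index": 2, "relevance_score": 0.77},
]}
print(order_by_relevance(docs, mock))
# -> ['OPEA overview', 'ChatQnA pipeline', 'unrelated text']
```

Keeping the reranker behind this shared API shape is what would let the ChatQnA reranking sub-service be swapped between in-cluster vLLM and an external KubeAI endpoint without client changes.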

