- Create a chat engine with LlamaIndex to answer questions based on a set of pre-selected documents (see the sketch after this list).
- Leverage Streamlit for file uploads and interactive communication with the engine.
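
Under the hood, this is the standard LlamaIndex "chat over documents" pattern. A minimal sketch of the idea (illustrative only; it assumes 0.9-style `llama_index` imports and a hypothetical `docs/` folder, not this repo's exact code):

```python
# Minimal sketch: index pre-selected documents and chat over them.
from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("docs").load_data()  # load the documents
index = VectorStoreIndex.from_documents(documents)     # embed and index them
chat_engine = index.as_chat_engine()                   # conversational wrapper
print(chat_engine.chat("What do these documents cover?"))
```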
- Clone the repo.
- Run the `docker-compose` command to launch the app in Docker containers, then type a question in the chat interface:

  ```shell
  docker-compose up --build
  ```
- Start a Xinference cluster:

  ```shell
  xinference --log-level debug
  ```
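  By default, the local cluster listens on port 9997, which matches the endpoint (`http://127.0.0.1:9997`) used in the steps below.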
- Launch an embedding model and an LLM, and get their model UIDs. For example, launching `bge-large-zh` (embedding) and `chatglm3` (LLM):
  ```python
  from xinference.client import Client

  client = Client("http://127.0.0.1:9997")
  # Launch the embedding model and the LLM, keeping the returned model UIDs.
  model_uid = client.launch_model(model_name="bge-large-zh", model_type="embedding")
  model_uid2 = client.launch_model(
      model_name="chatglm3",
      quantization=None,
      model_format="pytorch",
      model_size_in_billions=6,
  )
  print(model_uid, model_uid2)
  ```
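  If you lose track of the UIDs later, the same client can list what is currently running (a quick check; `list_models` is part of the Xinference client API, though the exact return shape varies by version):

  ```python
  # List the models currently running on the cluster, including their UIDs.
  print(client.list_models())
  ```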
- Modify `docker-compose.yml` using the above model UIDs, for example:
  ```yaml
  version: "2"
  services:
    app:
      build: .
      network_mode: "host"
      ports:
        - "8501:8501"
      volumes:
        - ./app:/app/app
      environment:
        - LLM=xinference
        - EMBEDDING=xinference
        - XINFERENCE_SERVER_ENDPOINT=http://127.0.0.1:9997
        - XINFERENCE_EMBEDDING_MODEL_UID=<model_uid>
        - XINFERENCE_LLM_MODEL_UID=<model_uid2>
        - HISTORY_KEEP_CNT=10
  ```
- Deploy this application:

  ```shell
  docker-compose up --build
  ```
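  Once the containers are up, the Streamlit UI should be reachable at http://localhost:8501 (Streamlit's default port, which is also the one mapped in `docker-compose.yml`).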
If you want to run a local dev environment, the following commands let you test the application with the OpenAI API:
```shell
poetry install
LLM=openai EMBEDDING=openai streamlit run app/main.py
```
- If you want to use OpenAI, make sure you've created a `.env` file containing a valid API key.
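  For reference, a minimal `.env` could look like the following; the variable name assumes the app reads the standard `OPENAI_API_KEY`, and the key value is a placeholder:

  ```shell
  OPENAI_API_KEY=sk-...
  ```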