A FastAPI-powered Document Q&A system that uses FAISS for efficient retrieval and SentenceTransformers for embedding-based search. Users upload PDF documents and ask questions; the system retrieves contextually relevant answers from those documents.
- ✅ Upload PDFs and extract meaningful content
- ✅ FAISS-based retrieval for efficient search
- ✅ Sentence embeddings for improved answer relevance
- ✅ FastAPI backend for handling queries
- ✅ Streamlit UI for easy interaction
- Backend: FastAPI
- Frontend: Streamlit
- Embeddings: SentenceTransformers (all-MiniLM-L6-v2)
- Vector Search: FAISS
- PDF Processing: PyMuPDF (fitz)
- Re-ranking: CrossEncoder (MS MARCO)
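FAISS's `IndexFlatL2` performs exact nearest-neighbor search over the stored embedding vectors. As a sketch of what that search computes, here is the brute-force NumPy equivalent on a toy corpus (illustrative only, not the project's code):

```python
import numpy as np

def search_l2(index_vectors: np.ndarray, query: np.ndarray, k: int = 3):
    """Brute-force L2 search: what faiss.IndexFlatL2.search computes."""
    # Squared Euclidean distance from the query to every stored vector
    dists = ((index_vectors - query) ** 2).sum(axis=1)
    top = np.argsort(dists)[:k]  # indices of the k closest vectors
    return dists[top], top

# Toy corpus of four 3-dimensional "embeddings"
vecs = np.array([[0., 0., 0.], [1., 0., 0.], [0., 5., 0.], [9., 9., 9.]])
dists, ids = search_l2(vecs, np.array([1., 0.1, 0.]), k=2)
```

With FAISS itself, the same search is roughly `index = faiss.IndexFlatL2(d)`, `index.add(vecs)`, `index.search(queries, k)`, where `queries` is a 2-D float32 array.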
document-qa/
├── uploads/           # Folder to store uploaded PDFs
├── .venv/             # Virtual environment (not tracked)
├── main.py            # FastAPI backend
├── app.py             # Streamlit frontend
├── requirements.txt   # Dependencies
├── .gitignore         # Ignored files (venv, cache, large files)
└── README.md          # Project documentation
git clone https://github.com/stutipandey20/Document-QA.git
cd Document-QA
python3 -m venv .venv
source .venv/bin/activate # macOS/Linux
.venv\Scripts\activate # Windows
pip install -r requirements.txt
uvicorn main:app --reload
The interactive FastAPI docs will be available at: http://127.0.0.1:8000/docs
streamlit run app.py
1️⃣ Upload a PDF document using the frontend.
2️⃣ The document is processed, embedded, and stored in FAISS.
3️⃣ Ask a question, and the system retrieves the most relevant passages.
4️⃣ The candidates are re-ranked for improved answer accuracy.
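The re-ranking step scores each (question, passage) pair and reorders the FAISS candidates by that score. Schematically (the `overlap_score` stand-in below is illustrative; the project uses a CrossEncoder trained on MS MARCO, whose `model.predict(pairs)` call would replace it):

```python
from typing import Callable

def rerank(question: str, passages: list[str],
           score_fn: Callable[[str, str], float]) -> list[str]:
    """Reorder candidate passages by descending relevance score."""
    scored = [(score_fn(question, p), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored]

# Stand-in scorer: counts shared words (a CrossEncoder replaces this in practice)
def overlap_score(question: str, passage: str) -> float:
    q = set(question.lower().split())
    return float(len(q & set(passage.lower().split())))
```

A cross-encoder sees the question and passage together, so it can judge relevance far better than the embedding distance alone, at the cost of scoring only a small candidate set.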
- How to build a document-based Q&A system
- Efficient use of FAISS for fast vector retrieval
- Combining NLP models for better information retrieval
- Deploying a FastAPI-based backend with a user-friendly UI
- Add support for multi-document queries
- Improve answer extraction using a fine-tuned model
- Deploy on AWS/GCP for public access
Special thanks to FastAPI, FAISS, and Hugging Face Transformers for providing amazing open-source tools that made this project possible.