Skip to content

A FastAPI-powered Document Q&A system using FAISS for retrieval and Sentence Transformers for embeddings.

License

Notifications You must be signed in to change notification settings

stutipandey20/Document-QA

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

7 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ“„ Document-QA

A FastAPI-powered Document Q&A system that utilizes FAISS for efficient retrieval and SentenceTransformers for embedding-based search. It enables users to upload PDF documents and ask questions, retrieving contextually relevant answers from the documents.


πŸš€ Features

βœ… Upload PDFs and extract meaningful content
βœ… FAISS-based Retrieval for efficient search
βœ… Sentence Embeddings for improved answer relevance
βœ… FastAPI Backend for handling queries
βœ… Streamlit UI for easy interaction


πŸ› οΈ Tech Stack

  • Backend: FastAPI
  • Frontend: Streamlit
  • Embeddings: SentenceTransformers (all-MiniLM-L6-v2)
  • Vector Search: FAISS
  • PDF Processing: PyMuPDF (fitz)
  • Re-ranking: CrossEncoder (MS MARCO)

πŸ“‚ Project Structure

document-qa/ 
│── uploads/ # Folder to store uploaded PDFs
│── .venv/ # Virtual environment (not tracked)
│── main.py # FastAPI Backend
│── app.py # Streamlit Frontend
│── requirements.txt # Dependencies
│── .gitignore # Ignored files (venv, cache, large files)
│── README.md # Project Documentation

πŸ› οΈ Setup Instructions

1️⃣ Clone the Repository

git clone https://github.com/stutipandey20/Document-QA.git
cd Document-QA

2️⃣ Create a Virtual Environment

python3 -m venv .venv
source .venv/bin/activate   # macOS/Linux  
.venv\Scripts\activate      # Windows

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Run the Backend (FastAPI)

uvicorn main:app --reload

FastAPI will be available at: http://127.0.0.1:8000/docs

5️⃣ Run the Frontend (Streamlit)

streamlit run app.py

πŸ“Œ How It Works

1️⃣ Upload a PDF document using the frontend.
2️⃣ The document is processed and stored in FAISS.
3️⃣ Ask a question, and the system retrieves the most relevant passage.
4️⃣ Re-ranks results for improved answer accuracy.

🎯 Learnings & Takeaways

How to build a document-based Q&A system. Efficient use of FAISS for fast vector retrieval. Combining NLP models for better information retrieval. Deploying a FastAPI-based backend with a user-friendly UI.

πŸ“Œ TODO (Next Improvements)

Add support for multi-document queries Improve answer extraction using a fine-tuned model Deploy on AWS/GCP for public access

πŸ† Acknowledgments

Special thanks to FastAPI, FAISS, and Hugging Face Transformers for providing amazing open-source tools that made this project possible.

About

A FastAPI-powered Document Q&A system using FAISS for retrieval and Sentence Transformers for embeddings.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages