A FastAPI-powered Document Q&A system that uses FAISS for efficient retrieval and SentenceTransformers for embedding-based search. Users upload PDF documents and ask questions; the system retrieves contextually relevant answers from those documents.
- ✅ Upload PDFs and extract meaningful content
- ✅ FAISS-based retrieval for efficient search
- ✅ Sentence embeddings for improved answer relevance
- ✅ FastAPI backend for handling queries
- ✅ Streamlit UI for easy interaction
- Backend: FastAPI
- Frontend: Streamlit
- Embeddings: SentenceTransformers (all-MiniLM-L6-v2)
- Vector Search: FAISS
- PDF Processing: PyMuPDF (fitz)
- Re-ranking: CrossEncoder (MS MARCO)
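FAISS's `IndexFlatL2` performs exact nearest-neighbor search over the stored embedding vectors. As a sketch of what that search computes, here is the brute-force NumPy equivalent on a toy corpus (illustrative only, not the project's code):

```python
import numpy as np

def search_l2(index_vectors: np.ndarray, query: np.ndarray, k: int = 3):
    """Brute-force L2 search: what faiss.IndexFlatL2.search computes."""
    # Squared Euclidean distance from the query to every stored vector
    dists = ((index_vectors - query) ** 2).sum(axis=1)
    top = np.argsort(dists)[:k]  # indices of the k closest vectors
    return dists[top], top

# Toy corpus of four 3-dimensional "embeddings"
vecs = np.array([[0., 0., 0.], [1., 0., 0.], [0., 5., 0.], [9., 9., 9.]])
dists, ids = search_l2(vecs, np.array([1., 0.1, 0.]), k=2)
```

With FAISS itself, the same search is roughly `index = faiss.IndexFlatL2(d)`, `index.add(vecs)`, `index.search(queries, k)`, where `queries` is a 2-D float32 array.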
document-qa/
├── uploads/           # Folder to store uploaded PDFs
├── .venv/             # Virtual environment (not tracked)
├── main.py            # FastAPI backend
├── app.py             # Streamlit frontend
├── requirements.txt   # Dependencies
├── .gitignore         # Ignored files (venv, cache, large files)
└── README.md          # Project documentation
git clone https://github.com/stutipandey20/Document-QA.git
cd Document-QA
python3 -m venv .venv
source .venv/bin/activate # macOS/Linux
.venv\Scripts\activate # Windows
pip install -r requirements.txt
uvicorn main:app --reload
The interactive FastAPI docs will be available at: http://127.0.0.1:8000/docs
streamlit run app.py
1️⃣ Upload a PDF document using the frontend.
2️⃣ The document is processed, embedded, and stored in FAISS.
3️⃣ Ask a question, and the system retrieves the most relevant passages.
4️⃣ The candidates are re-ranked for improved answer accuracy.
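The re-ranking step scores each (question, passage) pair and reorders the FAISS candidates by that score. Schematically (the `overlap_score` stand-in below is illustrative; the project uses a CrossEncoder trained on MS MARCO, whose `model.predict(pairs)` call would replace it):

```python
from typing import Callable

def rerank(question: str, passages: list[str],
           score_fn: Callable[[str, str], float]) -> list[str]:
    """Reorder candidate passages by descending relevance score."""
    scored = [(score_fn(question, p), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored]

# Stand-in scorer: counts shared words (a CrossEncoder replaces this in practice)
def overlap_score(question: str, passage: str) -> float:
    q = set(question.lower().split())
    return float(len(q & set(passage.lower().split())))
```

A cross-encoder sees the question and passage together, so it can judge relevance far better than the embedding distance alone, at the cost of scoring only a small candidate set.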
- How to build a document-based Q&A system
- Efficient use of FAISS for fast vector retrieval
- Combining NLP models for better information retrieval
- Deploying a FastAPI-based backend with a user-friendly UI
- Add support for multi-document queries
- Improve answer extraction using a fine-tuned model
- Deploy on AWS/GCP for public access
Special thanks to FastAPI, FAISS, and Hugging Face Transformers for providing amazing open-source tools that made this project possible.