**🔴 Live Demo:** [pdf-rag-pipeline.streamlit.app](https://pdf-rag-pipeline.streamlit.app)
Upload PDFs, chunk them, index with FAISS, and query with Llama 3.3 via Groq — all through a Streamlit chat interface.
Drag-and-drop any PDF (annual reports, research papers, contracts, whatever) and ask questions against it. The pipeline:
- Extracts text from uploaded PDFs via PyPDF2
- Chunks the text with LangChain's `RecursiveCharacterTextSplitter` (10k chars, 1k overlap)
- Embeds chunks using `sentence-transformers/all-MiniLM-L6-v2` (runs locally, no API needed)
- Indexes into a FAISS vector store for similarity search
- Queries the top-k similar chunks against Llama 3.3 70B on Groq for fast inference
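The chunking step above can be sketched in plain Python. This is a simplified stand-in for `RecursiveCharacterTextSplitter`, which additionally tries to break on paragraph and sentence boundaries rather than at fixed character offsets:

```python
def chunk_text(text: str, chunk_size: int = 10_000, overlap: int = 1_000) -> list[str]:
    """Split text into fixed-size chunks, each sharing `overlap` chars
    with the previous one (simplified vs. LangChain's recursive splitter)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap means a sentence straddling a chunk boundary still appears whole in at least one chunk, which keeps retrieval from missing answers that happen to sit on a boundary.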
The prompt template is tuned for financial document analysis (annual reports, related-party transactions, KMP remuneration) but works on any document type.
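For illustration, a template along these lines (the wording here is hypothetical, not the shipped template in `app.py`):

```python
# Illustrative sketch only -- the actual template in the repo may differ.
PROMPT_TEMPLATE = """You are a financial document analyst.
Answer the question using ONLY the context below. If the answer is not
in the context, say you don't know; do not guess.

Context:
{context}

Question: {question}
Answer:"""
```

The `{context}` slot is filled with the top-k retrieved chunks and `{question}` with the user's query before the prompt is sent to the LLM.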
```bash
# clone + setup
git clone https://github.com/parity-byte/pdf-rag-pipeline.git
cd pdf-rag-pipeline

# install deps (pick one)
uv sync                          # if you use uv
pip install -r requirements.txt  # otherwise

# set your Groq API key
cp .env.example .env
# edit .env and add your key (free at console.groq.com)

# run
streamlit run app.py
```

```mermaid
graph TD
    subgraph Data Ingestion
        A[PDF Document] -->|PyPDF2| B(Text Extraction)
        B --> C{RecursiveCharacter\nTextSplitter}
        C -->|10k chunks, 1k overlap| D[HuggingFace:\nall-MiniLM-L6-v2]
        D -->|Embeddings| E[(FAISS Vector Index)]
    end
    subgraph Query Execution
        F([User Query]) --> G[HuggingFace\nEmbeddings]
        G -->|similarity_search| E
        E -->|Top-k Docs| H{LangChain QA Chain}
        F --> H
        H -->|Prompt + Context| L[Groq: Llama 3.3 70B]
        L --> I([Final Answer])
    end
```
The embeddings run entirely locally (no API call). Only the final LLM inference hits Groq's API, which is free-tier friendly (~30 req/min).
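What the similarity search does, sketched in plain Python. FAISS uses optimized index structures for speed; this brute-force cosine version just shows the idea:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    scored = [(cosine(query_vec, vec), chunk) for chunk, vec in index]
    scored.sort(reverse=True)
    return [chunk for _, chunk in scored[:k]]
```

In the real pipeline the `index` pairs come from embedding each text chunk with `all-MiniLM-L6-v2`, and only the chunks returned by `top_k` are pasted into the prompt, which is what keeps the LLM call small and cheap.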
- FAISS over Pinecone/Qdrant — everything runs locally, no cloud vector DB signup needed. Good for demos and small-to-medium document sets.
- Groq over OpenAI — Llama 3.3 70B on Groq is free, fast (>500 tok/s), and avoids vendor lock-in.
- HuggingFace embeddings over Google/OpenAI — `all-MiniLM-L6-v2` runs on CPU, zero API cost, and the quality is solid for retrieval.
- Streamlit — fastest way to get a chat UI running. The custom HTML/CSS gives it a proper chat-bubble look instead of default Streamlit widgets.
README generated with AI ✨