**🔴 Live Demo:** [pdf-rag-pipeline.streamlit.app](https://pdf-rag-pipeline.streamlit.app)
Upload PDFs, chunk them, index with FAISS, and query with Llama 3.3 via Groq — all through a Streamlit chat interface.
Drag-and-drop any PDF (annual reports, research papers, contracts, whatever) and ask questions against it. The pipeline:
- Extracts text from uploaded PDFs via PyPDF2
- Chunks the text with LangChain's `RecursiveCharacterTextSplitter` (10k chars, 1k overlap)
- Embeds chunks using `sentence-transformers/all-MiniLM-L6-v2` (runs locally, no API needed)
- Indexes into a FAISS vector store for similarity search
- Queries the top-k similar chunks against Llama 3.3 70B on Groq for fast inference
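The chunking step above can be sketched in plain Python. This is a simplified stand-in for `RecursiveCharacterTextSplitter`, which additionally tries to break on paragraph and sentence boundaries rather than at fixed character offsets:

```python
def chunk_text(text: str, chunk_size: int = 10_000, overlap: int = 1_000) -> list[str]:
    """Split text into fixed-size chunks, each sharing `overlap` chars
    with the previous one (simplified vs. LangChain's recursive splitter)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap means a sentence straddling a chunk boundary still appears whole in at least one chunk, which keeps retrieval from missing answers that happen to sit on a boundary.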
The prompt template is tuned for financial document analysis (annual reports, related-party transactions, KMP remuneration) but works on any document type.
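For illustration, a template along these lines (the wording here is hypothetical, not the shipped template in `app.py`):

```python
# Illustrative sketch only -- the actual template in the repo may differ.
PROMPT_TEMPLATE = """You are a financial document analyst.
Answer the question using ONLY the context below. If the answer is not
in the context, say you don't know; do not guess.

Context:
{context}

Question: {question}
Answer:"""
```

The `{context}` slot is filled with the top-k retrieved chunks and `{question}` with the user's query before the prompt is sent to the LLM.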
```bash
# clone + setup
git clone https://github.com/parity-byte/pdf-rag-pipeline.git
cd pdf-rag-pipeline

# install deps (pick one)
uv sync                          # if you use uv
pip install -r requirements.txt  # otherwise

# set your Groq API key
cp .env.example .env
# edit .env and add your key (free at console.groq.com)

# run
streamlit run app.py
```

```mermaid
graph TD
    subgraph Data Ingestion
        A[PDF Document] -->|PyPDF2| B(Text Extraction)
        B --> C{RecursiveCharacter\nTextSplitter}
        C -->|10k chunks, 1k overlap| D[HuggingFace:\nall-MiniLM-L6-v2]
        D -->|Embeddings| E[(FAISS Vector Index)]
    end
    subgraph Query Execution
        F([User Query]) --> G[HuggingFace\nEmbeddings]
        G -->|similarity_search| E
        E -->|Top-k Docs| H{LangChain QA Chain}
        F --> H
        H -->|Prompt + Context| L[Groq: Llama 3.3 70B]
        L --> I([Final Answer])
    end
```
The embeddings run entirely locally (no API call). Only the final LLM inference hits Groq's API, which is free-tier friendly (~30 req/min).
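What the similarity search does, sketched in plain Python. FAISS uses optimized index structures for speed; this brute-force cosine version just shows the idea:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    scored = [(cosine(query_vec, vec), chunk) for chunk, vec in index]
    scored.sort(reverse=True)
    return [chunk for _, chunk in scored[:k]]
```

In the real pipeline the `index` pairs come from embedding each text chunk with `all-MiniLM-L6-v2`, and only the chunks returned by `top_k` are pasted into the prompt, which is what keeps the LLM call small and cheap.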
- FAISS over Pinecone/Qdrant — everything runs locally, no cloud vector DB signup needed. Good for demos and small-to-medium document sets.
- Groq over OpenAI — Llama 3.3 70B on Groq is free, fast (>500 tok/s), and avoids vendor lock-in.
- HuggingFace embeddings over Google/OpenAI — `all-MiniLM-L6-v2` runs on CPU, zero API cost, and the quality is solid for retrieval.
- Streamlit — fastest way to get a chat UI running. The custom HTML/CSS gives it a proper chat-bubble look instead of default Streamlit widgets.
README generated with AI ✨