
# 📚 PDF RAG Pipeline

🔴 **Live Demo:** [pdf-rag-pipeline.streamlit.app](https://pdf-rag-pipeline.streamlit.app)

Upload PDFs, chunk them, index with FAISS, and query with Llama 3.3 via Groq — all through a Streamlit chat interface.



## What it does

Drag-and-drop any PDF (annual reports, research papers, contracts, whatever) and ask questions against it. The pipeline:

  1. Extracts text from uploaded PDFs via PyPDF2
  2. Chunks the text with LangChain's RecursiveCharacterTextSplitter (10k chars, 1k overlap)
  3. Embeds chunks using sentence-transformers/all-MiniLM-L6-v2 (runs locally, no API needed)
  4. Indexes into a FAISS vector store for similarity search
  5. Queries the top-k similar chunks against Llama 3.3 70B on Groq for fast inference
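Step 2's sliding-window chunking can be sketched in plain Python. This is a simplified stand-in for LangChain's `RecursiveCharacterTextSplitter` (which additionally prefers to break on paragraph and sentence boundaries), using the same 10k-char chunk size and 1k-char overlap:

```python
def chunk_text(text: str, chunk_size: int = 10_000, overlap: int = 1_000) -> list[str]:
    """Split text into fixed-size chunks where each chunk overlaps
    the previous one by `overlap` characters, so context that spans
    a chunk boundary still appears intact in at least one chunk."""
    step = chunk_size - overlap  # advance 9k chars per chunk
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 25k-char document yields 3 chunks: [0:10000], [9000:19000], [18000:25000]
chunks = chunk_text("x" * 25_000)
```

The overlap is why step 5 rarely loses an answer that straddles a chunk boundary.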

The prompt template is tuned for financial document analysis (annual reports, related-party transactions, KMP remuneration) but works on any document type.
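A hypothetical sketch of what such a template might look like (the actual template lives in `app.py` and is not reproduced here; the wording below is an assumption):

```python
# Hypothetical prompt template in the spirit of the one described above:
# grounded answering over retrieved context, with a financial-analysis slant.
TEMPLATE = """You are a financial-document analyst. Answer the question using
only the context below. If the answer is not in the context, say so.

Context:
{context}

Question: {question}
Answer:"""

prompt = TEMPLATE.format(
    context="Revenue grew 12% YoY.",
    question="How much did revenue grow?",
)
```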

## How to run

```bash
# clone + setup
git clone https://github.com/parity-byte/pdf-rag-pipeline.git
cd pdf-rag-pipeline

# install deps (pick one)
uv sync                          # if you use uv
pip install -r requirements.txt  # otherwise

# set your Groq API key
cp .env.example .env
# edit .env and add your key (free at console.groq.com)

# run
streamlit run app.py
```
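A minimal sketch of the resulting `.env` file — `GROQ_API_KEY` is the variable name the Groq SDK conventionally reads, but check `.env.example` for the exact name this app expects:

```env
# .env — key name assumed from the usual Groq SDK convention
GROQ_API_KEY=gsk_your_key_here
```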

## How it works

```mermaid
graph TD
    subgraph Data Ingestion
        A[PDF Document] -->|PyPDF2| B(Text Extraction)
        B --> C{RecursiveCharacter\nTextSplitter}
        C -->|10k chunks, 1k overlap| D[HuggingFace:\nall-MiniLM-L6-v2]
        D -->|Embeddings| E[(FAISS Vector Index)]
    end

    subgraph Query Execution
        F([User Query]) --> G[HuggingFace\nEmbeddings]
        G -->|similarity_search| E
        E -->|Top-k Docs| H{LangChain QA Chain}
        F --> H
        H -->|Prompt + Context| L[Groq: Llama 3.3 70B]
        L --> I([Final Answer])
    end
```

The embeddings run entirely locally (no API call). Only the final LLM inference hits Groq's API, which is free-tier friendly (~30 req/min).
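At its core, the similarity search FAISS performs is a nearest-neighbor ranking of chunk embeddings against the query embedding. A dependency-free sketch (FAISS does this with an optimized index, and the real MiniLM embeddings are 384-dimensional, not the toy 3-d vectors below):

```python
import math

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k chunk vectors most similar to the
    query vector, ranked by cosine similarity (highest first)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm
    ranked = sorted(enumerate(chunk_vecs),
                    key=lambda iv: cosine(query_vec, iv[1]),
                    reverse=True)
    return [i for i, _ in ranked[:k]]

# Toy 3-d "embeddings": chunks 0 and 2 point roughly the same way as the query.
chunks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
nearest = top_k([1.0, 0.0, 0.0], chunks, k=2)  # indices of the two closest chunks
```

The top-k chunk texts are then stuffed into the prompt's context slot before the single Groq call.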

## Tech decisions

  - **FAISS over Pinecone/Qdrant** — everything runs locally, no cloud vector DB signup needed. Good for demos and small-to-medium document sets.
  - **Groq over OpenAI** — Llama 3.3 70B on Groq is free, fast (>500 tok/s), and avoids vendor lock-in.
  - **HuggingFace embeddings over Google/OpenAI** — all-MiniLM-L6-v2 runs on CPU, costs nothing in API calls, and its quality is solid for retrieval.
  - **Streamlit** — fastest way to get a chat UI running. The custom HTML/CSS gives it a proper chat-bubble look instead of default Streamlit widgets.

README generated with AI ✨
