This project demonstrates a simple, end-to-end RAG pipeline using LangChain, Ollama, and FAISS to ask questions over a PDF document.
The pipeline:
- Loads a PDF
- Splits it into chunks
- Embeds the chunks
- Stores them in a vector database (FAISS)
- Retrieves relevant chunks for a question
- Uses an LLM to answer based only on the retrieved context

Models:

- LLM: deepseek-r1:8b (via Ollama)
- Embeddings: Ollama embeddings using the same model

You can easily swap to another Ollama-supported model.
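
Swapping models only means changing the `MODEL` constant that the script passes to both `ChatOllama` and `OllamaEmbeddings` (the model name below is just an illustrative choice, not part of this project):

```python
# Hypothetical swap: any chat-capable model you have pulled with Ollama works,
# e.g. after running `ollama pull mistral` on the command line.
MODEL = "mistral"
```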

Project structure:

```
.
├── duh.pdf      # Input PDF document
├── rag_pdf.py   # Main RAG pipeline script
└── README.md    # Project documentation
```

Requirements:

- Python 3.9+

Install the Python dependencies:

```bash
pip install langchain langchain-community langchain-text-splitters \
    langchain-ollama faiss-cpu pypdf
```

Install Ollama and pull the model:

```bash
ollama pull deepseek-r1:8b
```

Make sure Ollama is running:

```bash
ollama serve
```

Load the PDF:

```python
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("duh.pdf")
pages = loader.load()
```
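
Optionally, sanity-check the result; PyPDFLoader returns one Document per page, each carrying source and page metadata (this check is illustrative, not part of the original script):

```python
# Illustrative check: one Document per PDF page, with metadata attached.
print(f"Loaded {len(pages)} pages")
print(pages[0].metadata)
```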

Split it into chunks:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=100
)
chunks = splitter.split_documents(pages)
```
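
To get a feel for the chunk_size/chunk_overlap settings, it can help to peek at the output (illustrative only):

```python
# Illustrative: see how many chunks the splitter produced and what one looks like.
print(f"{len(pages)} pages -> {len(chunks)} chunks")
print(chunks[0].page_content[:200])
```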

Embed the chunks and store them in FAISS:

```python
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings

MODEL = "deepseek-r1:8b"

embeddings = OllamaEmbeddings(model=MODEL)
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever()
```
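
`as_retriever()` uses the vector store's default search settings; if you want explicit control over how many chunks are retrieved per question, you can pass search parameters (optional sketch):

```python
# Optional: retrieve the 4 most similar chunks per question instead of the default.
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```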

Create the chat model:

```python
from langchain_ollama import ChatOllama

model = ChatOllama(model=MODEL, temperature=0)
```

The model is forced to answer only from context:

```python
from langchain.prompts import PromptTemplate

template = """
You are an assistant that provides answers to questions based on
provided context.
If the answer is not in the context, reply: "I don't know".
Context:
{context}
Question:
{question}
"""
prompt = PromptTemplate.from_template(template)
```

Build the chain:

```python
from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser

chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | StrOutputParser()
)
```
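
Because the chain is a standard LCEL runnable, you can also stream the answer token by token (optional sketch; the question text is just an example):

```python
# Optional: stream the response instead of waiting for the full string.
for token in chain.stream({"question": "What is this document about?"}):
    print(token, end="", flush=True)
print()
```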
"What can you get away with when you only have a small number of users?",
"What's the most common unscalable thing founders have to do at the start?",
]
for q in questions:
print(chain.invoke({"question": q}))- 📄 PDF-based knowledge grounding
- 🔍 Semantic retrieval with FAISS
- 🧠 LLM answers constrained to context (low hallucination)
- 🔌 Runs fully locally (no cloud APIs)

Possible improvements:

- Add metadata filtering (page number, section)
- Persist FAISS index to disk (sketched after this list)
- Use multi-PDF ingestion (sketched after this list)
- Add citations / source highlighting
- Wrap with Streamlit or Gradio UI
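
A rough sketch of two of these ideas, multi-PDF ingestion and a persistent FAISS index (the docs/ folder and faiss_index path are illustrative assumptions, not part of the current project):

```python
# Sketch: load every PDF in a (hypothetical) docs/ folder and persist the index.
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

MODEL = "deepseek-r1:8b"
embeddings = OllamaEmbeddings(model=MODEL)

pages = []
for pdf_path in Path("docs").glob("*.pdf"):
    pages.extend(PyPDFLoader(str(pdf_path)).load())

splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100)
chunks = splitter.split_documents(pages)

# Build the index once and save it to disk.
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("faiss_index")

# On later runs, reload the saved index instead of re-embedding everything.
vectorstore = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)
```
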
This project reduces hallucinations by:
- Strict context-based prompting
- Temperature = 0
- Retrieval before generation (RAG)

Notes:

- Designed for experimentation and learning
- Suitable for research papers, startup docs, or technical PDFs
Avithal — Computer Vision & AI Engineer
⭐ If this helped you, consider starring the repo!