GenAI Document Q&A System — RAG Pipeline

Built by Sai Krishna Darla | Java Backend Engineer
Java 21 · Spring Boot 3.4 · Spring AI 1.1 · AWS Bedrock · PGVector · Docker


What is this project?

A production-style Retrieval-Augmented Generation (RAG) system that answers questions about your own documents — with source attribution and far fewer hallucinations than vanilla LLM calls.

Instead of asking an LLM to guess from its training data, this system:

  1. Stores your documents as semantic vectors in a database
  2. Finds the most relevant chunks when a question is asked
  3. Sends only those chunks to the LLM as grounded context
  4. Returns an answer with the source document cited
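Step 3 — grounding the prompt — can be sketched in plain Java. This is an illustrative template only (the `buildPrompt` helper and its wording are hypothetical, not the project's actual prompt):

```java
import java.util.List;

public class GroundedPrompt {
    // Stuff the retrieved chunks plus the user question into one prompt,
    // instructing the model to answer only from the supplied context.
    static String buildPrompt(List<String> chunks, String question) {
        StringBuilder sb = new StringBuilder("Answer using ONLY the context below.\n\nContext:\n");
        for (String chunk : chunks) {
            sb.append("- ").append(chunk).append('\n');
        }
        sb.append("\nQuestion: ").append(question);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildPrompt(
                List.of("Invoices are due in 30 days."),
                "When are invoices due?"));
    }
}
```

Because the model only sees the retrieved chunks, its answer is constrained to what the documents actually say.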

What is RAG?

A vanilla LLM answers from memory — it can confidently give wrong answers (hallucination). RAG fixes this by retrieving real content from your own documents before generating an answer.

Think of it like this:

  • Without RAG → LLM guesses the answer from training data
  • With RAG → LLM reads the relevant pages first, then answers
```mermaid
flowchart LR
    A([Your Documents]) --> B[Chunk + Embed]
    B --> C[(PGVector DB)]
    D([User Question]) --> E[Embed Question]
    E --> F[Similarity Search]
    C --> F
    F --> G[Build Prompt\nwith Context]
    G --> H[AWS Bedrock LLM]
    H --> I([Grounded Answer\n+ Source])
```

High Level Design

```mermaid
flowchart TD
    S3[AWS S3\nDocument Storage]
    API[Spring Boot RAG API\nSpring AI 1.1 · Java 21]
    PG[(PGVector\nVector Store)]
    TITAN[Amazon Titan\nEmbedding Model]
    BEDROCK[AWS Bedrock\nClaude / Titan LLM]
    USER[User / Client\nREST API]

    S3 -->|ingest docs| API
    USER -->|question| API
    API -->|embed| TITAN
    TITAN -->|store vectors| PG
    API -->|similarity search| PG
    PG -->|top-5 chunks| API
    API -->|prompt + context| BEDROCK
    BEDROCK -->|grounded answer| USER
```

Low Level Design — RAG Pipeline

Ingestion Flow

```mermaid
flowchart TD
    A[Read PDF / Text from S3\nSpring AI S3Resource]
    B[Split into chunks\n512 tokens · 64 overlap]
    C[Embed each chunk\nAmazon Titan → 1536-dim vector]
    D[Store in PGVector\nvector + metadata + source ref]

    A --> B --> C --> D
```
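The chunking step above can be sketched in dependency-free Java. This is a minimal illustration, not the project's actual splitter: the real pipeline counts model tokens (512 per chunk, 64 overlap), while here whitespace-separated words stand in for tokens:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Chunker {
    // Split text into overlapping windows so a fact that straddles a
    // chunk boundary is still fully contained in at least one chunk.
    static List<String> split(String text, int chunkSize, int overlap) {
        String[] words = text.split("\\s+");
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;          // advance by size minus overlap
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) break;      // last window reached the end
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(split("a b c d e f g h i j", 4, 1));
    }
}
```

With 512-token chunks and 64-token overlap, each consecutive pair of chunks shares 64 tokens, which is why boundary-spanning sentences survive retrieval.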

Query Flow

```mermaid
flowchart TD
    A[User question via REST POST]
    B[Embed question\nSame Titan model]
    C[Cosine similarity search\nPGVector top-K=5 chunks]
    D[Build prompt\nStuff chunks + question into template]
    E[AWS Bedrock LLM\nClaude / Titan]
    F[Return answer + source citations]

    A --> B --> C --> D --> E --> F
```
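The retrieval step is cosine similarity over embedding vectors. A brute-force plain-Java sketch of what PGVector does with an index (illustrative only; real vectors are 1536-dimensional Titan embeddings, not the tiny toy vectors here):

```java
import java.util.ArrayList;
import java.util.List;

public class TopKSearch {
    // Cosine similarity between two embedding vectors: dot product
    // divided by the product of the vector norms, in [-1, 1].
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Indices of the k stored vectors most similar to the query,
    // the brute-force equivalent of PGVector's indexed top-K search.
    static List<Integer> topK(float[] query, List<float[]> stored, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) ids.add(i);
        ids.sort((x, y) -> Double.compare(cosine(query, stored.get(y)),
                                          cosine(query, stored.get(x))));
        return ids.subList(0, Math.min(k, ids.size()));
    }

    public static void main(String[] args) {
        List<float[]> stored = List.of(new float[]{0, 1}, new float[]{1, 0});
        System.out.println(topK(new float[]{1, 0}, stored, 1));
    }
}
```

The chunks behind the top-K indices are what gets stuffed into the prompt in the next step.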

Tech Stack

| Technology | Version | Role in this project |
|---|---|---|
| Java | 21 | Core language — virtual threads for concurrency |
| Spring Boot | 3.4 | Application framework — auto-config, DI, REST |
| Spring AI | 1.1 GA | AI abstraction layer — unified interface for LLMs, embeddings, vector stores |
| AWS S3 | | Durable document storage — source of truth for raw files |
| Amazon Titan | Embed v1 | Converts text chunks to 1536-dimensional float vectors |
| PGVector | PostgreSQL ext. | Stores embeddings, performs cosine similarity search |
| AWS Bedrock | Claude/Titan | Managed LLM inference — generates grounded answers |
| Docker | Compose | Spins up PGVector + all services locally |
| Testcontainers | | Runs real Postgres + PGVector in Docker during integration tests |
| Maven | Multi-module | Separates ingestion, API, and shared model modules |

Why Spring AI?

Spring AI is Java's equivalent of LangChain. It gives a single unified interface for embeddings, vector stores, and LLMs — so the underlying vendor can be swapped through config, not code.

```java
// Same code works whether the backend is Bedrock, OpenAI, or Ollama
EmbeddingModel embeddingModel;   // → Amazon Titan underneath
VectorStore vectorStore;         // → PGVector underneath
ChatClient chatClient;           // → AWS Bedrock underneath
```

The QuestionAnswerAdvisor chains all three automatically into a full RAG pipeline:

```java
String answer = chatClient.prompt()
    .advisors(new QuestionAnswerAdvisor(vectorStore))
    .user(question)
    .call()
    .content();
```

Why PGVector over Pinecone or Weaviate?

  • Runs as a PostgreSQL extension — same DB holds metadata AND vectors
  • No cross-service consistency issues
  • Sufficient performance for this scale
  • Cosine similarity via <=> operator with IVFFLAT indexing
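The kind of SQL this amounts to can be sketched as a query string in Java. Table and column names (`rag_chunks`, `embedding`, `content`, `metadata`) are hypothetical, not the project's actual schema; `<=>` is pgvector's cosine-distance operator, so ordering ascending returns the nearest rows first:

```java
public class PgVectorQuery {
    // Build a nearest-neighbour query. The ?::vector placeholder is bound
    // to the embedded question; LIMIT caps the result at top-K chunks.
    static String nearestNeighborsSql(int topK) {
        return "SELECT content, metadata FROM rag_chunks "
             + "ORDER BY embedding <=> ?::vector LIMIT " + topK;
    }

    public static void main(String[] args) {
        System.out.println(nearestNeighborsSql(5));
    }
}
```

Because metadata lives in the same row as the vector, the source reference for each chunk comes back in the same query — no second lookup against another store.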

Project Structure

```
genai-document-qa-rag/
├── data-ingestion/          # Reads S3, chunks, embeds, stores in PGVector
├── document-qa/             # REST API — receives questions, runs RAG chain
├── mcp/                     # MCP client integrations
├── buildSrc/                # Shared build config
└── docker-compose.yml       # PGVector + app services
```

Key Results

  • Reduced hallucination rate from ~40% to near-zero for indexed documents
  • Source-attributed answers — every response cites the S3 document it came from
  • Full infra via Docker Compose — single command local setup
  • Integration tested with Testcontainers — real DB behaviour in tests

Author

Sai Krishna Darla — Java Backend Engineer
LinkedIn · GitHub
