Built by Sai Krishna Darla | Java Backend Engineer
Java 21 · Spring Boot 3.4 · Spring AI 1.1 · AWS Bedrock · PGVector · Docker
A production-style Retrieval Augmented Generation (RAG) system that answers questions about your own documents — with source attribution and near-zero hallucination compared to vanilla LLM calls.
Instead of asking an LLM to guess from its training data, this system:
- Stores your documents as semantic vectors in a database
- Finds the most relevant chunks when a question is asked
- Sends only those chunks to the LLM as grounded context
- Returns an answer with the source document cited
A vanilla LLM answers from memory — it can confidently give wrong answers (hallucination). RAG fixes this by retrieving real content from your own documents before generating an answer.
Think of it like this:
- Without RAG → LLM guesses the answer from training data
- With RAG → LLM reads the relevant pages first, then answers
```mermaid
flowchart LR
    A([Your Documents]) --> B[Chunk + Embed]
    B --> C[(PGVector DB)]
    D([User Question]) --> E[Embed Question]
    E --> F[Similarity Search]
    C --> F
    F --> G[Build Prompt\nwith Context]
    G --> H[AWS Bedrock LLM]
    H --> I([Grounded Answer\n+ Source])
```
```mermaid
flowchart TD
    S3[AWS S3\nDocument Storage]
    API[Spring Boot RAG API\nSpring AI 1.1 · Java 21]
    PG[(PGVector\nVector Store)]
    TITAN[Amazon Titan\nEmbedding Model]
    BEDROCK[AWS Bedrock\nClaude / Titan LLM]
    USER[User / Client\nREST API]
    S3 -->|ingest docs| API
    USER -->|question| API
    API -->|embed| TITAN
    TITAN -->|store vectors| PG
    API -->|similarity search| PG
    PG -->|top-5 chunks| API
    API -->|prompt + context| BEDROCK
    BEDROCK -->|grounded answer| USER
```
```mermaid
flowchart TD
    A[Read PDF / Text from S3\nSpring AI S3Resource]
    B[Split into chunks\n512 tokens · 64 overlap]
    C[Embed each chunk\nAmazon Titan → 1536-dim vector]
    D[Store in PGVector\nvector + metadata + source ref]
    A --> B --> C --> D
```
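The split step above can be sketched as a plain-Java sliding window. This is a minimal sketch, not the project's actual splitter (Spring AI ships its own token-based splitter): the 512/64 numbers match the pipeline, but whitespace-separated words stand in for model tokens, which real tokenizers count differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sliding-window chunker: 512-token windows, 64-token overlap.
// One whitespace word stands in for one token (real tokenization differs).
public class Chunker {
    static List<String> chunk(String text, int size, int overlap) {
        String[] tokens = text.trim().split("\\s+");
        List<String> chunks = new ArrayList<>();
        int step = size - overlap; // each window starts 448 tokens after the last
        for (int start = 0; start < tokens.length; start += step) {
            int end = Math.min(start + size, tokens.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(tokens, start, end)));
            if (end == tokens.length) break; // last (possibly short) window
        }
        return chunks;
    }
}
```

The overlap means the tail of each chunk is repeated at the head of the next, so a sentence that straddles a boundary is still fully contained in at least one chunk.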
```mermaid
flowchart TD
    A[User question via REST POST]
    B[Embed question\nSame Titan model]
    C[Cosine similarity search\nPGVector top-K=5 chunks]
    D[Build prompt\nStuff chunks + question into template]
    E[AWS Bedrock LLM\nClaude / Titan]
    F[Return answer + source citations]
    A --> B --> C --> D --> E --> F
```
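The "build prompt" step can be sketched like this. The template wording below is illustrative only, not the exact text Spring AI's `QuestionAnswerAdvisor` injects, but it shows the core idea: the LLM is explicitly restricted to the retrieved chunks.

```java
import java.util.List;

// Hypothetical prompt builder: stuff the top-K retrieved chunks and the
// user question into a grounded template before calling the LLM.
public class PromptBuilder {
    static String build(List<String> chunks, String question) {
        StringBuilder context = new StringBuilder();
        for (String c : chunks) {
            context.append("- ").append(c).append("\n");
        }
        return """
                Answer the question using ONLY the context below.
                If the context does not contain the answer, say so.

                Context:
                %s
                Question: %s""".formatted(context, question);
    }
}
```

Because the instruction tells the model to admit when the context is insufficient, questions about un-indexed documents fail loudly instead of hallucinating.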
| Technology | Version | Role in this project |
|---|---|---|
| Java | 21 | Core language — virtual threads for concurrency |
| Spring Boot | 3.4 | Application framework — auto-config, DI, REST |
| Spring AI | 1.1 GA | AI abstraction layer — unified interface for LLMs, embeddings, vector stores |
| AWS S3 | — | Durable document storage — source of truth for raw files |
| Amazon Titan | Embed v1 | Converts text chunks to 1536-dimensional float vectors |
| PGVector | PostgreSQL ext | Stores embeddings, performs cosine similarity search |
| AWS Bedrock | Claude/Titan | Managed LLM inference — generates grounded answers |
| Docker Compose | — | Spins up PGVector + all services locally |
| Testcontainers | — | Runs real Postgres+PGVector in Docker during integration tests |
| Maven | Multi-module | Separates ingestion, API, and shared model modules |
Spring AI is Java's equivalent of LangChain. It gives a single unified interface for embeddings, vector stores, and LLMs — so the underlying vendor can be swapped through config, not code.
```java
// Same code works whether the backend is Bedrock, OpenAI, or Ollama
EmbeddingModel embeddingModel; // → Amazon Titan underneath
VectorStore vectorStore;       // → PGVector underneath
ChatClient chatClient;         // → AWS Bedrock underneath
```

The `QuestionAnswerAdvisor` chains all three automatically into a full RAG pipeline:
```java
String answer = chatClient.prompt()
        .advisors(new QuestionAnswerAdvisor(vectorStore))
        .user(question)
        .call()
        .content();
```

PGVector was chosen because it:
- Runs as a PostgreSQL extension — the same DB holds metadata AND vectors
- Avoids cross-service consistency issues
- Delivers sufficient performance for this scale
- Supports cosine similarity via the `<=>` operator with IVFFLAT indexing
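For intuition, here is what the `<=>` comparison boils down to, sketched in plain Java. Note that pgvector's `<=>` actually returns cosine *distance* (1 minus the similarity below), so PGVector orders ascending by distance where this function would order descending by similarity.

```java
// Cosine similarity between two embedding vectors: the dot product
// normalized by both vector lengths. 1.0 = same direction, 0.0 = orthogonal.
public class Cosine {
    static double similarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

The top-K=5 search is then just "compute this against every stored vector and keep the five closest", which the IVFFLAT index approximates without scanning every row.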
```
genai-document-qa-rag/
├── data-ingestion/      # Reads S3, chunks, embeds, stores in PGVector
├── document-qa/         # REST API — receives questions, runs RAG chain
├── mcp/                 # MCP client integrations
├── buildSrc/            # Shared build config
└── docker-compose.yml   # PGVector + app services
```
- Reduced hallucination rate from ~40% to near-zero for indexed documents
- Source-attributed answers — every response cites the S3 document it came from
- Full infra via Docker Compose — single command local setup
- Integration tested with Testcontainers — real DB behaviour in tests
Sai Krishna Darla — Java Backend Engineer
LinkedIn · GitHub