Benchmark #18

spyrchat · 2025-09-07T23:23:21Z

This pull request introduces a modular Retrieval-Augmented Generation (RAG) system built around a configurable LangGraph agent architecture and intended for production use. The changes add a YAML-configurable retrieval pipeline, modular agent node implementations, and a rewritten README. The system supports dense, sparse, and hybrid retrieval, reranking, filtering, and agent workflows assembled from independent nodes, so new strategies and nodes can be added without touching the rest of the graph.

The most important changes are:

Agent System Architecture & Core Nodes:

Introduced a new LangGraph-based agent in agent/graph.py, wiring together modular nodes for query interpretation, retrieval, answer generation, and memory updating, all configurable via YAML.
Implemented modular agent nodes:
- query_interpreter for intent classification and routing
- retriever supporting configurable pipelines and legacy compatibility
- generator for answer synthesis using LLMs
- memory_updater for chat history management
Defined a comprehensive AgentState schema for agent state management, supporting enhanced retrieval metadata and error handling.
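To make the node layout above concrete, here is a minimal plain-Python sketch of a shared state passed through the four nodes. It deliberately does not use the LangGraph API: the field names in `AgentState`, the toy node bodies, and the `run_agent` helper are all assumptions made for illustration, not the code in agent/graph.py.

```python
from typing import Callable, TypedDict


class AgentState(TypedDict, total=False):
    # Hypothetical fields; the real schema in the PR may differ.
    question: str
    intent: str
    retrieved_documents: list[dict]
    answer: str
    chat_history: list[tuple[str, str]]
    error: str


# Each node reads the state and returns only the keys it wants to update.
Node = Callable[[AgentState], AgentState]


def query_interpreter(state: AgentState) -> AgentState:
    # Stand-in for LLM-based intent classification.
    intent = "retrieve" if state["question"].strip().endswith("?") else "chat"
    return {"intent": intent}


def retriever(state: AgentState) -> AgentState:
    if state.get("intent") != "retrieve":
        return {"retrieved_documents": []}
    # Stand-in for a vector-store lookup.
    docs = [{"text": "LangGraph wires nodes into a graph.", "score": 0.9}]
    return {"retrieved_documents": docs}


def generator(state: AgentState) -> AgentState:
    docs = state.get("retrieved_documents", [])
    context = " ".join(d["text"] for d in docs) or "no context"
    return {"answer": f"Answer based on: {context}"}


def memory_updater(state: AgentState) -> AgentState:
    history = list(state.get("chat_history", []))
    history.append((state["question"], state.get("answer", "")))
    return {"chat_history": history}


def run_agent(state: AgentState, nodes: list[Node]) -> AgentState:
    """Run nodes in order, merging updates and recording the first failure."""
    current: AgentState = dict(state)
    for node in nodes:
        try:
            current.update(node(current))
        except Exception as exc:  # keep the error in the state instead of crashing
            current["error"] = f"{node.__name__}: {exc}"
            break
    return current


PIPELINE = [query_interpreter, retriever, generator, memory_updater]
final_state = run_agent({"question": "What does the graph do?"}, PIPELINE)
```

Returning partial updates rather than mutating the state in place keeps each node independently testable, which is one plausible reason to split the nodes into separate modules.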

Configurable Retrieval Pipeline:

Added a highly flexible retriever node (make_configurable_retriever) that loads pipeline configurations from YAML, supports multiple retrieval strategies (dense, sparse, hybrid), reranking, filtering, and rich metadata output.
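As a rough illustration of config-driven strategy selection, the sketch below picks a dense, sparse, or hybrid search function from a dictionary shaped like what a YAML loader would return. Everything here is assumed for the example: the config keys, the toy corpus, the trigram and keyword scorers standing in for real embeddings, and the use of reciprocal rank fusion for the hybrid case. The helper is named `make_retriever` to avoid suggesting it mirrors `make_configurable_retriever`.

```python
from typing import Callable

Doc = dict
SearchFn = Callable[[str, int], list[Doc]]

# Hypothetical config, as a YAML loader might produce it.
CONFIG = {"retriever": {"strategy": "hybrid", "top_k": 3}}

CORPUS = {
    "d1": "qdrant stores dense and sparse vectors",
    "d2": "langgraph connects agent nodes",
    "d3": "sparse vectors capture exact keyword matches",
}


def _trigrams(text: str) -> set:
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def dense_search(query: str, top_k: int) -> list[Doc]:
    # Toy stand-in for embedding similarity: character-trigram Jaccard overlap.
    q = _trigrams(query)
    scored = []
    for doc_id, text in CORPUS.items():
        t = _trigrams(text)
        union = q | t
        score = len(q & t) / len(union) if union else 0.0
        if score > 0:
            scored.append({"id": doc_id, "text": text, "score": score})
    return sorted(scored, key=lambda d: d["score"], reverse=True)[:top_k]


def sparse_search(query: str, top_k: int) -> list[Doc]:
    # Toy stand-in for sparse retrieval: count of shared whole words.
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in CORPUS.items():
        score = float(len(terms & set(text.split())))
        if score > 0:
            scored.append({"id": doc_id, "text": text, "score": score})
    return sorted(scored, key=lambda d: d["score"], reverse=True)[:top_k]


def hybrid_search(query: str, top_k: int, k: int = 60) -> list[Doc]:
    # Reciprocal rank fusion: each list contributes 1 / (k + rank) per document.
    fused: dict = {}
    for results in (dense_search(query, top_k), sparse_search(query, top_k)):
        for rank, doc in enumerate(results, start=1):
            entry = fused.setdefault(doc["id"], {**doc, "score": 0.0})
            entry["score"] += 1.0 / (k + rank)
    return sorted(fused.values(), key=lambda d: d["score"], reverse=True)[:top_k]


STRATEGIES: dict = {
    "dense": dense_search,
    "sparse": sparse_search,
    "hybrid": hybrid_search,
}


def make_retriever(config: dict) -> Callable[[str], list[Doc]]:
    settings = config["retriever"]
    strategy = settings["strategy"]
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown retrieval strategy: {strategy!r}")
    search = STRATEGIES[strategy]
    top_k = settings.get("top_k", 5)

    def retrieve(query: str) -> list[Doc]:
        # Tag each hit with the strategy that produced it.
        return [{**d, "retrieval_method": strategy} for d in search(query, top_k)]

    return retrieve


retrieve = make_retriever(CONFIG)
results = retrieve("sparse vectors")
```

Switching `strategy` to `dense` or `sparse` in the config changes the behaviour without touching the calling code, which is the property the YAML-based design is after.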

Productionization & Deployment:

Added a Dockerfile for containerized, production-ready deployment, including system dependencies and Python requirements installation.

Documentation & Developer Experience:

Replaced the README.md with a detailed, user- and developer-focused guide covering architecture, configuration, extension, testing, and migration from legacy code.

Agent Architecture & Core Logic

Added agent/graph.py to define the main LangGraph agent, connecting modular nodes for interpretation, retrieval, generation, and memory, all configurable via YAML.
Implemented agent nodes: query interpreter (LLM-based routing), configurable retriever, generator (LLM answer synthesis), and memory updater for chat history. [1] [2] [3] [4]
Defined a robust AgentState schema supporting enhanced retrieval, error handling, and extensibility.

Retrieval Pipeline Flexibility

Introduced a configurable retriever node supporting YAML-based pipeline configuration, advanced reranking, filtering, and rich document metadata, with legacy compatibility for older retrieval strategies.
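The filtering and reranking steps can be pictured as a list of post-retrieval stages built from configuration. The following is a sketch under stated assumptions: the stage names, the `params` layout, and the word-overlap reranker (a cheap substitute for a model-based reranker) are hypothetical and chosen only to keep the example runnable without extra dependencies.

```python
from typing import Callable

Doc = dict
Stage = Callable[[str, list[Doc]], list[Doc]]


def make_score_filter(min_score: float) -> Stage:
    def stage(query: str, docs: list[Doc]) -> list[Doc]:
        # Drop candidates whose retrieval score is below the threshold.
        return [d for d in docs if d["score"] >= min_score]
    return stage


def make_overlap_reranker(top_k: int) -> Stage:
    def stage(query: str, docs: list[Doc]) -> list[Doc]:
        terms = set(query.lower().split())
        rescored = []
        for d in docs:
            overlap = len(terms & set(d["text"].lower().split()))
            rescored.append({
                **d,
                "original_score": d["score"],  # keep the retrieval score as metadata
                "score": overlap / max(len(terms), 1),
                "reranked": True,
            })
        rescored.sort(key=lambda d: d["score"], reverse=True)
        return rescored[:top_k]
    return stage


# Hypothetical registry mapping config names to stage factories.
STAGE_FACTORIES = {
    "score_filter": make_score_filter,
    "overlap_reranker": make_overlap_reranker,
}


def build_stages(stage_configs: list[dict]) -> list[Stage]:
    return [STAGE_FACTORIES[c["type"]](**c.get("params", {})) for c in stage_configs]


def apply_stages(query: str, docs: list[Doc], stages: list[Stage]) -> list[Doc]:
    for stage in stages:
        docs = stage(query, docs)
    # Attach the final 1-based rank so downstream nodes can cite it.
    return [{**d, "rank": i} for i, d in enumerate(docs, start=1)]


STAGE_CONFIG = [
    {"type": "score_filter", "params": {"min_score": 0.3}},
    {"type": "overlap_reranker", "params": {"top_k": 2}},
]

candidates = [
    {"id": "a", "text": "Docker image for deployment", "score": 0.82},
    {"id": "b", "text": "hybrid retrieval with reranking", "score": 0.55},
    {"id": "c", "text": "unrelated note", "score": 0.10},
]

ranked = apply_stages("hybrid retrieval", candidates, build_stages(STAGE_CONFIG))
```

Because every stage shares one signature, reordering or removing a stage is a configuration change rather than a code change.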

Deployment & Operations

Added a Dockerfile for streamlined, production-ready containerization of the system.

Documentation & Usability

Overhauled README.md with comprehensive instructions, architecture diagrams, extension guides, configuration examples, and migration paths from legacy code.

… logging for collection creation and document insertion; refactor insert_documents method for improved clarity. Update EmbeddingPipeline to use new splitter method. Modify SparseEmbedder to default to CUDA. Add hybrid_retriever.py file.

…ds to retrieve client and collection name. Update EmbeddingPipeline and SparseEmbedder to use Embeddings instead of BaseEmbedder. Add QdrantHybridRetriever for hybrid retrieval functionality and update test scripts accordingly.

…script - Deleted the following test files: - test_full_ingestion.py - test_modular_pipeline.py - run_all_tests.py - test_adapter_fix.py - test_agent_retrieval.py - test_retriever_direct.py - test_streamlined_agent.py - Added a new test file: test_local_setup.py - This script checks prerequisites and runs progressive tests for the pipeline.

…ation structure for improved clarity and maintainability. - Deleted SYSTEM_EXTENSION_GUIDE.md, UNIFIED_CONFIG.md, agent_retrieval_upgrade_summary.md, config_reorganization_summary.md, integration_testing_setup.md, and sql_removal_summary.md. - Simplified agent graph by removing SQL-related nodes and dependencies. - Consolidated configuration files into a unified structure, enhancing usability and reducing clutter.

spyrchat · 2025-09-08T00:15:20Z

This pull request introduces a modular, production-ready Retrieval-Augmented Generation (RAG) system with a LangGraph agent architecture. It adds a configurable agent graph, modular pipeline components, robust testing and CI/CD setup, and comprehensive documentation for users and contributors. The most important changes are grouped below:

Agent Graph and Modular Pipeline Architecture

Introduced agent/graph.py, which builds a configurable LangGraph agent using modular nodes for query interpretation, retrieval, generation, and memory updating, all configurable via YAML. This enables flexible and extensible RAG workflows.
Added new node implementations: query_interpreter, retriever, generator, and memory_updater, each as a separate module under agent/nodes/, supporting clean separation of concerns and easy extension. [1] [2] [3]

Testing and CI/CD

Added a new GitHub Actions workflow .github/workflows/pipeline-tests.yml to automate minimal, integration, and security/config validation tests, including Qdrant service integration and configuration checks for robustness.
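Several commits in this PR adjust how the workflow waits for the Qdrant service, ending with a check on the HTTP status code. A readiness wait with retries can be sketched in Python as follows; the function names, retry counts, and delay are assumptions, and the workflow itself may well use shell commands instead.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 503.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return None


def wait_until_ready(
    url: str,
    probe: Callable[[str], Optional[int]] = http_status,
    attempts: int = 10,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns HTTP 200 or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if probe(url) == 200:
            return True
        if attempt < attempts:
            sleep(delay)
    return False
```

A CI step could call `wait_until_ready("http://localhost:6333/readyz")` before running the integration tests; the host, port, and endpoint in that call are assumptions about a default local Qdrant setup rather than values taken from this PR. Passing `probe` and `sleep` as parameters lets the retry logic be tested without a running service.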

Deployment and Packaging

Introduced a Dockerfile using a slim Python 3.11 base image, installing dependencies and copying the full source, to support easy containerized deployment of the system.

Documentation and Developer Experience

Replaced the README.md with a detailed overview of the architecture, features, configuration options, usage instructions, extension guidelines, and project structure, making it easier for users and contributors to understand and work with the system.

spyrchat added 30 commits April 26, 2025 17:00

Add SparseEmbedder class and update get_embedder function to support …

d0d0668

…it; create hybrid-retriever.py file; update requirements.txt for new dependencies

Refactor QdrantVectorDB to support dense and sparse vectors; update i…

7c6bf9c

…nit_collection method and enhance as_langchain_vectorstore for hybrid retrieval; improve BaseRetriever documentation and remove hybrid-retriever.py file.

Enhance EmbeddingPipeline to support optional sparse embeddings; refa…

f283739

…ctor run method to return dense and sparse vectors; update test script to integrate new pipeline functionality and add metadata handling for documents.

Refactor QdrantVectorDB and embedding factory to enhance collection i…

6cbb9e4

…nitialization and embedding retrieval; improve logging and document preparation in test_embedding_pipeline.

Refactor init_collection method in QdrantVectorDB to remove sparse_ve…

c02e61a

…ctor_size parameter for improved clarity.

Refactor BaseVectorDB to specify return types for methods and enhance…

14cbf21

… docstrings for clarity.

Merge branch 'development' of https://github.com/spyrchat/Thesis into…

862134b

… hybrid-retriever

Merge pull request #1 from spyrchat/hybrid-retriever

7f75ff9

Hybrid retriever is functional and ready to deploy to development branch

Remove BaseEmbedder inheritance from HuggingFaceEmbedder for improved…

b5d9c2b

… clarity.

Remove BaseEmbedder inheritance from TitanEmbedder for improved clarity.

4b51eda

Merge pull request #2 from spyrchat/hybrid-retriever

9cca60c

Hybrid retriever

Add PostgresController and connection test script for PostgreSQL inte…

df599be

…gration

Implement image and table asset insertion methods in PostgresControll…

9045ddf

…er; add text processing pipeline for PDF documents

Add table extraction and SQL uploading functionality; refactor import…

31a2a7e

…s for clarity

Add PDF processing, table extraction, and text chunking functionality…

bdb9037

…; implement metadata handling and enrich documents for upload

Enhance embedding pipeline with dynamic embedding strategy; add PDF p…

bcc92da

…rocessing test script

Refactor import statements to use relative paths; update sandbox dire…

d400d85

…ctory in test script

Enhance Qdrant document insertion with error handling and logging; up…

40d43c9

…date requirements for PyMuPDF

Add table extraction functionality with logging; implement PDF proces…

d6c07b5

…sing test script

Implement modular RAG pipeline with query interpretation, SQL plannin…

f75f74f

…g, and memory updating; add retriever routing logic and logging

Add Dockerfile, docker-compose.yml, and main application logic; imple…

53d213b

…ment generator and retriever nodes

Refactor QdrantVectorDB: remove unused import and add spacing; update…

985440e

… requirements.txt with dependency version upgrades

Updated requirements.txt

20a99bd

full pipeline is functional

8c08f87

Added logging

b6feff5

Added docstrings for clarity

4187574

Added Docstrings

fa53084

added config.yml

346f0d6

spyrchat added 27 commits August 31, 2025 00:54

chore: Update Python version to 3.13 in pipeline tests

10b6620

chore: Update testing dependencies and Python version in CI workflows

056f007

refactor: Simplify dependency management by removing requirements-tes…

7323f4b

…t.txt and updating pipeline tests to use requirements-minimal.txt

chore: Update requirements-minimal.txt to include missing dependencie…

b39c51d

…s for LangChain and dotenv

chore: Add missing dependencies for boto3, botocore, and langchain-qd…

e1beb8b

…rant in requirements-minimal.txt

refactor: Enhance Qdrant connectivity tests and remove outdated requi…

149ab30

…rements documentation

chore: Remove outdated GitHub Actions CI configuration and local test…

63d29af

… setup script

chore: Remove outdated example scripts and sample data files for retr…

18903d8

…iever configuration

Fix Google dependencies conflict in requirements.txt

3393ec5

fix: Remove unnecessary blank line in insert_documents method

9415e07

fix: Improve .env loading and add default values for Qdrant configura…

2748761

…tion

fix: Update Qdrant service configuration for improved health checks a…

8d6d16f

…nd connectivity

fix: Improve Qdrant health check commands and update logging messages…

8163734

… for clarity

fix: Update Qdrant health check commands to use the correct endpoint …

44baa40

…and improve connectivity checks

fix: Update Qdrant health check commands for improved readiness verif…

c128036

…ication

fix: Enhance Qdrant readiness check with retry logic and timeout hand…

0068770

…ling

fix: Update Qdrant readiness check to use the correct endpoint

b2f8882

fix: Update Qdrant health check endpoints and enhance pipeline test c…

eebb5d6

…onfigurations

refactor: Remove redundant pipeline test configurations and streamlin…

4fe9c6b

…e workflow

fix: Simplify commands and enhance output messages in pipeline tests

c3abc86

changed check in git workflows from ok to HTTP 200

614692e

fix: Clean up whitespace and improve readability in end-to-end pipeli…

fae42db

…ne tests

fix: Update end-to-end test configuration and enhance Qdrant connecti…

2bd6b2e

…vity handling

fix: Enhance Qdrant configuration handling and improve error manageme…

5809813

…nt in end-to-end tests

fix: Remove end-to-end tests from pipeline configuration to streamlin…

3715f8d

…e workflow

spyrchat merged commit 3ef36b0 into main Sep 8, 2025
3 checks passed

spyrchat deleted the benchmark branch September 8, 2025 00:15
