Benchmark #18
…it; create hybrid-retriever.py file; update requirements.txt for new dependencies
…nit_collection method and enhance as_langchain_vectorstore for hybrid retrieval; improve BaseRetriever documentation and remove hybrid-retriever.py file.
…ctor run method to return dense and sparse vectors; update test script to integrate new pipeline functionality and add metadata handling for documents.
… logging for collection creation and document insertion; refactor insert_documents method for improved clarity. Update EmbeddingPipeline to use new splitter method. Modify SparseEmbedder to default to CUDA. Add hybrid_retriever.py file.
…nitialization and embedding retrieval; improve logging and document preparation in test_embedding_pipeline.
…ctor_size parameter for improved clarity.
…ds to retrieve client and collection name. Update EmbeddingPipeline and SparseEmbedder to use Embeddings instead of BaseEmbedder. Add QdrantHybridRetriever for hybrid retrieval functionality and update test scripts accordingly.
… docstrings for clarity.
… hybrid-retriever
Hybrid retriever is functional and ready to deploy to development branch
Hybrid retriever
…er; add text processing pipeline for PDF documents
…; implement metadata handling and enrich documents for upload
…rocessing test script
…ctory in test script
…date requirements for PyMuPDF
…g, and memory updating; add retriever routing logic and logging
…ment generator and retriever nodes
… requirements.txt with dependency version upgrades
…script - Deleted the following test files: - test_full_ingestion.py - test_modular_pipeline.py - run_all_tests.py - test_adapter_fix.py - test_agent_retrieval.py - test_retriever_direct.py - test_streamlined_agent.py - Added a new test file: test_local_setup.py - This script checks prerequisites and runs progressive tests for the pipeline.
…t.txt and updating pipeline tests to use requirements-minimal.txt
…ation structure for improved clarity and maintainability. - Deleted SYSTEM_EXTENSION_GUIDE.md, UNIFIED_CONFIG.md, agent_retrieval_upgrade_summary.md, config_reorganization_summary.md, integration_testing_setup.md, and sql_removal_summary.md. - Simplified agent graph by removing SQL-related nodes and dependencies. - Consolidated configuration files into a unified structure, enhancing usability and reducing clutter.
…s for LangChain and dotenv
…rant in requirements-minimal.txt
…rements documentation
…iever configuration
…and improve connectivity checks
…nt in end-to-end tests
This pull request introduces a modular, production-ready Retrieval-Augmented Generation (RAG) system with a LangGraph agent architecture. It adds a configurable agent graph, modular pipeline components, robust testing and CI/CD setup, and comprehensive documentation for users and contributors. The most important changes are grouped below:
Agent Graph and Modular Pipeline Architecture
Testing and CI/CD
Deployment and Packaging
Documentation and Developer Experience
This pull request introduces a modular, production-ready Retrieval-Augmented Generation (RAG) system built around a configurable LangGraph agent architecture. The changes add a flexible, YAML-configurable retrieval pipeline, robust agent node implementations, and comprehensive documentation. The new system supports advanced retrieval strategies, reranking, filtering, and seamless agent workflows, making it highly extensible and suitable for production use.
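To make the idea of a configuration-driven retrieval pipeline concrete, here is a minimal sketch in plain Python. It is not the code from this pull request: `build_retriever`, the config keys (`strategy`, `alpha`, `min_score`, `top_k`), the toy corpus, and both scoring functions are hypothetical stand-ins. The config is shown as a dict, which is what a parsed YAML file would typically yield, and the "dense" scorer is a character-trigram overlap standing in for real embedding similarity.

```python
from dataclasses import dataclass, field

# Hypothetical toy corpus; a real pipeline would query a vector store instead.
CORPUS = {
    "a": "qdrant stores dense and sparse vectors",
    "b": "langgraph wires agent nodes together",
    "c": "sparse vectors capture exact terms",
}


@dataclass
class Hit:
    doc_id: str
    score: float
    metadata: dict = field(default_factory=dict)


def sparse_score(query: str, text: str) -> float:
    """Fraction of query terms that appear verbatim in the text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms)


def dense_score(query: str, text: str) -> float:
    """Stand-in for embedding similarity: Jaccard overlap of character trigrams."""
    def grams(s: str) -> set:
        return {s[i:i + 3] for i in range(len(s) - 2)}

    q, d = grams(query.lower()), grams(text.lower())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def build_retriever(config: dict):
    """Return a retrieve(query) function configured by a dict (e.g. parsed YAML)."""
    strategy = config["strategy"]
    if strategy not in {"dense", "sparse", "hybrid"}:
        raise ValueError(f"unknown strategy: {strategy!r}")
    alpha = config.get("alpha", 0.5)          # weight of the dense score in hybrid mode
    min_score = config.get("min_score", 0.0)  # filter threshold
    top_k = config.get("top_k", 3)

    def score(query: str, text: str) -> float:
        if strategy == "dense":
            return dense_score(query, text)
        if strategy == "sparse":
            return sparse_score(query, text)
        return alpha * dense_score(query, text) + (1 - alpha) * sparse_score(query, text)

    def retrieve(query: str) -> list:
        hits = [
            Hit(doc_id, score(query, text), {"strategy": strategy})
            for doc_id, text in CORPUS.items()
        ]
        hits = [h for h in hits if h.score >= min_score]     # filtering step
        hits.sort(key=lambda h: h.score, reverse=True)       # ranking step
        return hits[:top_k]

    return retrieve


if __name__ == "__main__":
    retrieve = build_retriever({"strategy": "hybrid", "alpha": 0.5, "top_k": 2})
    for hit in retrieve("sparse vectors"):
        print(hit.doc_id, round(hit.score, 3), hit.metadata)
```

The weighted sum used for the hybrid case is only one possible fusion rule, chosen here for brevity. Swapping the strategy or thresholds requires changing only the config, which is the property the description attributes to the YAML-driven pipeline.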
The most important changes are:
Agent System Architecture & Core Nodes:
- `agent/graph.py`, wiring together modular nodes for query interpretation, retrieval, answer generation, and memory updating, all configurable via YAML.
- `query_interpreter` for intent classification and routing
- `retriever` supporting configurable pipelines and legacy compatibility
- `generator` for answer synthesis using LLMs
- `memory_updater` for chat history management
- `AgentState` schema for agent state management, supporting enhanced retrieval metadata and error handling.

Configurable Retrieval Pipeline:
- `make_configurable_retriever`, which loads pipeline configurations from YAML and supports multiple retrieval strategies (dense, sparse, hybrid), reranking, filtering, and rich metadata output.

Productionization & Deployment:
- `Dockerfile` for containerized, production-ready deployment, including system dependencies and Python requirements installation.

Documentation & Developer Experience:
- `README.md` with a detailed, user- and developer-focused guide covering architecture, configuration, extension, testing, and migration from legacy code.

Agent Architecture & Core Logic
- `agent/graph.py` to define the main LangGraph agent, connecting modular nodes for interpretation, retrieval, generation, and memory, all configurable via YAML.
- `AgentState` schema supporting enhanced retrieval, error handling, and extensibility.

Retrieval Pipeline Flexibility

Deployment & Operations
- `Dockerfile` for streamlined, production-ready containerization of the system.

Documentation & Usability
- `README.md` with comprehensive instructions, architecture diagrams, extension guides, configuration examples, and migration paths from legacy code.
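
The node flow described above (interpret, retrieve, generate, update memory over a shared state) can be sketched in plain Python. This is an illustration under stated assumptions, not the contents of `agent/graph.py`: it does not use the LangGraph API, the node names are borrowed from the description, and the `AgentState` fields, the `KNOWLEDGE` list, the routing rule, and `run_agent` are all hypothetical.

```python
from typing import Optional, TypedDict


class AgentState(TypedDict, total=False):
    # Hypothetical fields; the real schema in the PR may differ.
    question: str
    intent: str
    documents: list
    answer: str
    history: list
    error: Optional[str]


# Hypothetical in-memory knowledge base standing in for a vector store.
KNOWLEDGE = [
    "Qdrant stores dense and sparse vectors.",
    "LangGraph connects nodes into a graph.",
]


def _words(text: str) -> set:
    return {w.strip("?.,!").lower() for w in text.split()}


def query_interpreter(state: AgentState) -> dict:
    # Toy intent rule: anything phrased as a question is routed to retrieval.
    is_question = state["question"].strip().endswith("?")
    return {"intent": "retrieve" if is_question else "chat"}


def retriever(state: AgentState) -> dict:
    # Keep documents that share at least one word with the question.
    q = _words(state["question"])
    return {"documents": [d for d in KNOWLEDGE if q & _words(d)]}


def generator(state: AgentState) -> dict:
    # A real generator would call an LLM; here the answer is templated.
    docs = state.get("documents", [])
    if docs:
        return {"answer": f"Based on {len(docs)} document(s): {docs[0]}"}
    return {"answer": "No supporting documents found."}


def memory_updater(state: AgentState) -> dict:
    # Append the finished turn to the chat history.
    turn = (state["question"], state["answer"])
    return {"history": state.get("history", []) + [turn]}


def run_agent(question: str, history: Optional[list] = None) -> AgentState:
    """Run the nodes in order, merging each node's partial update into the state."""
    state: AgentState = {"question": question, "history": list(history or [])}
    state.update(query_interpreter(state))
    if state["intent"] == "retrieve":      # routing decision
        try:
            state.update(retriever(state))
        except Exception as exc:           # record the failure instead of crashing
            state["error"] = str(exc)
    state.update(generator(state))
    state.update(memory_updater(state))
    return state


if __name__ == "__main__":
    final = run_agent("What does Qdrant store?")
    print(final["intent"], "|", final["answer"])
```

Each node returns only the keys it changes, and the runner merges them into the shared state. A graph framework plays the role of `run_agent` here, adding declarative edges and conditional routing on top of the same pattern.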