Loaders #22

spyrchat · 2025-10-14T12:17:44Z

This pull request introduces significant improvements to the agent pipeline, focusing on modularity, benchmarking, and flexibility. The main changes include the removal of the old agent graph, the addition of two new, more advanced agent graph variants (a refined RAG agent and a self-correcting Self-RAG agent), the introduction of a robust benchmarking logger node, and enhancements to the generator node for configurable prompting. It also updates documentation and environment variable management for better usability.

Agent Pipeline Refactoring and Enhancements:

Removed the legacy agent graph implementation in agent/graph.py in favor of more modular and extensible graph variants.
Added agent/graph_refined.py, which implements a multi-stage, linear RAG agent pipeline with explicit query analysis, retrieval, generation, memory updating, and benchmark logging. This graph supports configurable LLM providers and prompt styles.
Introduced agent/graph_self_rag.py, a Self-RAG agent graph with an iterative refinement loop, enabling answer verification and correction to reduce hallucinations.
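
The iterative refinement loop described for the Self-RAG graph can be pictured roughly as follows. This is a minimal sketch, not the code in agent/graph_self_rag.py: the function name `self_rag_loop`, the `retrieve`/`generate`/`verify` callables, the `max_iterations` cap, and the re-retrieval strategy are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def self_rag_loop(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    verify: Callable[[str, List[str]], bool],
    max_iterations: int = 3,
) -> Tuple[str, int]:
    """Generate an answer, check it against the context, and retry if unsupported.

    Returns the final answer and the number of generation attempts used.
    All callables are hypothetical stand-ins for the graph's nodes.
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")

    context = retrieve(question)
    answer = ""
    for iteration in range(1, max_iterations + 1):
        answer = generate(question, context)
        if verify(answer, context):
            # The draft is judged to be supported by the context: stop early.
            return answer, iteration
        # Assumed refinement step: retrieve again, using the rejected draft
        # as extra query text, before the next attempt.
        context = retrieve(f"{question} {answer}")
    # The iteration budget is exhausted: return the last draft as-is.
    return answer, max_iterations
```

Capping the number of attempts keeps the loop from running indefinitely when the verifier never accepts a draft.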

Benchmarking and Logging:

Added agent/nodes/benchmark_logger.py, providing a node and utility for logging agent pipeline executions, saving outputs for benchmarking, and summarizing results. This is integrated into both new agent graphs.
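
A logging node of this kind could look roughly like the sketch below. It is an illustration only: the state keys (`question`, `answer`, `documents`), the JSON Lines file format, and the names `benchmark_logger_node` and `summarize_log` are assumptions, not the contents of agent/nodes/benchmark_logger.py.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict


def benchmark_logger_node(state: Dict[str, Any], log_path: Path) -> Dict[str, Any]:
    """Append one pipeline execution to a JSON Lines file and pass the state through."""
    record = {
        "timestamp": time.time(),
        "question": state.get("question"),
        "answer": state.get("answer"),
        "num_documents": len(state.get("documents", [])),
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    # The node only observes; downstream nodes receive the state unchanged.
    return state


def summarize_log(log_path: Path) -> Dict[str, float]:
    """Summarize a log file: run count, runs with a non-empty answer, mean documents."""
    lines = log_path.read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    runs = len(records)
    answered = sum(1 for r in records if r.get("answer"))
    avg_docs = sum(r["num_documents"] for r in records) / runs if runs else 0.0
    return {"runs": runs, "answered": answered, "avg_documents": avg_docs}
```

Writing one JSON object per line keeps appends cheap and lets a later benchmarking step read the file incrementally.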

Generator Node Improvements:

Enhanced the generator node in agent/nodes/generator.py to support multiple prompt styles ("strict", "conversational", "citations") selected via configuration, improving answer quality and flexibility. Also improved logging and prompt structure for technical accuracy. [1] [2]
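
Selecting a prompt style from configuration might be sketched as below. Only the three style names come from this PR; the instruction texts, the `prompt_style` config key, the default of "strict", and the `build_prompt` helper are hypothetical and do not reproduce agent/nodes/generator.py.

```python
from typing import Any, Dict, List

# Hypothetical instruction text for each style named in the PR description.
PROMPT_STYLES: Dict[str, str] = {
    "strict": "Answer using only the context below. If it is insufficient, say so.",
    "conversational": "Answer in a friendly tone, grounding your reply in the context below.",
    "citations": "Answer using the context below and cite passages as [1], [2], ...",
}


def build_prompt(question: str, documents: List[str], config: Dict[str, Any]) -> str:
    """Assemble a generation prompt for the style named in the config."""
    style = config.get("prompt_style", "strict")  # assumed key and default
    if style not in PROMPT_STYLES:
        raise ValueError(f"Unknown prompt style: {style!r}")
    # Number the passages so the "citations" style has something to refer to.
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        f"{PROMPT_STYLES[style]}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Rejecting unknown style names up front surfaces configuration typos before any LLM call is made.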

Documentation and Configuration:

Added a new CLI_REFERENCE.md documenting all available CLI commands and flags for scripts, ensuring accurate and up-to-date developer reference.
Updated .env_example to focus on API keys relevant to LLM providers, removing unused or legacy environment variables.

Summary of Most Important Changes:

Agent Pipeline Refactoring:

Removed legacy agent graph (agent/graph.py) and replaced with two new, modular graph variants: agent/graph_refined.py (standard RAG pipeline) and agent/graph_self_rag.py (self-correcting, hallucination-resistant pipeline). [1] [2] [3]

Benchmarking and Logging:

Introduced a benchmark logger node (agent/nodes/benchmark_logger.py) for saving and summarizing pipeline execution data, integrated into both agent graphs.

Generator Node Improvements:

Refactored the generator node to support configurable prompt styles and improved prompt instructions for technical accuracy and flexibility. [1] [2]

Documentation:

Added a comprehensive CLI reference (CLI_REFERENCE.md) for all supported scripts and flags.

Configuration:

Updated .env_example to include only relevant API keys, removing obsolete variables.

- Remove all configuration merging logic for complete reproducibility
- Implement AutoRAG multi-arm bandit hyperparameter optimization
- Create self-contained benchmark scenarios (no dependencies)
- Add comprehensive configuration validation
- Update all benchmark runners to use isolated configs
- Create activation function and gradient descent visualizations
- Add ISSEL color scheme to all plots
- Implement configuration audit system for experiment tracking
- Fix import issues in benchmarks package
- Add dense-only retrieval configuration examples

…figuration handling

- Removed hyperparameter_spaces.yml file as it is no longer needed.
- Simplified the create_from_unified_config method in RetrievalPipelineFactory to enhance retrieval type detection and configuration merging.
- Deleted unused HuggingFaceEmbedder class from embeddings.py.
- Updated get_embedder function in factory.py to improve error handling and support for HuggingFace embeddings.
- Modified ModernBaseRetriever to enforce required configuration parameters without defaults.
- Enhanced QdrantDenseRetriever and QdrantHybridRetriever to require explicit embedding configurations and improved initialization logic.
- Refactored result fusion methods in QdrantHybridRetriever to utilize alpha weighting for better control over dense and sparse result combinations.
- Updated QdrantSparseRetriever to ensure strict configuration requirements and improved initialization.
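
Alpha-weighted fusion of dense and sparse results can be illustrated with the sketch below. It is not the QdrantHybridRetriever code. It assumes that `alpha` weights the dense side and `1 - alpha` the sparse side, and that scores are min-max normalized per retriever before mixing. Both conventions, and the function names, are assumptions.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse scores become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every hit as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def alpha_fusion(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    alpha: float,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Combine results as alpha * dense + (1 - alpha) * sparse (assumed convention)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    d, s = min_max_normalize(dense), min_max_normalize(sparse)
    fused = {
        doc_id: alpha * d.get(doc_id, 0.0) + (1.0 - alpha) * s.get(doc_id, 0.0)
        for doc_id in set(d) | set(s)
    }
    # Sort by descending fused score; break ties by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Under this convention `alpha=1.0` reduces to dense-only ranking and `alpha=0.0` to sparse-only ranking, which makes the two baselines special cases of the hybrid.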

…eatures

- Deleted obsolete benchmark scenario files: bm25_baseline_full.yml, dense_retrieval_full.yml, hybrid_retrieval_full.yml, and quick_test.yml.
- Created new benchmark scenario files for Experiment 1: bm25_baseline.yml, hybrid_bm25_bge_m3.yml, hybrid_splade_bge_m3.yml, and splade_baseline.yml.
- Updated benchmarks/experiment1.py to streamline experiment execution and enhance result processing.
- Introduced BenchmarkReportGenerator for improved report generation and scenario summaries.
- Added BenchmarkResultsExporter to handle result exports in various formats.
- Implemented BenchmarkStatisticalAnalyzer for comprehensive statistical analysis of benchmark results, including pairwise testing and Bonferroni correction.
- Enhanced retrieval time calculation in benchmarks/benchmarks_runner.py.
- Improved code organization and readability across multiple files.
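
The Bonferroni correction mentioned for the pairwise tests multiplies each p-value by the number of comparisons (capped at 1). The sketch below shows that adjustment only; it is not the BenchmarkStatisticalAnalyzer implementation, and the function name, the dictionary layout, and the comparison labels in the usage note are hypothetical.

```python
from typing import Dict


def bonferroni_correct(
    p_values: Dict[str, float], alpha: float = 0.05
) -> Dict[str, Dict[str, object]]:
    """Adjust pairwise p-values for multiple comparisons with Bonferroni.

    Each p-value is multiplied by the number of comparisons m and capped at 1;
    a comparison is flagged significant when the adjusted value is <= alpha.
    """
    m = len(p_values)
    corrected: Dict[str, Dict[str, object]] = {}
    for comparison, p in p_values.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range for {comparison!r}: {p}")
        p_adj = min(1.0, p * m)
        corrected[comparison] = {"p_adjusted": p_adj, "significant": p_adj <= alpha}
    return corrected
```

For example, with three pairwise comparisons a raw p-value of 0.02 is adjusted to 0.06 and no longer passes a 0.05 threshold, which is the conservative behavior the correction is meant to provide.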

- Updated hybrid Splade configurations to use top_k=10 and adjusted alpha values across multiple YAML files.
- Introduced a new configuration file for Hybrid Splade optimal settings.
- Modified the experiment runner to include the new optimal configuration and updated print statements for clarity.
- Added a new script for optimizing the alpha parameter with a fixed k value, supporting both single metric and composite objective modes.
- Enhanced results exporting functionality to dynamically include metrics in summary comparisons.
- Adjusted normalization processes in retriever classes for consistency and performance.

- Add AdapterLoader for dynamic adapter instantiation from config
- Update ingestion pipeline to load adapters from YAML config
- Update benchmark system to support config-based adapter loading
- Make adapter parameters optional in CLI (can read from config)
- Add adapter field to all benchmark YAML configs (12 files)
- Update StackOverflowBenchmarkAdapter signature for compatibility
- Fix relative imports in benchmark modules
- Add comprehensive documentation:
  - DYNAMIC_ADAPTERS.md - Complete guide for ingestion adapters
  - DYNAMIC_ADAPTERS_QUICKREF.md - Quick reference
  - DYNAMIC_ADAPTER_BENCHMARKS.md - Benchmark adapter guide
  - BENCHMARK_DYNAMIC_ADAPTER_FIX.md - Fix documentation
- Add custom adapter example
- Update contracts.py with missing RetrievalMetrics and EvaluationRun classes

Benefits:

- No code changes needed to add new adapters (just YAML config)
- Unified dynamic loading for both ingestion and benchmarks
- Better scalability and maintainability
- Improved configuration as code approach
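
Config-driven adapter loading of this kind is commonly built on `importlib`. The sketch below shows the general idea under stated assumptions: the `adapter` section with `class` and `params` keys and the `load_adapter` function are hypothetical and do not reproduce AdapterLoader or the schema used in the benchmark YAML files.

```python
import importlib
from typing import Any, Dict


def load_adapter(config: Dict[str, Any]) -> Any:
    """Instantiate an adapter from a dotted class path found in the config.

    Assumed layout: {"adapter": {"class": "package.module.ClassName",
                                 "params": {...constructor keyword arguments...}}}
    """
    spec = config["adapter"]
    module_name, _, class_name = spec["class"].rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.ClassName', got {spec['class']!r}")
    # Import the module at run time, then look the class up by name.
    module = importlib.import_module(module_name)
    adapter_cls = getattr(module, class_name)
    return adapter_cls(**spec.get("params", {}))
```

As a quick check with a standard-library class standing in for a real adapter, `load_adapter({"adapter": {"class": "collections.Counter", "params": {"hits": 2}}})` builds a `Counter` with `hits` set to 2. Because the class path lives in configuration, adding an adapter needs no change to the loading code.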

- Updated `hybrid_bge_splade_fixed_k10.yml` to optimize alpha values from "0.0:1.0:0.1" to "0.9:1.0:0.02".
- Deleted `custom_adapter_example.py` as it is no longer needed.
- Added new analysis notebook `experiment1_analysis.ipynb` for experiment 1.
- Introduced multiple output files for experiment 1 plots, including PDFs and PNGs for various metrics.
- Created `key_findings.txt` summarizing key results from experiment 1.
- Added LaTeX table `table1_summary_results.tex` summarizing performance metrics of different methods.
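
If the alpha strings are read as `start:stop:step` with an inclusive end point, which is an assumption about the YAML format rather than something this PR states, they could be expanded into candidate values as in the sketch below. The function name `parse_alpha_range` is hypothetical.

```python
from typing import List


def parse_alpha_range(spec: str) -> List[float]:
    """Expand a 'start:stop:step' string into a list of values, stop included.

    The format interpretation (and the inclusive end point) is an assumption.
    """
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(f"Expected 'start:stop:step', got {spec!r}")
    start, stop, step = (float(p) for p in parts)
    if step <= 0 or stop < start:
        raise ValueError(f"Invalid range: {spec!r}")
    # Count the steps once and multiply, instead of repeatedly adding the step,
    # so floating-point error does not accumulate or drop the end point.
    n_steps = round((stop - start) / step)
    return [round(start + i * step, 10) for i in range(n_steps + 1)]
```

Under this reading the old setting "0.0:1.0:0.1" covers eleven values across the whole interval, while the new "0.9:1.0:0.02" concentrates six values between 0.9 and 1.0.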

- Updated PDF and PNG files for overall performance, precision at k, recall at k, F1 scores, precision-recall tradeoff, NDCG progression, latency analysis, and statistical significance.
- All figures have been regenerated to reflect the latest experimental results.

- Created a new markdown file for stratification justification, detailing the rationale behind the stratification strategy used in experiments.
- Added various analysis result images to the 2D grid results directory, including:
  - Fold analysis
  - Heatmap of composite scores
  - Heatmaps for individual metrics
  - Sensitivity analysis
  - 3D surface plot
  - Test performance
  - Comparison of top configurations

- Created README.md for scripts directory detailing available scripts, usage, and configuration options.
- Added README.md for tests directory outlining test structure, running tests, and dependencies.
- Included descriptions of individual test files and their functions for better understanding of the testing framework.

…peline

- Updated AgentState schema in `schema.py` to enhance clarity and organization of attributes, including query analysis, routing decisions, and generation modes.
- Modified `config.yml` to switch agent mode to "refined" and updated LLM provider to "ollama" with new model specifications.
- Introduced `llm_factory.py` to streamline LLM instance creation based on configuration, supporting multiple providers.
- Adjusted `main.py` to load configurations dynamically and select the appropriate agent graph based on the mode.
- Removed outdated retrieval configuration guide and added a new README for retrieval configurations, detailing available options and usage.
- Created new retrieval configuration files for dense retrieval with and without reranking, while removing obsolete examples.
- Enhanced documentation for retrieval configurations, including performance comparisons and troubleshooting tips.

- Simplified the main.py by removing the configuration loading logic and directly importing the refined graph.
- Updated fast_dense_bge_m3.yml to streamline the retrieval pipeline configuration, removing unnecessary parameters and focusing on essential settings for speed.
- Modified base_retriever.py to allow optional 'top_k' parameter with a default value, enhancing flexibility.
- Changed dense_retriever.py and sparse_retriever.py to retrieve 'text' instead of 'page_content' from payloads, ensuring consistency in document creation.

- Implemented demo script (demo_graph_viz.py) for quick visualization of agent graphs in ASCII, Mermaid, and PNG formats.
- Created visualization utility (visualize_graph.py) to handle different modes (standard, self-rag, both) and output formats.
- Added error handling and output directory management for generated visualizations.
- Removed empty scripts (demo_self_rag.py, visualize_cv_splits.py) as they were not utilized.

- Replaced StratifiedKFold with a simple train/test split using train_test_split for hyperparameter optimization.
- Updated class and method documentation to reflect the new splitting strategy.
- Simplified the create_cv_splits method to create_train_test_split with parameters for test size and minimum samples per stratum.
- Enhanced output statistics to include detailed information about train and test distributions.
- Removed unnecessary complexity related to multiple folds, focusing on a single stratified split.
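
A single stratified split with a test-size fraction and a minimum stratum size can be sketched in plain Python as below. This is a dependency-free stand-in for illustration; the commit itself refers to `train_test_split`, and this sketch is not the `create_train_test_split` method. Sending strata that are too small entirely to the training set is an assumed policy, and the function name and defaults are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_train_test_split(
    items: Sequence[str],
    strata: Sequence[Hashable],
    test_size: float = 0.2,
    min_samples_per_stratum: int = 2,
    seed: int = 42,
) -> Tuple[List[str], List[str]]:
    """Split items so each stratum contributes about test_size of its members to test."""
    if len(items) != len(strata):
        raise ValueError("items and strata must have the same length")
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must lie strictly between 0 and 1")

    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for item, stratum in zip(items, strata):
        groups[stratum].append(item)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: List[str] = []
    test: List[str] = []
    for stratum in sorted(groups, key=str):
        members = list(groups[stratum])
        rng.shuffle(members)
        if len(members) < min_samples_per_stratum:
            # Assumed policy: strata that are too small stay in the training set.
            train.extend(members)
            continue
        n_test = max(1, round(len(members) * test_size))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test
```

Compared with k-fold cross-validation, one split evaluates each configuration once, which shortens hyperparameter searches at the cost of a noisier estimate.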

- Removed old Qdrant configuration from .env_example and added API keys for OpenAI, Google, and Voyage.
- Updated model name in llm_as_judge_eval.py to "gpt-5" and changed input/output paths for evaluation results.
- Added new Jupyter notebook for LLM Judge analysis.
- Included various plot images for analysis results in output directory.
- Created a new analysis report in Markdown format summarizing LLM Judge results.

…deprecated scripts

- Updated various experiment plots in the output directory, including overall performance, precision at k, recall at k, F1 scores, precision-recall tradeoff, NDCG progression, latency analysis, statistical significance, and comprehensive dashboard.
- Modified the retrieval configuration in `fast_dense_bge_m3.yml` to implement a hybrid retrieval approach using BGE-M3 and SPLADE with Reciprocal Rank Fusion.
- Added new visualizations for LLM judge analysis, including boxplots, category distributions, and score distributions.
- Removed outdated scripts for graph visualization and self-RAG demo, streamlining the codebase.
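
Reciprocal Rank Fusion scores each document by summing `1 / (k + rank)` over the ranked lists it appears in. The sketch below illustrates the formula only: the retrieval itself is configured in `fast_dense_bge_m3.yml`, and the function name and the smoothing constant `k = 60` (a commonly used default) are assumptions rather than values taken from this PR.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(
    rankings: List[List[str]], k: int = 60, top_k: int = 10
) -> List[Tuple[str, float]]:
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) to a document's score, with ranks
    starting at 1. Raw retriever scores are ignored; only positions matter.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Because only ranks enter the formula, the dense and sparse score scales do not need to be normalized against each other, unlike alpha-weighted score fusion.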

spyrchat added 30 commits September 23, 2025 04:42

feat: Remove data leakage issue because question and answer both got …

230506b

…embedded in database

feat: Add SOSUM dataset analysis tool with publication-quality plotting

729f22b

feat: Implement StratifiedRAGDatasetSplitter for balanced dataset spl…

1b118d6

…its and add chunking strategies for code-aware processing

feat: Update evaluation metrics and k-values across benchmark scenari…

7446e8a

…os; add per-query results export functionality

feat: Enhance StackOverflowBenchmarkAdapter to support Qdrant integra…

17c07b9

…tion; add chunk ID mapping utility functions

feat: Update benchmark scenarios and dataset configurations for impro…

304b0eb

…ved retrieval performance; adjust top_k values and collection names. Fixed a bug where only the question title would be passed in the benchmark without the question body.

feat: Enhance retrieval configurations and metrics handling; update a…

a93b024

…lpha parameter for fusion methods, refactor confidence interval calculations, and improve CSV export functionality.

feat: Update fusion method configurations and enhance embedding provi…

a67c9ab

…der support; adjust alpha parameter, refactor fusion methods, and improve dataset configurations.

feat: Refactor sparse embedder classes; replace SparseEmbedder with B…

103b603

…M25Embedder and SpladeEmbedder, enhancing model initialization and embedding methods.

feat: Refactor benchmark adapter and enhance embedding classes; imple…

c2bab73

…ment abstract methods in BenchmarkAdapter, update SpladeEmbedder to use FastEmbed, and modify dataset configuration for improved model integration.

To be fixed: in hybrid retrieval, the sparse score is 0

1701159

Fixed an issue where the hybrid retriever would not return sparse re…

ecf1132

…sults. The problem was in the yml files as they had wrong collection for sparse search

feat: Update YAML configuration for Splade embedding provider; change…

f26f034

… provider name to "sparse-splade" for clarity. Refactor experiment name in Experiment1Runner and enhance error handling in retrievers.

feat: Add retrieval time statistics and summary printing to benchmark…

19d2a57

… metrics; enhance confidence interval calculation for single value cases.

feat: Enhance benchmark configuration validation and update dataset l…

57da761

…oading method; add usage guide for dataset adapters.

feat: Add hybrid Splade + BGE-M3 benchmark configuration; implement g…

dc785b6

…rid search for hyperparameter tuning with F1@5 objective.

refactor: Clean up import statements and improve error handling in ut…

267a3bc

…ility functions

Remove deprecated dataset adapters and configurations for energy pape…

a2ddfd5

…rs and natural questions; clean up unused Stack Overflow dataset configurations; refactor contracts for improved clarity and maintainability.

Update experiment plots with new performance metrics and visualizations

93137a6

Refactor config.yml to streamline agent retrieval settings and update…

f3c2549

… main.py to ensure environment variables are loaded correctly

Revise system requirements and remove support details

cde848c

Updated minimum RAM requirement and removed community support section.

spyrchat added 21 commits October 8, 2025 13:27

Enhance fast dense retrieval configuration for BGE-M3 with detailed e…

19378f4

…mbedding and performance settings

Merge branch 'loaders' of github.com:spyrchat/Thesis into loaders

285ee4a

Enhance generator functionality with configurable prompt styles and a…

a3f260f

…dd ground truth generation script for SOSUM dataset

Add LLM-as-a-Judge evaluation script and implement LLMJudge class for…

bb353ca

… multi-provider support. Also fixed a bug where the sparse retriever would use the old qdrant API

Refactor code structure for improved readability and maintainability

5f64a0e

Refactor code structure for improved readability and maintainability

0bd3a56

fix: Mark subproject as dirty to indicate uncommitted changes

572ebbe

feat: Add integration tests for Self-RAG graph with detailed logging

5475359

Remove retrieval demo instructions from README

f4169c9

Removed instructions for running the retrieval demo.

Update readme.md

c61b6ab

Revise README to remove troubleshooting and integration details

49314b5

Removed troubleshooting section and integration examples from README.

Update README to remove obsolete benchmark scripts

7cee06e

Removed references to experiment3.py and benchmark run files.

Merge branch 'loaders' of github.com:spyrchat/Thesis into loaders

9fb1e22

Enhance CLI with config file support for ingest, status, and cleanup …

b2c647f

…commands

Remove test coverage section and related notes from README

6466bbb

spyrchat closed this Oct 14, 2025
