diff --git a/README.md b/README.md
index babb84d..9cc6171 100644
--- a/README.md
+++ b/README.md
@@ -1,24 +1,25 @@
-# Advanced RAG Retrieval System with LangGraph Agent
+# Advanced RAG System with LangGraph Agent & Benchmarking

-A production-ready, modular RAG (Retrieval-Augmented Generation) system with configurable pipelines and LangGraph agent integration.
+A production-ready, configurable RAG (Retrieval-Augmented Generation) system featuring LangGraph agent workflows, modular retrieval pipelines, and comprehensive benchmarking capabilities.

 ## Key Features

-- **YAML-Configurable Pipelines**: Switch retrieval strategies without code changes
-- **LangGraph Agent Integration**: Seamless agent workflows with rich metadata
-- **Modular Components**: Easily extensible rerankers, filters, and retrievers
-- **Multiple Retrieval Methods**: Dense, sparse, and hybrid retrieval
-- **Production Ready**: Robust error handling, logging, and monitoring
-- **A/B Testing Support**: Compare configurations easily
-- **Rich Metadata**: Access scores, methods, and quality metrics
+- **🤖 LangGraph Agent**: Intelligent agent workflows with configurable retrieval
+- **⚙️ YAML-Configurable Pipelines**: Switch retrieval strategies without code changes
+- **🔄 Hybrid Retrieval**: Dense, sparse, and hybrid retrieval methods with RRF fusion
+- **🎯 Advanced Reranking**: CrossEncoder, BGE, and multi-stage reranking
+- **📊 Comprehensive Benchmarking**: Built-in evaluation framework with multiple metrics
+- **🗄️ Vector Database**: Qdrant integration with optimized indexing
+- **🔧 Modular Architecture**: Easily extensible components and filters
+- **📈 Performance Monitoring**: Rich metadata, logging, and health checks

 ## Architecture Overview

 ```
-┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
-│    LangGraph    │────│  Configurable    │────│    Retrieval    │
-│      Agent      │    │ Retriever Agent  │    │    Pipeline     │
-└─────────────────┘    └──────────────────┘    └─────────────────┘
+┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐    ┌─────────────────┐
+│    LangGraph    │────│  Configurable    │────│    Retrieval    │────│  Benchmarking   │
+│      Agent      │    │ Retriever Agent  │    │    Pipeline     │    │    Framework    │
+└─────────────────┘    └──────────────────┘    └─────────────────┘    └─────────────────┘
          │
 ┌────────────────────────────────┼────────────────────────────────┐
 │                                │                                │
@@ -28,173 +29,348 @@ A production-ready, modular RAG (Retrieval-Augmented Generation) system with con
 │• Dense    │  │• CrossEncoder  │  │• Score    │
 │• Sparse   │  │• BGE Reranker  │  │• Content  │
 │• Hybrid   │  │• Multi-stage   │  │• Custom   │
+│• RRF      │  │• Adaptive      │  │• Metadata │
 └───────────┘  └────────────────┘  └───────────┘
 ```

 ## Quick Start

-### 1. Install Dependencies
+### 1. Environment Setup

 ```bash
+# Install dependencies
 pip install -r requirements.txt
+
+# Configure environment variables
+cp .env.example .env
+# Edit .env with your API keys:
+# GOOGLE_API_KEY=your_google_api_key
+# OPENAI_API_KEY=your_openai_api_key
+# QDRANT_HOST=localhost
+# QDRANT_PORT=6333
 ```

-### 2. 
Configure Environment +### 2. Start Vector Database ```bash -# Copy example config -cp config.yml.example config.yml +# Using Docker Compose (recommended) +docker-compose up -d qdrant -# Set up your API keys and database connections in config.yml +# Or run Qdrant directly +docker run -p 6333:6333 qdrant/qdrant ``` -### 3. Start Using the System +### 3. Interactive Chat with Agent -```python -# main.py - Chat with your agent -from agent.graph import graph +```bash +# Start the interactive chat agent +python main.py +``` -state = {"question": "How to handle Python exceptions?"} -result = graph.invoke(state) -print(result["answer"]) +Example conversation: +``` +You: How to handle exceptions in Python? +Agent: Python provides several mechanisms for exception handling... ``` -### 4. Switch Retrieval Configurations +### 4. Configuration Management ```bash -# List available configurations +# List available retrieval configurations python bin/switch_agent_config.py --list -# Switch to advanced reranked pipeline -python bin/switch_agent_config.py advanced_reranked +# Switch to different retrieval strategy +python bin/switch_agent_config.py modern_hybrid -# Test the configuration -python test_agent_retriever_node.py +# Test the new configuration +python -c " +from agent.graph import graph +result = graph.invoke({'question': 'test query'}) +print(result['answer']) +" ``` ## Available Configurations -| Configuration | Description | Components | Use Case | -|---------------|-------------|------------|----------| -| `basic_dense` | Simple dense retrieval | Dense retriever only | Development, testing | -| `advanced_reranked` | Production quality | Dense + CrossEncoder + filters | Production RAG | -| `hybrid_multistage` | Best performance | Hybrid + multi-stage reranking | High-quality results | -| `experimental` | Latest features | BGE reranker + custom filters | Experimentation | +| Configuration | Description | Retrieval Method | Performance | Use Case | 
+|---------------|-------------|------------------|-------------|----------| +| `ci_google_gemini` | CI/CD optimized | Dense only | Fast | Testing, CI | +| `fast_hybrid` | Speed optimized | Hybrid + RRF | Very Fast | Production chat | +| `modern_dense` | Dense semantic | Dense + Reranking | Medium | Semantic search | +| `modern_hybrid` | Best quality | Hybrid + CrossEncoder | Slower | Research, Q&A | + +### Benchmark Scenarios -## ๐Ÿ”ง **Configuration Example** +| Scenario | Focus | Components | Metrics | +|----------|-------|------------|---------| +| `dense_baseline` | Simple dense retrieval | Google embeddings | Precision@K, Recall@K | +| `hybrid_retrieval` | Dense + sparse fusion | RRF fusion | MRR, NDCG | +| `hybrid_reranking` | Full reranking pipeline | CrossEncoder + filters | F1, MAP | +| `sparse_bm25` | Traditional IR | BM25 only | Baseline metrics | +## Configuration Examples + +### Fast Hybrid Configuration ```yaml -# pipelines/configs/retrieval/advanced_reranked.yml +# pipelines/configs/retrieval/fast_hybrid.yml +description: "Fast hybrid retrieval optimized for agent response speed" + retrieval_pipeline: retriever: - type: dense + type: hybrid top_k: 10 + score_threshold: 0.05 + fusion_method: rrf + + fusion: + method: rrf + rrf_k: 50 + dense_weight: 0.8 + sparse_weight: 0.2 + + embedding: + strategy: hybrid + dense: + provider: google + model: models/embedding-001 + dimensions: 768 + api_key_env: GOOGLE_API_KEY +``` + +### Modern Dense with Reranking +```yaml +# pipelines/configs/retrieval/modern_dense.yml +description: "Dense semantic retrieval with Google embeddings and neural reranking" + +retrieval_pipeline: + retriever: + type: dense + top_k: 15 + score_threshold: 0.0 stages: - type: reranker config: model_type: cross_encoder - model_name: "ms-marco-MiniLM-L-6-v2" + model_name: "cross-encoder/ms-marco-MiniLM-L-6-v2" + top_k: 10 - type: filter config: type: score - min_score: 0.5 - - - type: answer_enhancer - config: - boost_factor: 2.0 + 
min_score: 0.3
```

## Project Structure

```
Thesis/
-├── agent/                      # LangGraph agent implementation
-│   ├── graph.py                # Main agent graph
-│   ├── schema.py               # Agent state schemas
-│   └── nodes/                  # Agent nodes (retriever, generator, etc.)
+├── main.py                     # Interactive chat application
+├── config.yml                  # Main configuration file
+├── docker-compose.yml          # Docker services (Qdrant, PostgreSQL)
+├── requirements.txt            # Python dependencies
+├── .env                        # Environment variables
+│
+├── agent/                      # LangGraph Agent System
+│   ├── graph.py                # Main agent workflow graph
+│   ├── schema.py               # Agent state definitions
+│   └── nodes/                  # Agent workflow nodes
+│       ├── retriever.py        # Configurable retriever node
+│       ├── generator.py        # Response generation node
+│       ├── query_interpreter.py # Query analysis node
+│       └── memory_updater.py   # Conversation memory node
+│
+├── components/                 # Modular Retrieval Components
+│   ├── retrieval_pipeline.py   # Core pipeline framework
+│   ├── rerankers.py            # CrossEncoder, BGE rerankers
+│   ├── advanced_rerankers.py   # Multi-stage reranking
+│   └── filters.py              # Score, content, metadata filters
 │
-├── components/                 # Modular retrieval components
-│   ├── retrieval_pipeline.py   # Main pipeline orchestrator
-│   ├── rerankers.py            # Reranking implementations
-│   ├── filters.py              # Filtering implementations
-│   └── advanced_rerankers.py   # Advanced reranking strategies
+├── pipelines/                  # Data Pipelines & Configurations
+│   ├── configs/
+│   │   └── retrieval/          # YAML retrieval configurations
+│   │       ├── fast_hybrid.yml
+│   │       ├── modern_dense.yml
+│   │       ├── modern_hybrid.yml
+│   │       └── ci_google_gemini.yml
+│   ├── adapters/               # Dataset adapters (BEIR, custom)
+│   └── ingest/                 # Data ingestion pipeline
 │
-├── pipelines/                  # Data processing and configuration
-│   ├── configs/retrieval/      # Retrieval pipeline configurations
-│   ├── adapters/               # Dataset adapters (BEIR, etc.)
-│   └── ingest/                 # Data ingestion pipeline
+├── benchmarks/                 # Evaluation Framework
+│   ├── benchmarks_runner.py    # Main benchmark orchestrator
+│   ├── benchmarks_metrics.py   # Precision, Recall, NDCG, MRR
+│   ├── benchmarks_adapters.py  # Dataset adapters for evaluation
+│   └── run_real_benchmark.py   # Real data benchmarking
 │
-├── bin/                        # Command-line utilities
-│   ├── switch_agent_config.py  # Configuration management
-│   ├── agent_retriever.py      # Configurable retriever agent
-│   └── retrieval_pipeline.py   # Direct pipeline usage
+├── benchmark_scenarios/        # Predefined Benchmark Configurations
+│   ├── dense_baseline.yml      # Simple dense retrieval
+│   ├── hybrid_retrieval.yml    # Hybrid dense+sparse
+│   ├── hybrid_reranking.yml    # Full reranking pipeline
+│   └── sparse_bm25.yml         # BM25 baseline
 │
-├── docs/                       # Documentation
-│   ├── SYSTEM_EXTENSION_GUIDE.md # Complete extension guide
-│   ├── AGENT_INTEGRATION.md    # Agent integration details
-│   ├── CODE_CLEANUP_SUMMARY.md # Code cleanup documentation
-│   └── EXTENSIBILITY.md        # Quick extensibility overview
+├── bin/                        # Command-line Utilities
+│   ├── switch_agent_config.py  # Configuration management
+│   ├── agent_retriever.py      # Standalone retriever CLI
+│   ├── qdrant_inspector.py     # Database inspection tool
+│   └── ingest.py               # Data ingestion utility
 │
-├── tests/                      # Test suite
-│   ├── retrieval/              # Retrieval pipeline tests
-│   └── agent/                  # Agent integration tests
+├── database/                   # Database Controllers
+│   ├── qdrant_controller.py    # Vector database operations
+│   └── postgres_controller.py  # Relational database operations
 │
-├── deprecated/                 # Legacy code (organized)
-│   ├── old_processors/         # Superseded by new pipeline
-│   ├── old_debug_scripts/      # Legacy debugging tools
-│   └── old_playground/         # Legacy test scripts
+├── embedding/                  # Embedding & Text Processing
+│   ├── factory.py              # Embedding provider factory
+│   ├── bedrock_embeddings.py   # AWS Bedrock embeddings
+│   ├── sparse_embedder.py      # BM25 sparse embeddings
+│   └── processor.py            # Text processing utilities
 │
-├── database/                   # Database controllers
-├── embedding/                  # Embedding utilities
-├── retrievers/                 # Base retrievers
-├── examples/                   # Usage examples
-└── config/                     # Configuration utilities
+├── tests/                      # Comprehensive Test Suite
+│   ├── pipeline/               # Pipeline component tests
+│   │   ├── test_minimal_pipeline.py
+│   │   ├── test_qdrant_connectivity.py
+│   │   └── test_end_to_end.py
+│   └── requirements-minimal.txt
+│
+├── docs/                       # Documentation
+│   ├── PROJECT_STRUCTURE.md    # Detailed project structure
+│   ├── QUICK_START_GUIDE.md    # Getting started guide
+│   └── SOSUM_INGESTION.md      # Dataset ingestion guide
+│
+└── logs/                       # Application Logs
+    ├── agent.log               # Agent workflow logs
+    └── query_interpreter.log   # Query processing logs
+```
+
+## Benchmarking & Evaluation
+
+### Running Benchmarks
+
+```bash
+# Run comprehensive benchmark with real StackOverflow data
+python benchmarks/run_real_benchmark.py
+
+# Run specific benchmark scenario
+python benchmarks/run_benchmark_optimization.py --scenario hybrid_reranking
+
+# Quick performance test
+python benchmarks/run_benchmark_optimization.py --scenario quick_test
+```
+
+### Available Metrics
+
+- **Precision@K**: Fraction of relevant documents in top-K results
+- **Recall@K**: Fraction of relevant documents retrieved in top-K
+- **MRR (Mean Reciprocal Rank)**: Average reciprocal rank of first relevant result
+- **NDCG@K**: Normalized Discounted Cumulative Gain
+- **F1 Score**: 
Harmonic mean of precision and recall +- **MAP (Mean Average Precision)**: Mean of precision scores at each relevant document + +### Custom Benchmark Configuration + +```yaml +# benchmark_scenarios/custom_scenario.yml +scenario_name: "custom_hybrid" +description: "Custom hybrid retrieval evaluation" + +benchmark: + retrieval: + strategy: hybrid + top_k: 20 + score_threshold: 0.0 + evaluation: + k_values: [1, 5, 10, 20] + metrics: ["precision", "recall", "mrr", "ndcg"] + +retrieval_pipeline: + retriever: + type: hybrid + fusion_method: rrf + # ... configuration details ``` ## Testing +### Run Test Suite + ```bash -# Test agent integration -python test_agent_retriever_node.py +# Run minimal pipeline tests (CI-friendly) +python -m pytest tests/pipeline/test_minimal_pipeline.py -v + +# Run all pipeline tests +python -m pytest tests/pipeline/ -v -# Run all tests -python tests/run_all_tests.py +# Run tests with coverage +python -m pytest tests/pipeline/ --cov=components --cov=agent -# Test specific components -python -m pytest tests/retrieval/ -v +# Run specific test categories +python -m pytest tests/pipeline/ -m "not requires_api" # No API required +python -m pytest tests/pipeline/ -m "requires_api" # Requires API keys +``` + +### Test Qdrant Connectivity + +```bash +# Test vector database connection +python -m pytest tests/pipeline/test_qdrant_connectivity.py -v + +# Inspect Qdrant collections +python bin/qdrant_inspector.py --list-collections +python bin/qdrant_inspector.py --collection-info sosum_stackoverflow_hybrid_v1 +``` + +### Integration Testing + +```bash +# Test agent with different configurations +python -c " +from agent.graph import graph +configs = ['fast_hybrid', 'modern_dense', 'modern_hybrid'] +for config in configs: + print(f'Testing {config}...') + # Switch config and test +" ``` ## Documentation -- **[System Extension Guide](docs/SYSTEM_EXTENSION_GUIDE.md)** - Complete guide to extending the system -- **[Agent 
Integration](docs/AGENT_INTEGRATION.md)** - How the agent uses configurable pipelines -- **[Code Cleanup Summary](docs/CODE_CLEANUP_SUMMARY.md)** - Professional code standards and cleanup details -- **[Extensibility Overview](docs/EXTENSIBILITY.md)** - Quick overview of extension capabilities -- **[Architecture](docs/MLOPS_PIPELINE_ARCHITECTURE.md)** - System architecture details +- **[Project Structure](docs/PROJECT_STRUCTURE.md)** - Detailed project organization +- **[Quick Start Guide](docs/QUICK_START_GUIDE.md)** - Getting started tutorial +- **[SOSUM Ingestion](docs/SOSUM_INGESTION.md)** - Dataset ingestion guide +- **[MLOps Architecture](docs/MLOPS_PIPELINE_ARCHITECTURE.md)** - System architecture details ## Extending the System ### Add a Custom Reranker ```python -# components/my_reranker.py -from .rerankers import BaseReranker - -class MyCustomReranker(BaseReranker): - def rerank(self, query: str, documents: List[Document]) -> List[Document]: +# components/my_custom_reranker.py +from components.retrieval_pipeline import Reranker, RetrievalResult +from typing import List + +class MyCustomReranker(Reranker): + @property + def component_name(self) -> str: + return "my_custom_reranker" + + def process(self, query: str, results: List[RetrievalResult], **kwargs) -> List[RetrievalResult]: # Your custom reranking logic - for doc in documents: - doc.metadata["score"] = self.calculate_score(query, doc.page_content) + for result in results: + result.score = self.calculate_custom_score(query, result.document.page_content) + result.metadata["reranked_by"] = self.component_name - return sorted(documents, key=lambda x: x.metadata["score"], reverse=True) + return sorted(results, key=lambda x: x.score, reverse=True) + + def calculate_custom_score(self, query: str, content: str) -> float: + # Implement your scoring logic + return 0.5 # Placeholder ``` ### Create a New Configuration ```yaml # pipelines/configs/retrieval/my_config.yml +description: "My custom retrieval 
configuration" + retrieval_pipeline: retriever: type: hybrid @@ -205,62 +381,212 @@ retrieval_pipeline: config: model_type: my_custom custom_param: "value" + + - type: filter + config: + type: score + min_score: 0.4 ``` ### Switch and Test ```bash +# Switch to your configuration python bin/switch_agent_config.py my_config -python test_agent_retriever_node.py + +# Test the new configuration +python -c " +from agent.graph import graph +result = graph.invoke({'question': 'test query', 'chat_history': []}) +print(f'Answer: {result[\"answer\"]}') +print(f'Retrieved docs: {len(result.get(\"retrieved_documents\", []))}') +" ``` -## Production Usage +## Production Features + +### Performance Optimization +- **Lazy Initialization**: Components load only when needed +- **Connection Pooling**: Efficient database connection management +- **Batch Processing**: Optimized embedding and reranking batches +- **Caching**: LRU caching for repeated queries and embeddings +- **Async Operations**: Non-blocking I/O for better throughput + +### Monitoring & Observability +- **Structured Logging**: JSON logs with correlation IDs +- **Performance Metrics**: Response times, cache hit rates, error rates +- **Health Checks**: Database connectivity, model availability +- **Rich Metadata**: Retrieval paths, scores, and method tracking + +### Configuration Management +- **Environment-based Configs**: Different configs per environment +- **Hot Reloading**: Switch configurations without restart +- **Validation**: Schema validation for all configurations +- **Rollback**: Easy rollback to previous configurations + +### Error Handling +- **Graceful Degradation**: Fallback to simpler methods on failures +- **Circuit Breakers**: Prevent cascade failures +- **Retry Logic**: Exponential backoff for transient failures +- **Comprehensive Logging**: Detailed error context and stack traces -The system is designed for production use with: +## Use Cases & Applications -- **Robust Error Handling**: Graceful 
degradation when components fail
-- **Comprehensive Logging**: Monitor retrieval performance and quality
-- **Configuration Management**: Easy deployment of different strategies
-- **Performance Optimization**: Efficient batching and caching support
-- **Monitoring Ready**: Built-in metrics and health checks
+### Document Q&A Systems
+- **Knowledge Base Search**: Corporate wikis, documentation, FAQs
+- **Research Assistance**: Academic papers, technical documentation
+- **Customer Support**: Automated response generation with context
-## Use Cases
+### Code & Technical Search
+- **Semantic Code Search**: Find code snippets by functionality
+- **API Documentation**: Contextual API usage examples
+- **Stack Overflow Integration**: Programming Q&A with real data
+
+### Domain-Specific Applications
+- **Legal Research**: Case law, regulations, legal precedents
+- **Medical Literature**: Research papers, clinical guidelines
+- **Financial Analysis**: Reports, earnings calls, market research
+
+### Multi-Modal Retrieval
+- **Table Extraction**: Structured data from documents
+- **Image-Text Retrieval**: Combined visual and textual search
+- **Temporal Queries**: Time-aware information retrieval
-- **Document Q&A Systems**: High-quality retrieval for knowledge bases
-- **Research Assistants**: Multi-modal retrieval for academic content
-- **Customer Support**: Context-aware response generation
-- **Code Search**: Semantic search over codebases
-- **Legal Research**: Precise retrieval from legal documents

## Contributing

-1. Fork the repository
-2. Create a feature branch
-3. Add your extension following the patterns in `docs/SYSTEM_EXTENSION_GUIDE.md`
-4. Add tests for your components
-5. Submit a pull request
+1. **Fork and Clone the Repository**
+   ```bash
+   # Fork on GitHub first, then clone your fork
+   git clone https://github.com/your-org/thesis-rag-system
+   cd thesis-rag-system
+   ```
+
+2. 
**Set Up Development Environment**
+   ```bash
+   python -m venv venv
+   source venv/bin/activate  # Linux/Mac
+   pip install -r requirements.txt
+   pip install -r tests/requirements-minimal.txt
+   ```
+
+3. **Create Feature Branch**
+   ```bash
+   git checkout -b feature/my-new-component
+   ```
+
+4. **Follow Extension Patterns**
+   - Add new components to `components/`
+   - Create configuration files in `pipelines/configs/retrieval/`
+   - Add tests to `tests/pipeline/`
+   - Update documentation
+
+5. **Run Tests**
+   ```bash
+   python -m pytest tests/pipeline/ -v
+   python -m pytest tests/pipeline/test_minimal_pipeline.py
+   ```
+
+6. **Submit Pull Request**
+   - Ensure all tests pass
+   - Include benchmark results if applicable
+   - Update documentation for new features
+
+### Development Guidelines
+
+- **Code Style**: Follow PEP 8, use type hints
+- **Testing**: Write tests for new components
+- **Documentation**: Update README and docstrings
+- **Configuration**: Provide example YAML configs
+- **Backwards Compatibility**: Maintain API compatibility
+
+## Docker Deployment
+
+### Using Docker Compose
+
+```bash
+# Start all services
+docker-compose up -d
+
+# Rebuild and restart the app service after code changes
+docker-compose up -d --build app
+
+# View logs for the app service
+docker-compose logs -f app
-## Performance
+
+# Stop services
+docker-compose down
+```
+
+### Custom Docker Build
-The system supports various performance optimization strategies:
+```bash
+# Build the image
+docker build -t my-rag-system . 
+
+# Run with environment variables
+docker run -d \
+  -e GOOGLE_API_KEY=your_key \
+  -e QDRANT_HOST=qdrant \
+  -p 8000:8000 \
+  my-rag-system
+```
-- **Caching**: LRU caching for repeated queries
-- **Batching**: Efficient batch processing for rerankers
-- **Adaptive Top-K**: Dynamic result count based on query complexity
-- **Multi-threading**: Parallel processing for pipeline stages
+## Troubleshooting
-## Migration from Legacy
+### Common Issues
-If you have existing code using the deprecated `processors/` system:
+**Qdrant Connection Failed**
+```bash
+# Check Qdrant status
+docker ps | grep qdrant
+curl http://localhost:6333/healthz
-1. Check `deprecated/old_processors/` for reference
-2. Use the new pipeline configurations in `pipelines/configs/retrieval/`
-3. Follow the migration patterns in `docs/AGENT_INTEGRATION.md`
+
+# Restart Qdrant
+docker-compose restart qdrant
+```
+
+**API Key Issues**
+```bash
+# Check environment variables
+echo $GOOGLE_API_KEY
+echo $OPENAI_API_KEY
+
+# Test API connectivity
+python -c "
+from embedding.factory import EmbeddingFactory
+factory = EmbeddingFactory()
+embedder = factory.create_embedder('google')
+print('Google API working!')
+"
+```
+
+**Configuration Not Found**
+```bash
+# List available configurations
+python bin/switch_agent_config.py --list
+
+# Validate configuration
+python -c "
+import yaml
+with open('pipelines/configs/retrieval/modern_hybrid.yml') as f:
+    config = yaml.safe_load(f)
+    print('Configuration valid!')
+"
+```

## License

-This project is licensed under the MIT License - see the LICENSE file for details.
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. 
+ +## Acknowledgments + +- **LangGraph**: Agent workflow orchestration +- **Qdrant**: High-performance vector database +- **Sentence Transformers**: Embedding models and rerankers +- **Google AI**: Embedding API services +- **BEIR**: Benchmark datasets for information retrieval --- -**Ready to build amazing RAG systems?** Start with the [System Extension Guide](docs/SYSTEM_EXTENSION_GUIDE.md)! +**Ready to build production RAG systems?** Start with our [Quick Start Guide](docs/QUICK_START_GUIDE.md)! diff --git a/docker-compose.yml b/docker-compose.yml index ab4c6dd..bad2a02 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -8,17 +8,6 @@ services: volumes: - qdrant_data:/qdrant/storage - postgres: - image: postgres:14 - environment: - POSTGRES_USER: admin - POSTGRES_PASSWORD: admin - POSTGRES_DB: tableDB - ports: - - "5432:5432" - volumes: - - postgres_data:/var/lib/postgresql/data - app: build: context: . @@ -26,16 +15,10 @@ services: environment: - QDRANT_HOST=qdrant - QDRANT_PORT=6333 - - POSTGRES_HOST=postgres - - POSTGRES_PORT=5432 - - POSTGRES_USER=admin - - POSTGRES_PASSWORD=admin - - POSTGRES_DB=tableDB volumes: - .:/app depends_on: - qdrant - - postgres ports: - "8000:8000" # optional, if you add an API later working_dir: /app @@ -43,4 +26,4 @@ services: volumes: qdrant_data: - postgres_data: + diff --git a/docs/PROJECT_STRUCTURE.md b/docs/PROJECT_STRUCTURE.md index ce14846..87333a6 100644 --- a/docs/PROJECT_STRUCTURE.md +++ b/docs/PROJECT_STRUCTURE.md @@ -20,7 +20,10 @@ agent/ โ”œโ”€โ”€ graph.py # LangGraph agent workflow โ”œโ”€โ”€ schema.py # Agent state schema โ””โ”€โ”€ nodes/ - โ””โ”€โ”€ retriever.py # Configurable retriever node + โ”œโ”€โ”€ retriever.py # Configurable retriever node + โ”œโ”€โ”€ generator.py # Response generation node + โ”œโ”€โ”€ query_interpreter.py # Query analysis node + โ””โ”€โ”€ memory_updater.py # Conversation memory node ``` ### Components (Modular Pipeline System) @@ -54,7 +57,7 @@ embedding/ โ”œโ”€โ”€ __init__.py 
โ”œโ”€โ”€ factory.py # Embedding factory โ”œโ”€โ”€ bedrock_embeddings.py # AWS Bedrock embeddings -โ”œโ”€โ”€ hf_embedder.py # HuggingFace embeddings +โ”œโ”€โ”€ embeddings.py # Core embedding utilities โ”œโ”€โ”€ processor.py # Embedding processing โ”œโ”€โ”€ recursive_splitter.py # Document splitting โ”œโ”€โ”€ sparse_embedder.py # Sparse embeddings @@ -65,28 +68,81 @@ embedding/ ### Pipeline Configurations ``` pipelines/ +โ”œโ”€โ”€ README.md # Pipeline documentation +โ”œโ”€โ”€ __init__.py +โ”œโ”€โ”€ contracts.py # Core pipeline contracts โ”œโ”€โ”€ configs/ โ”‚ โ””โ”€โ”€ retrieval/ # YAML retrieval configurations -โ”‚ โ”œโ”€โ”€ stackoverflow_minilm.yml -โ”‚ โ”œโ”€โ”€ hybrid_basic.yml -โ”‚ โ””โ”€โ”€ advanced_ensemble.yml +โ”‚ โ”œโ”€โ”€ ci_google_gemini.yml +โ”‚ โ”œโ”€โ”€ fast_hybrid.yml +โ”‚ โ”œโ”€โ”€ modern_dense.yml +โ”‚ โ””โ”€โ”€ modern_hybrid.yml โ”œโ”€โ”€ adapters/ # Data adapters +โ”œโ”€โ”€ eval/ # Evaluation components โ””โ”€โ”€ ingest/ # Ingestion pipelines ``` ### CLI Tools ``` bin/ +โ”œโ”€โ”€ __init__.py โ”œโ”€โ”€ agent_retriever.py # CLI agent retriever -โ”œโ”€โ”€ switch_agent_config.py # Configuration switching utility -โ””โ”€โ”€ qdrant_inspector.py # Qdrant inspection tool +โ”œโ”€โ”€ ingest.py # Data ingestion utility +โ”œโ”€โ”€ qdrant_inspector.py # Qdrant inspection tool +โ”œโ”€โ”€ retrieval_pipeline.py # Direct pipeline usage +โ””โ”€โ”€ switch_agent_config.py # Configuration switching utility ``` ### Examples ``` -examples/ -โ”œโ”€โ”€ simple_qa_agent.py # Simple Q&A agent example -โ””โ”€โ”€ (other examples...) 
+# Note: Examples directory not present in current structure +# Usage examples are provided in documentation and test files +``` + +### Benchmarking System +``` +benchmarks/ +โ”œโ”€โ”€ __init__.py +โ”œโ”€โ”€ benchmark_contracts.py # Benchmark interfaces +โ”œโ”€โ”€ benchmark_optimizer.py # Configuration optimization +โ”œโ”€โ”€ benchmarks_adapters.py # Dataset adapters for evaluation +โ”œโ”€โ”€ benchmarks_metrics.py # Evaluation metrics (Precision, Recall, NDCG) +โ”œโ”€โ”€ benchmarks_runner.py # Main benchmark orchestrator +โ”œโ”€โ”€ run_benchmark_optimization.py # Optimization scripts +โ””โ”€โ”€ run_real_benchmark.py # Real data benchmarking +``` + +### Benchmark Scenarios +``` +benchmark_scenarios/ +โ”œโ”€โ”€ dense_baseline.yml # Simple dense retrieval +โ”œโ”€โ”€ dense_high_precision.yml # High precision dense config +โ”œโ”€โ”€ dense_high_recall.yml # High recall dense config +โ”œโ”€โ”€ hybrid_advanced.yml # Advanced hybrid configuration +โ”œโ”€โ”€ hybrid_reranking.yml # Full reranking pipeline +โ”œโ”€โ”€ hybrid_retrieval.yml # Basic hybrid retrieval +โ”œโ”€โ”€ hybrid_weighted.yml # Weighted hybrid approach +โ”œโ”€โ”€ quick_test.yml # Quick performance test +โ””โ”€โ”€ sparse_bm25.yml # BM25 baseline +``` + +### Additional Components +``` +datasets/ # Dataset storage +โ”œโ”€โ”€ sosum/ # SOSum Stack Overflow dataset + +extraction_output/ # Table extraction results +โ”œโ”€โ”€ *.csv # Extracted tables from documents + +logs/ # Application logs +โ”œโ”€โ”€ agent.log # Agent workflow logs +โ”œโ”€โ”€ query_interpreter.log # Query processing logs +โ””โ”€โ”€ (other log files...) 
+ +playground/ # Development and testing scripts +processors/ # Legacy processing components +retrievers/ # Base retriever implementations +scripts/ # Utility scripts ``` ## ๐Ÿงช Test Organization @@ -96,9 +152,20 @@ All tests are now organized under the `tests/` directory with clear categorizati ### Test Structure ``` tests/ -โ”œโ”€โ”€ run_all_tests.py # Main test runner -โ”œโ”€โ”€ test_agent_retrieval.py # Agent integration tests -โ”œโ”€โ”€ agent/ # Agent-specific tests +โ”œโ”€โ”€ __init__.py +โ”œโ”€โ”€ requirements-minimal.txt # Minimal test dependencies +โ””โ”€โ”€ pipeline/ # Pipeline component tests + โ”œโ”€โ”€ __init__.py + โ”œโ”€โ”€ run_tests.py # Test runner + โ”œโ”€โ”€ test_components.py # Component integration tests + โ”œโ”€โ”€ test_config.py # Configuration validation tests + โ”œโ”€โ”€ test_end_to_end.py # End-to-end pipeline tests + โ”œโ”€โ”€ test_minimal.py # Minimal functionality tests + โ”œโ”€โ”€ test_minimal_pipeline.py # CI-friendly minimal tests + โ”œโ”€โ”€ test_qdrant.py # Qdrant database tests + โ”œโ”€โ”€ test_qdrant_connectivity.py # Database connectivity tests + โ””โ”€โ”€ test_runner.py # Test execution utilities +``` โ”‚ โ””โ”€โ”€ test_retriever_node.py โ”œโ”€โ”€ components/ # Component unit tests โ”‚ โ”œโ”€โ”€ test_retrieval_pipeline.py diff --git a/docs/QUICK_START_GUIDE.md b/docs/QUICK_START_GUIDE.md index 5f0c4c1..e4afc9f 100644 --- a/docs/QUICK_START_GUIDE.md +++ b/docs/QUICK_START_GUIDE.md @@ -1,6 +1,14 @@ -# Quick Start Guide: Implementing MLOps Pipeline for RAG +# Quick Start Guide: Understanding the MLOps Pipeline for RAG -This guide provides a step-by-step walkthrough for implementing the MLOps pipeline architecture in your own projects. +**โš ๏ธ Important Note**: This guide provides a simplified tutorial for understanding the MLOps concepts. The actual project has a much more sophisticated implementation with advanced features like hybrid embeddings, multiple chunking strategies, agent workflows, and comprehensive benchmarking. 
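
The "comprehensive benchmarking" mentioned above rests on standard IR measures; the project computes them in `benchmarks/benchmarks_metrics.py`. As a minimal, self-contained sketch — an illustration of the math, not the project's actual code — Precision@K, Recall@K, and MRR can be computed like this:

```python
from typing import List, Set

def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved doc ids that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant doc ids that appear in the top-k."""
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)

def mrr(rankings: List[List[str]], relevants: List[Set[str]]) -> float:
    """Mean reciprocal rank of the first relevant hit per query."""
    total = 0.0
    for ranked, relevant in zip(rankings, relevants):
        for i, d in enumerate(ranked, start=1):
            if d in relevant:
                total += 1.0 / i
                break  # only the first relevant hit counts
    return total / len(rankings)

ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(precision_at_k(ranked, relevant, 2))  # 0.5
print(recall_at_k(ranked, relevant, 4))     # 1.0
print(mrr([ranked], [relevant]))            # 0.5 (first relevant doc at rank 2)
```

The real framework additionally handles graded relevance (NDCG) and aggregates over the `k_values` configured per benchmark scenario.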
+ +**To use the actual project:** +- See `README.md` for setup instructions +- Use the CLI: `python bin/ingest.py --help` +- Check `docs/SOSUM_INGESTION.md` for real dataset examples +- Review `docs/MLOPS_PIPELINE_ARCHITECTURE.md` for detailed architecture + +This guide provides a step-by-step walkthrough for implementing a simplified version of the MLOps pipeline architecture. ## Prerequisites @@ -371,6 +379,13 @@ class DocumentChunker: return chunked_docs ``` +**Note**: The actual implementation (`pipelines/ingest/chunker.py`) has multiple advanced chunking strategies: +- `RecursiveChunkingStrategy`: Basic recursive character splitting +- `SemanticChunkingStrategy`: Sentence-boundary aware chunking +- `CodeAwareChunkingStrategy`: Preserves code blocks and functions +- `TableAwareChunkingStrategy`: Preserves table structure +- `ChunkingStrategyFactory`: Factory for strategy selection + ### Simple Embedder (`pipelines/ingest/embedder.py`) ```python """Embedding generation.""" @@ -451,160 +466,99 @@ class EmbeddingPipeline: ) ``` -## 6. Create Simple CLI Interface (30 minutes) +**Note**: The actual implementation (`pipelines/ingest/embedder.py`) is more sophisticated with: +- Support for dense, sparse, and hybrid embedding strategies +- Caching and error handling +- Batch processing with progress bars +- Integration with multiple embedding providers (HuggingFace, Google, AWS Bedrock) -### CLI Script (`bin/ingest.py`) -```python -#!/usr/bin/env python3 -"""Simple ingestion CLI.""" -import argparse -import yaml -import logging -from pathlib import Path -import sys -import importlib +## 6. Use the Actual CLI Interface (15 minutes) -# Add project root to path -sys.path.insert(0, str(Path(__file__).parent.parent)) +The actual project has a sophisticated CLI with subcommands. 
Here's how to use it: -from pipelines.ingest.chunker import DocumentChunker -from pipelines.ingest.embedder import EmbeddingPipeline +### CLI Usage Examples +```bash +# View available commands +python bin/ingest.py --help -# Configure logging -logging.basicConfig( - level=logging.INFO, - format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' -) -logger = logging.getLogger(__name__) +# Ingest a dataset (dry run) +python bin/ingest.py ingest natural_questions /path/to/data --config config.yml --dry-run --max-docs 100 -def load_adapter(adapter_name: str, data_path: str): - """Dynamically load dataset adapter.""" - module_name = f"pipelines.adapters.{adapter_name}" - module = importlib.import_module(module_name) - - # Find adapter class (assumes pattern: XxxDatasetAdapter) - adapter_class = None - for attr_name in dir(module): - attr = getattr(module, attr_name) - if (isinstance(attr, type) and - hasattr(attr, 'source_name') and - attr_name.endswith('Adapter')): - adapter_class = attr - break - - if not adapter_class: - raise ValueError(f"No adapter class found in {module_name}") - - return adapter_class(data_path) - -def main(): - parser = argparse.ArgumentParser(description="Simple RAG Ingestion Pipeline") - parser.add_argument("config", help="Configuration file path") - parser.add_argument("--dry-run", action="store_true", help="Run without uploading") - parser.add_argument("--max-docs", type=int, help="Limit number of documents") - parser.add_argument("--verbose", "-v", action="store_true", help="Verbose logging") - - args = parser.parse_args() - - if args.verbose: - logging.getLogger().setLevel(logging.DEBUG) - - # Load configuration - with open(args.config, 'r') as f: - config = yaml.safe_load(f) - - logger.info(f"Starting ingestion with config: {args.config}") - - try: - # Load dataset adapter - adapter = load_adapter( - config["dataset"]["adapter"], - config["dataset"]["path"] - ) - logger.info(f"Loaded adapter: {adapter.source_name}") - - # Read data - 
rows = list(adapter.read_rows()) - if args.max_docs: - rows = rows[:args.max_docs] - logger.info(f"Read {len(rows)} rows") - - # Convert to documents - documents = adapter.to_documents(rows, split="all") - logger.info(f"Created {len(documents)} documents") - - # Chunk documents - chunker = DocumentChunker(config["chunking"]) - chunked_docs = chunker.chunk_documents(documents) - logger.info(f"Created {len(chunked_docs)} chunks") - - # Generate embeddings - embedder = EmbeddingPipeline(config["embedding"]) - chunk_metas = embedder.process_documents(chunked_docs) - logger.info(f"Generated embeddings for {len(chunk_metas)} chunks") - - if args.dry_run: - logger.info("DRY RUN - Would upload to vector store") - logger.info(f"Sample chunk: {chunk_metas[0].chunk_id}") - else: - # TODO: Implement vector store upload - logger.info("Vector store upload not implemented yet") - - logger.info("Ingestion completed successfully!") - - except Exception as e: - logger.error(f"Ingestion failed: {e}") - return 1 - - return 0 +# Ingest Stack Overflow dataset +python bin/ingest.py ingest stackoverflow /path/to/sosum --config config.yml -if __name__ == "__main__": - sys.exit(main()) -``` +# Run in canary mode for testing +python bin/ingest.py ingest energy_papers /path/to/papers --canary --max-docs 50 + +# Check collection status +python bin/ingest.py status --config config.yml -## 7. 
Test Your Implementation (15 minutes) +# Evaluate retrieval performance +python bin/ingest.py evaluate natural_questions /path/to/data --output-dir results/ -### Create Test Data (`test_data.csv`) -```csv -id,title,content,category -1,"Introduction to Python","Python is a programming language that lets you work quickly and integrate systems more effectively.","programming" -2,"Machine Learning Basics","Machine learning is a method of data analysis that automates analytical model building.","ai" -3,"Data Science Overview","Data science is an interdisciplinary field that uses scientific methods to extract knowledge from data.","data" +# Batch ingestion +python bin/ingest.py batch-ingest batch_config.json ``` -### Test Configuration (`test_config.yml`) -```yaml -dataset: - name: "test_dataset" - version: "1.0.0" - adapter: "csv_dataset" - path: "test_data.csv" +### Batch Configuration Example (`batch_config.json`) +```json +{ + "datasets": [ + {"type": "natural_questions", "path": "/path/to/nq", "version": "1.0.0"}, + {"type": "stackoverflow", "path": "/path/to/sosum", "version": "1.0.0"} + ] +} +``` -chunking: - strategy: "recursive_character" - chunk_size: 500 - chunk_overlap: 50 +### Available Adapter Types +- `natural_questions`: Natural Questions dataset +- `stackoverflow`: Stack Overflow (SOSum format) dataset +- `energy_papers`: Energy research papers dataset -embedding: - strategy: "dense" - provider: "hf" - model: "sentence-transformers/all-MiniLM-L6-v2" - batch_size: 2 +## 7. 
Test with Actual Implementation (15 minutes)

-vector_store:
-  provider: "qdrant"
-  collection_name: "test_v1"
+### Use Real Configuration Files
+The actual project has several pre-configured YAML files you can use:
+
+```bash
+# List available configurations
+ls pipelines/configs/retrieval/
+
+# Available configs:
+# - modern_dense.yml: Dense embeddings with neural reranking
+# - modern_hybrid.yml: Hybrid dense+sparse with reranking
+# - fast_hybrid.yml: Fast hybrid retrieval
+# - ci_google_gemini.yml: CI configuration with Google embeddings
 ```

-### Run Test
+### Test with Stack Overflow Dataset
 ```bash
-# Create test data
-echo 'id,title,content,category
-1,"Introduction to Python","Python is a programming language that lets you work quickly and integrate systems more effectively.","programming"
-2,"Machine Learning Basics","Machine learning is a method of data analysis that automates analytical model building.","ai"' > test_data.csv
+# Download the SOSum dataset (run these commands from the project root)
+mkdir -p datasets/sosum/data
+# Download the CSV files into datasets/sosum/data/ from
+# https://github.com/BonanKou/SOSum-A-Dataset-of-Extractive-Summaries-of-Stack-Overflow-Posts-and-labeling-tools
+
+# Test the adapter (dry run)
+python bin/ingest.py ingest stackoverflow datasets/sosum/data --config config.yml --dry-run --max-docs 10 --verbose
+
+# Check what was ingested
+python bin/ingest.py status --config config.yml
+```
+
+### Actual Configuration Structure
+The real `config.yml` looks like this:
+```yaml
+# Main configuration file
+agent:
+  retrieval_pipeline_config: "pipelines/configs/retrieval/modern_dense.yml"
+
+database:
+  qdrant:
+    host: "localhost"
+    port: 6333
+    collection_name: "sosum_stackoverflow_hybrid_v1"

-# Test the pipeline
-python bin/ingest.py test_config.yml --dry-run --max-docs 2 --verbose
+# The retrieval configs contain detailed chunking and embedding settings
 ```

## 8. 
Next Steps and Extensions

diff --git a/docs/SOSUM_INGESTION.md b/docs/SOSUM_INGESTION.md
index b0a60f9..2b8351d 100644
--- a/docs/SOSUM_INGESTION.md
+++ b/docs/SOSUM_INGESTION.md
@@ -41,12 +41,17 @@ SOSum comes in two CSV files:

### 1. Download the Dataset

```bash
-# Clone the SOSum repository
-git clone https://github.com/BonanKou/SOSum-A-Dataset-of-Extractive-Summaries-of-Stack-Overflow-Posts-and-labeling-tools.git sosum
+# Clone the SOSum repository into the datasets directory
+cd datasets/
+git clone https://github.com/BonanKou/SOSum-A-Dataset-of-Extractive-Summaries-of-Stack-Overflow-Posts-and-labeling-tools.git sosum_source

-# The CSV files are in sosum/data/ directory
-ls sosum/data/
+# The CSV files are in the sosum_source/data/ directory
+ls sosum_source/data/
 # Should show: question.csv answer.csv
+
+# Or download the CSV files directly
+mkdir -p sosum/
+# Place question.csv and answer.csv in the sosum/ directory
```

### 2. Test the Adapter