# SynthoraAI Agentic AI Pipeline

Sophisticated multi-agent system for advanced content processing using LangGraph and LangChain
## Table of Contents

- Overview
- Architecture
- Features
- Installation
- Quick Start
- Usage
- Agents
- Configuration
- Monitoring & Observability
- Testing
- Performance
- Troubleshooting
- Contributing
## Overview

The SynthoraAI Agentic AI Pipeline is a production-ready, multi-agent system that processes articles through a series of specialized AI agents. Built on LangGraph's state machine framework, it provides:
- ✅ Sophisticated Content Analysis - Deep understanding of article structure and content
- ✅ Intelligent Summarization - Concise, accurate summaries with key points
- ✅ Advanced Classification - Multi-category topic classification
- ✅ Sentiment Analysis - Emotional tone and objectivity assessment
- ✅ Quality Assurance - Automatic validation with retry logic
- ✅ Production-Ready - Cloud deployment, monitoring, and scaling
## Architecture

The pipeline implements an assembly-line pattern in which articles flow through specialized agents:
```
┌────────┐   ┌──────────┐   ┌────────────┐   ┌────────────┐   ┌───────────┐   ┌─────────┐   ┌────────┐
│ Intake │──▶│ Content  │──▶│ Summarizer │──▶│ Classifier │──▶│ Sentiment │──▶│ Quality │──▶│ Output │
│  Node  │   │ Analyzer │   │            │   │            │   │ Analyzer  │   │ Checker │   │  Node  │
└────────┘   └──────────┘   └────────────┘   └────────────┘   └───────────┘   └─────────┘   └────────┘
                                                                                   │
                                                                                   │ (retry)
                                                                                   ▼
                                                                             ┌───────────┐
                                                                             │   Retry   │
                                                                             │ (max 3x)  │
                                                                             └───────────┘
```
### LangGraph Orchestration

The pipeline uses LangGraph to orchestrate agent interactions:

- **State Management**: A shared `ArticleState` flows through all agents
- **Conditional Routing**: Quality checks determine retry logic
- **Error Handling**: Graceful degradation and dedicated error nodes
- **Observability**: Built-in tracing and metrics
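
The retry edge in the diagram maps onto LangGraph's conditional routing. Below is a minimal, illustrative sketch of that wiring; the state fields, the 0.7 threshold, and the node bodies are simplified assumptions for demonstration, not the pipeline's actual implementation:

```python
from typing import TypedDict

from langgraph.graph import StateGraph, END


class ArticleState(TypedDict, total=False):
    content: str
    summary: str
    quality_score: float
    retry_count: int


def quality_checker(state: ArticleState) -> ArticleState:
    # Placeholder: the real agent validates all upstream outputs.
    return state


def route_after_quality(state: ArticleState) -> str:
    # Conditional routing: retry (up to 3 times) when quality is too low.
    if state.get("quality_score", 0.0) < 0.7 and state.get("retry_count", 0) < 3:
        return "retry"
    return "output"


graph = StateGraph(ArticleState)
graph.add_node("quality_checker", quality_checker)
graph.add_node("retry", lambda s: {"retry_count": s.get("retry_count", 0) + 1})
graph.add_node("output", lambda s: s)
graph.set_entry_point("quality_checker")
graph.add_conditional_edges(
    "quality_checker",
    route_after_quality,
    {"retry": "retry", "output": "output"},
)
graph.add_edge("retry", "quality_checker")
graph.add_edge("output", END)
pipeline_graph = graph.compile()
```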
## Features

### Content Analyzer

Extracts structured information from articles:
- Document structure (paragraphs, sections)
- Named entities (people, organizations, locations)
- Key dates and events
- Important facts and claims
- Writing style analysis
### Summarizer

Generates high-quality summaries:
- 150-200 word concise summaries
- 3-5 key bullet points
- Factual accuracy preservation
- Context-aware summarization
### Classifier

Categorizes articles into topics:
- 15+ government-relevant categories
- Multi-label classification
- Confidence scores
- Automatic tag generation
Categories:
- Politics & Government
- Economy & Finance
- Healthcare
- Education
- Environment & Climate
- Technology & Innovation
- Security & Defense
- International Relations
- Law & Justice
- Social Issues
- Infrastructure
- Energy
- Agriculture
- Science & Research
- Public Safety
### Sentiment Analyzer

Analyzes emotional characteristics:
- Sentiment score (-1 to 1)
- Sentiment label (positive/neutral/negative)
- Objectivity score (0 to 1)
- Urgency level (low/medium/high)
- Controversy score (0 to 1)
### Quality Checker

Validates all outputs:
- Summary quality assessment
- Classification consistency
- Sentiment reasonableness
- Completeness verification
- Automatic retry logic
## Installation

### Prerequisites

- Python 3.11 or later
- MongoDB (local or cloud)
- Redis (for caching)
- Google AI API key
### Setup

```bash
git clone https://github.com/your-org/AI-Gov-Content-Curator.git
cd AI-Gov-Content-Curator/agentic_ai
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your API keys and configuration
```

Required environment variables:
```bash
GOOGLE_AI_API_KEY=your_google_ai_api_key_here
MONGODB_URI=mongodb://localhost:27017/synthoraai
REDIS_URL=redis://localhost:6379/0
```

## Quick Start

### Process a Single Article

```python
from agentic_ai.core.pipeline import AgenticPipeline
import asyncio
# Initialize pipeline
pipeline = AgenticPipeline()
# Process an article
result = asyncio.run(pipeline.process_article({
    "id": "article-123",
    "content": "Full article content here...",
    "url": "https://example.com/article",
    "source": "government",
    "title": "Article Title"
}))
# Access results
print(f"Summary: {result['summary']}")
print(f"Topics: {result['topics']}")
print(f"Sentiment: {result['sentiment_label']}")
print(f"Quality Score: {result['quality_score']}")# Process multiple articles
articles = [
{"id": "1", "content": "...", "url": "...", "source": "gov"},
{"id": "2", "content": "...", "url": "...", "source": "gov"},
# ... more articles
]
results = asyncio.run(pipeline.process_batch(articles, max_concurrent=5))The pipeline can be imported and used directly in your Python code:
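
The `max_concurrent` argument caps how many articles are in flight at once. For reference, this kind of bound is commonly enforced with an `asyncio.Semaphore`; the sketch below shows that general pattern and is an assumption, not the pipeline's internal `process_batch()` code:

```python
import asyncio


async def bounded_gather(coros, max_concurrent: int = 5):
    """Run coroutines concurrently, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run(coro):
        async with semaphore:  # blocks while max_concurrent tasks are active
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))
```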
## Usage

### Python API

The pipeline can be imported and used directly in your Python code:

```python
from agentic_ai import AgenticPipeline
import asyncio

async def main():
    pipeline = AgenticPipeline()
    result = await pipeline.process_article({
        "id": "article-123",
        "content": "Your article content...",
        "url": "https://example.com/article",
        "source": "government"
    })

    # Access specific results
    print(f"Summary: {result['summary']}")
    print(f"Key Points: {result['key_points']}")
    print(f"Topics: {result['topics']}")
    print(f"Primary Category: {result['primary_category']}")
    print(f"Sentiment: {result['sentiment_label']} ({result['sentiment_score']})")
    print(f"Objectivity: {result['objectivity_score']}")
    print(f"Quality Score: {result['quality_score']}")

if __name__ == "__main__":
    asyncio.run(main())
```

### MCP Server

The pipeline includes a Model Context Protocol (MCP) server for standardized API access.

#### Starting the Server

```bash
# Development
python -m agentic_ai.mcp_server.server
# Production with Uvicorn
uvicorn agentic_ai.mcp_server.server:app --host 0.0.0.0 --port 8000 --workers 4
```

#### API Endpoints

**POST /process** - Process a single article

```bash
curl -X POST http://localhost:8000/process \
  -H "Content-Type: application/json" \
  -d '{
    "id": "article-123",
    "url": "https://example.com/article",
    "content": "Full article content...",
    "source": "government",
    "title": "Article Title"
  }'
```

**POST /process_batch** - Process multiple articles

```bash
curl -X POST http://localhost:8000/process_batch \
  -H "Content-Type: application/json" \
  -d '{
    "articles": [
      {"id": "1", "content": "...", "url": "...", "source": "gov"},
      {"id": "2", "content": "...", "url": "...", "source": "gov"}
    ],
    "max_concurrent": 5
  }'
```

**GET /health** - Health check

```bash
curl http://localhost:8000/health
```

**GET /metrics** - Get pipeline metrics

```bash
curl http://localhost:8000/metrics
```

### Cloud Deployment

The pipeline supports deployment to AWS Lambda and Azure Functions.

#### AWS Lambda

```bash
# Deploy to staging
cd agentic_ai/aws
./deploy.sh staging
# Deploy to production
./deploy.sh production
```

Or use CloudFormation:

```bash
aws cloudformation create-stack \
  --stack-name synthoraai-pipeline-staging \
  --template-body file://cloudformation.yml \
  --parameters \
    ParameterKey=Environment,ParameterValue=staging \
    ParameterKey=GoogleAIAPIKey,ParameterValue=your_key_here \
    ParameterKey=MongoDBURI,ParameterValue=your_mongodb_uri
```

#### Azure Functions

```bash
# Deploy to staging
cd agentic_ai/azure
./deploy.sh staging
# Deploy to production
./deploy.sh production
```

## Agents

### 1. Content Analyzer

**Purpose**: Extract structure and key information

**Outputs**:

- `structure`: Document structure metadata
- `entities`: Named entities (people, orgs, locations)
- `key_dates`: Important dates
- `key_facts`: Key facts and claims
- `writing_style`: Style classification
### 2. Summarizer

**Purpose**: Generate concise summaries

**Outputs**:

- `summary`: 150-200 word summary
- `summary_length`: Word count
- `key_points`: 3-5 bullet points
### 3. Classifier

**Purpose**: Categorize articles

**Outputs**:

- `topics`: List of relevant topics
- `primary_category`: Main category
- `confidence_scores`: Confidence per topic
- `tags`: Specific keywords
### 4. Sentiment Analyzer

**Purpose**: Analyze emotional characteristics

**Outputs**:

- `sentiment_score`: -1 (negative) to 1 (positive)
- `sentiment_label`: positive/neutral/negative
- `objectivity_score`: 0 (subjective) to 1 (objective)
- `urgency_level`: low/medium/high
- `controversy_score`: 0 (none) to 1 (high)
### 5. Quality Checker

**Purpose**: Validate outputs

**Outputs**:

- `quality_score`: Overall quality (0 to 1)
- `quality_issues`: List of issues found
- `passes_quality`: Boolean pass/fail
- `needs_retry`: Whether retry is needed
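
Taken together, the agents populate a single result record per article. The shape looks roughly like the following; every value here is fabricated purely for illustration:

```python
# Illustrative result shape only: field names follow the output lists
# above, values are made up for demonstration.
result = {
    "summary": "The ministry announced a new infrastructure fund ...",
    "key_points": ["New fund announced", "Focus on rural roads", "Rollout in 2025"],
    "topics": ["Infrastructure", "Economy & Finance"],
    "primary_category": "Infrastructure",
    "confidence_scores": {"Infrastructure": 0.92, "Economy & Finance": 0.64},
    "tags": ["infrastructure-fund", "rural-roads"],
    "sentiment_score": 0.35,
    "sentiment_label": "positive",
    "objectivity_score": 0.81,
    "urgency_level": "medium",
    "controversy_score": 0.10,
    "quality_score": 0.88,
    "passes_quality": True,
    "needs_retry": False,
}
```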
## Configuration

### Environment Variables

All configuration is managed through environment variables. See `.env.example` for a complete list.
Key settings:

```bash
# AI Provider
GOOGLE_AI_API_KEY=your_key_here
GOOGLE_AI_API_KEY1=backup_key_1 # Optional
GOOGLE_AI_API_KEY2=backup_key_2 # Optional
# Pipeline Behavior
PIPELINE_MAX_RETRIES=3
PIPELINE_TIMEOUT=300
QUALITY_THRESHOLD=0.7
# Performance
MAX_CONCURRENT_REQUESTS=10
BATCH_SIZE=5
CACHE_TTL=3600
# Monitoring
LOG_LEVEL=INFO
ENABLE_METRICS=true
PROMETHEUS_PORT=9090
```

### Programmatic Configuration

```python
from agentic_ai.utils.config import Config

config = Config(
    google_ai_api_keys=["key1", "key2"],
    mongodb_uri="mongodb://localhost:27017/synthoraai",
    quality_threshold=0.8,
    pipeline_max_retries=5
)
pipeline = AgenticPipeline(config)
```

## Monitoring & Observability

### Metrics

The pipeline collects comprehensive metrics:

```python
# Get current metrics
metrics = pipeline.get_metrics()
print(f"Articles processed: {metrics['counters']['articles_processing_completed']}")
print(f"Average processing time: {metrics['histograms']['article_processing_time_seconds']['avg']}s")
print(f"Error rate: {metrics['counters']['articles_processing_errors']}")Available Metrics:
- `articles_processing_started` - Counter
- `articles_processing_completed` - Counter
- `articles_processing_failed` - Counter
- `articles_processing_errors` - Counter
- `article_processing_time_seconds` - Histogram
### Logging

The pipeline uses structlog for structured logging:

```python
import structlog
logger = structlog.get_logger()
# Logs include context
logger.info("processing_started", article_id="123", source="government")Enable LangSmith for detailed tracing:
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your_langsmith_key
LANGCHAIN_PROJECT=synthoraai-agentic-pipeline
```

## Testing

### Running Tests

```bash
# Install test dependencies
pip install pytest pytest-asyncio pytest-cov pytest-mock
# Run all tests
pytest tests/
# Run with coverage
pytest --cov=agentic_ai tests/
# Run specific test file
pytest tests/test_pipeline.py
```

### Writing Tests

```python
import pytest
from agentic_ai import AgenticPipeline
@pytest.mark.asyncio
async def test_article_processing():
    pipeline = AgenticPipeline()
    result = await pipeline.process_article({
        "id": "test-123",
        "content": "Test article content...",
        "url": "https://example.com/test",
        "source": "test"
    })

    assert result["summary"] is not None
    assert len(result["topics"]) > 0
    assert result["quality_score"] >= 0.7
```

## Performance

### Benchmarks

Based on real-world usage:
- Average processing time: 8-12 seconds per article
- Batch throughput: 100-150 articles/minute (with max_concurrent=10)
- Quality pass rate: 85% of articles pass on the first attempt
- Retry rate: ~15% require retry
- Error rate: <1% failures after retries
### Optimization Tips

- **Batch Processing**: Use `process_batch()` for multiple articles
- **Concurrent Requests**: Adjust `max_concurrent` based on your resources
- **Caching**: Enable Redis caching for repeated content
- **API Key Rotation**: Use multiple API keys to avoid rate limits (see the sketch after this list)
- **Quality Threshold**: Lower the threshold to reduce retries (0.6-0.7 recommended)
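
One way the key rotation referenced above can be wired up is round-robin cycling. This is a sketch under the assumption that keys are read from the numbered `GOOGLE_AI_API_KEY*` variables shown in the configuration section; the `next_api_key` helper is hypothetical, not part of the pipeline's API:

```python
# Hypothetical helper for round-robin key rotation; not part of the
# pipeline's public API. Assumes keys live in GOOGLE_AI_API_KEY,
# GOOGLE_AI_API_KEY1, GOOGLE_AI_API_KEY2, ...
import itertools
import os


def load_api_keys() -> list[str]:
    names = ["GOOGLE_AI_API_KEY"] + [f"GOOGLE_AI_API_KEY{i}" for i in range(1, 4)]
    keys = [os.environ[name] for name in names if name in os.environ]
    if not keys:
        raise RuntimeError("No GOOGLE_AI_API_KEY* variables are set")
    return keys


_key_cycle = itertools.cycle(load_api_keys())


def next_api_key() -> str:
    # Each call hands out the next key, spreading requests across quotas.
    return next(_key_cycle)
```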
## Troubleshooting

### Common Issues

**Issue**: Rate limit errors from Google AI

**Solution**: Add multiple API keys in the environment:

```bash
GOOGLE_AI_API_KEY=key1
GOOGLE_AI_API_KEY1=key2
GOOGLE_AI_API_KEY2=key3
```

**Issue**: MongoDB connection errors
**Solution**: Verify MongoDB is running and the URI is correct:

```bash
mongo --eval "db.adminCommand('ping')"
```

**Issue**: Quality check failures
**Solution**: Lower the quality threshold or check the input article quality:

```bash
QUALITY_THRESHOLD=0.6
```

**Issue**: Slow processing times
**Solution**: Increase concurrent requests:

```bash
MAX_CONCURRENT_REQUESTS=20
```

### Debug Mode

Enable debug logging:

```bash
LOG_LEVEL=DEBUG
```

Or programmatically:

```python
import structlog
import logging

structlog.configure(wrapper_class=structlog.make_filtering_bound_logger(logging.DEBUG))
```

## Contributing

We welcome contributions! Please see the main CONTRIBUTING.md for guidelines.

### Development Setup

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
# Install dev dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt
# Install pre-commit hooks
pre-commit install
# Run tests
pytest
# Run linters
black agentic_ai/
ruff check agentic_ai/
mypy agentic_ai/
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments
- Built with LangChain and LangGraph
- Powered by Google Generative AI (Gemini)
- Part of the SynthoraAI project
## Support

For questions or support:
- Email: [email protected]
- GitHub Issues: Create an issue
- Documentation: Full project docs
Made with ❤️ by the SynthoraAI Team