
SynthoraAI - AI-Powered Article Content Publisher

SynthoraAI - Synthesizing the world's news & information through AI. πŸš€βœ¨

The SynthoraAI - AI-Powered Article Content Publisher is a comprehensive, AI-powered system designed to aggregate, summarize, and present curated government-related articles. This monorepo, multi-service project is organized into five main components:

  • Backend: Provides a robust RESTful API to store and serve curated articles
  • Crawler: Automatically crawls and extracts article URLs and metadata from government homepages and public API sources
  • Frontend: Offers an intuitive Next.js-based user interface for government staff (and potentially the public) to browse and view article details
  • Newsletter: Sends daily updates to subscribers with the latest articles
  • Agentic AI Pipeline: Sophisticated multi-agent system for advanced content processing using LangGraph and LangChain

πŸ” Overview

The SynthoraAI - AI-Powered Article Content Publisher system is designed to provide government staff with up-to-date, summarized content from trusted government sources and reputable news outlets. By leveraging AI (Google Generative AI / Gemini) for summarization and using modern web technologies, this solution ensures that users receive concise, accurate, and timely information.

Key Capabilities

  • Data Ingestion: The system aggregates article URLs from multiple sources (government homepages and public APIs like NewsAPI) using a decoupled crawler service
  • Content Processing: The backend processes the fetched articles by generating concise summaries via Google Generative AI with robust retry mechanisms
  • Data Storage & API Serving: Articles are stored in MongoDB and exposed via REST endpoints built with Express.js/Next.js
  • Frontend Experience: A responsive Next.js/React interface allows users to browse, filter, and view detailed articles with dark/light mode support
  • Scheduled Updates: Both the backend and crawler employ scheduled serverless functions (via Vercel cron) to periodically update content
  • Newsletter Subscription: Users can subscribe to daily email updates with the latest articles via Resend integration
  • User Authentication: Users can create accounts, log in, and receive JWT tokens for secure access
  • Favorite Articles: Authenticated users can mark articles as favorites for quick access
  • AI-Powered Features:
    • Article Q&A with RAG (Retrieval-Augmented Generation)
    • Vector similarity search for related articles (Pinecone)
    • Bias detection and analysis
    • User ratings and discussions
    • Client-side ML recommendations

πŸ—οΈ Architecture

This project consists of five primary services that interact with one another:

1. Crawler

  • Crawls government homepages and public API sources to extract article URLs and metadata
  • Uses Axios and Cheerio for static HTML parsing, with Puppeteer as a fallback for dynamic content
  • Scheduled to run daily at 6:00 AM UTC via a serverless function on Vercel
  • Provides a basic landing page with information about the crawler
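The extraction step can be pictured as "HTML in, article URLs out." To stay dependency-free, this sketch uses a regex as a simplified stand-in for Cheerio's selector API (the real crawler uses Cheerio, with Puppeteer as a fallback for dynamic pages):

```javascript
// Simplified stand-in for the Cheerio-based extraction step: pull absolute
// article URLs out of a static HTML string, de-duplicated and capped at
// `maxLinks` (mirroring the CRAWL_MAX_LINKS setting).
function extractArticleLinks(html, { maxLinks = 50 } = {}) {
  const links = [];
  const hrefPattern = /<a\s[^>]*href="(https?:\/\/[^"]+)"/gi;
  let match;
  while ((match = hrefPattern.exec(html)) !== null && links.length < maxLinks) {
    if (!links.includes(match[1])) links.push(match[1]); // de-duplicate
  }
  return links;
}
```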

2. Backend

  • Built with Express.js and Next.js, serving as a RESTful API for the frontend
  • Integrates Google Generative AI (Gemini) for content summarization
  • Stores articles in MongoDB using Mongoose
  • Scheduled serverless function to fetch and process new articles daily at 6:00 AM UTC
  • Supports user authentication with JWT
  • Provides endpoints for articles, favorites, comments, ratings, and more

3. Newsletter Service

  • Allows users to subscribe to a newsletter for daily updates on the latest articles
  • Integrated with Resend API for managing subscriptions and sending emails
  • By default, the newsletter is sent daily at 9:00 AM UTC
  • Deployed on Vercel as a serverless function

4. Frontend

  • Built with Next.js and React, providing a modern, mobile-responsive UI
  • Fetches and displays a paginated list of articles from the backend API
  • Dedicated pages for full article content, AI-generated summaries, and source information
  • User authentication for marking favorites, commenting, and upvoting/downvoting
  • Dark mode support for improved readability

5. Agentic AI Pipeline

  • Multi-agent system built with LangGraph and LangChain
  • Specialized agents for content analysis, summarization, classification, sentiment analysis, and quality assurance
  • MCP (Model Context Protocol) server for standardized AI interactions
  • Cloud-ready with production configs for AWS Lambda and Azure Functions

This monorepo follows a microservices architecture designed for modularity and scalability: each component can be developed, tested, and deployed independently, which keeps updates and maintenance straightforward.

✨ Features

Core Features

  • AI-Powered Summarization: Google Generative AI (Gemini) generates concise article summaries
  • Multi-Source Aggregation: Crawls multiple government and news sources
  • RESTful API: Comprehensive API for article management
  • User Authentication: JWT-based secure authentication
  • Favorite Articles: Users can save articles for quick access
  • Newsletter Subscription: Daily email updates with latest articles
  • Dark Mode: Toggle between light and dark themes
  • Responsive Design: Optimized for desktop and mobile devices

Advanced AI Features

  • Article Q&A: Ask questions about articles and receive AI-generated answers using RAG
  • Related Articles: Vector similarity search powered by Pinecone
  • Bias Detection: AI-powered article bias analysis
  • Sentiment Analysis: Emotional tone and objectivity analysis
  • User Ratings: Rate articles and provide feedback
  • Discussions & Comments: Engage with other users through comments
  • Upvote/Downvote: Highlight valuable contributions
  • Client-Side ML Recommendations: Personalized article recommendations
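Both the related-articles lookup and the client-side recommendations rest on the same core operation: comparing embedding vectors by cosine similarity. Pinecone performs this at scale server-side; the sketch below shows the underlying math on plain arrays (function names are illustrative):

```javascript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank candidate articles by similarity to a query embedding and keep
// the top k — the essence of a "related articles" query.
function topKRelated(queryEmbedding, articles, k = 3) {
  return articles
    .map((article) => ({
      ...article,
      score: cosineSimilarity(queryEmbedding, article.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```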

Developer Features

  • CLI Tool: Unified command-line interface for managing the entire monorepo
  • Docker Support: Full Docker Compose configuration
  • CI/CD: GitHub Actions workflows for automated testing and deployment
  • Testing Infrastructure: Jest, Playwright, and Supertest for comprehensive testing
  • Logging & Monitoring: Winston for logging, Prometheus for metrics
  • Code Quality: ESLint, Prettier, and Husky for code consistency

πŸ› οΈ Technology Stack

Backend

  • Node.js (v18+)
  • Express.js
  • Next.js (API Routes)
  • MongoDB + Mongoose
  • Google Generative AI (Gemini)
  • JWT Authentication
  • Redis (Caching)
  • Winston (Logging)

Frontend

  • Next.js 14+
  • React 18+
  • TypeScript
  • TailwindCSS
  • Shadcn UI
  • CSS Modules

Crawler

  • Node.js
  • Axios
  • Cheerio
  • Puppeteer
  • Playwright

Newsletter

  • Resend API
  • Nodemailer
  • Serverless Functions

Agentic AI

  • Python 3.11+
  • LangChain
  • LangGraph
  • FastMCP
  • Redis
  • MongoDB
  • Prometheus

DevOps & Tools

  • Docker & Docker Compose
  • Vercel (Deployment)
  • GitHub Actions (CI/CD)
  • Jest (Testing)
  • Playwright (E2E Testing)
  • ESLint & Prettier (Code Quality)
  • Husky (Git Hooks)
  • Makefile (Task Automation)

πŸš€ Getting Started

Prerequisites

  • Node.js (v18 or later)
  • npm (v9 or later)
  • MongoDB (local or cloud instance like MongoDB Atlas)
  • Python 3.11+ (for Agentic AI Pipeline)
  • Docker (optional, for containerized deployment)
  • Vercel CLI (optional, for deployment)

Installation

  1. Clone the Repository:

    git clone https://github.com/your-org/AI-Content-Publisher.git
    cd AI-Content-Publisher
  2. Install Dependencies:

    npm install

    This will install dependencies for all workspaces (backend, frontend, crawler, newsletters).

  3. Set Up Environment Variables:

    cp .env.example .env

    Edit the .env file with your actual configuration values (see Configuration section).

Configuration

Create a .env file in the root directory with the following variables:

# MongoDB Configuration
MONGODB_URI=your_mongodb_connection_string

# Google AI Configuration
GOOGLE_AI_API_KEY=your_google_ai_api_key
AI_INSTRUCTIONS=Summarize the articles concisely and naturally

# News API Configuration
NEWS_API_KEY=your_news_api_key

# Server Configuration
PORT=3000
AICC_API_URL=http://localhost:3000

# Crawler Configuration
CRAWL_URLS=https://www.state.gov/press-releases/,https://www.bbc.com/news
CRAWL_MAX_LINKS=50
CRAWL_TIMEOUT=30000

# Resend Email Configuration
RESEND_API_KEY=your_resend_api_key
RESEND_FROM=AI Curator <[email protected]>

# JWT Configuration
JWT_SECRET=your_jwt_secret_key
JWT_EXPIRES_IN=7d

# Pinecone Configuration
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_ENVIRONMENT=your_pinecone_environment
PINECONE_INDEX=article-embeddings

# Frontend Configuration
NEXT_PUBLIC_API_URL=http://localhost:3000

See .env.example for a complete list of configuration options.
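Note that `CRAWL_URLS` is a comma-separated list while `CRAWL_MAX_LINKS` and `CRAWL_TIMEOUT` are numeric strings. A loader along these lines could turn them into a typed config object (the variable names match the sample above; the function itself is a hypothetical sketch, not the project's actual loader):

```javascript
// Parse the crawler-related environment variables into a config object,
// falling back to the defaults shown in the sample .env.
function loadCrawlerConfig(env = process.env) {
  return {
    urls: (env.CRAWL_URLS || '')
      .split(',')
      .map((u) => u.trim())
      .filter(Boolean),
    maxLinks: Number.parseInt(env.CRAWL_MAX_LINKS || '50', 10),
    timeoutMs: Number.parseInt(env.CRAWL_TIMEOUT || '30000', 10),
  };
}
```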

Running Locally

Start All Services:

npm run dev

This will start the backend, frontend, and crawler concurrently.

Start Individual Services:

# Backend only
npm run dev:backend

# Frontend only
npm run dev:frontend

# Crawler only
npm run dev:crawler

# Newsletter only
npm run dev:newsletters

Using Make:

# Start all services
make dev

# Start individual services
make dev-backend
make dev-frontend
make dev-crawler

Access the Application: with the defaults above, the backend API is served at http://localhost:3000; each service prints its local URL when it starts.

πŸ“¦ Services

Backend

The backend provides a RESTful API for managing articles, users, and more.

Key Features:

  • Article CRUD operations
  • User authentication with JWT
  • AI-powered summarization
  • Favorite articles management
  • Comments and ratings
  • Scheduled article fetching

API Endpoints:

  • GET /api/articles - Get paginated list of articles
  • GET /api/articles/:id - Get article details
  • POST /api/articles - Create new article
  • PUT /api/articles/:id - Update article
  • DELETE /api/articles/:id - Delete article
  • POST /api/auth/register - User registration
  • POST /api/auth/login - User login
  • GET /api/favorites - Get user favorites
  • POST /api/favorites/:id - Add to favorites
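A client calling the paginated listing endpoint needs to build the query string consistently. The helper below sketches this with the WHATWG `URL` API; the `page` and `limit` parameter names are assumptions for illustration — check the backend documentation for the exact contract:

```javascript
// Build a request URL for GET /api/articles with pagination parameters.
// `page` and `limit` are assumed query-parameter names (hypothetical).
function articlesUrl(baseUrl, { page = 1, limit = 10 } = {}) {
  const url = new URL('/api/articles', baseUrl);
  url.searchParams.set('page', String(page));
  url.searchParams.set('limit', String(limit));
  return url.toString();
}
```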

Running the Backend:

cd backend
npm install
npm run dev

See backend/README.md for detailed documentation.

Frontend

The frontend provides a modern, responsive UI for browsing and viewing articles.

Key Features:

  • Article listing with pagination
  • Article detail view with AI summary
  • User authentication (login/register)
  • Favorite articles
  • Dark mode support
  • Newsletter subscription
  • Article Q&A
  • Related articles
  • Bias analysis
  • Comments and ratings

Running the Frontend:

cd frontend
npm install
npm run dev

See frontend/README.md for detailed documentation.

Crawler

The crawler automatically fetches articles from configured sources.

Key Features:

  • Multi-source crawling
  • Static and dynamic content support
  • Scheduled execution via Vercel cron
  • Error handling and retry logic
  • Metadata extraction
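Scheduled execution on Vercel is declared in `vercel.json`. A daily run at 6:00 AM UTC (as described above) would look like the fragment below; the `path` value is a hypothetical endpoint name, not necessarily the one this repository uses:

```json
{
  "crons": [
    {
      "path": "/api/crawl",
      "schedule": "0 6 * * *"
    }
  ]
}
```

The `schedule` field uses standard cron syntax: `0 6 * * *` means minute 0 of hour 6, every day.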

Running the Crawler:

cd crawler
npm install
npm run crawl

See crawler/README.md for detailed documentation.

Newsletter

The newsletter service sends daily email updates to subscribers.

Key Features:

  • Email subscription management
  • Daily newsletter sending
  • Unsubscribe functionality
  • Integration with Resend API

Running the Newsletter:

cd newsletters
npm install
npm run dev

See newsletters/README.md for detailed documentation.

Agentic AI Pipeline

A sophisticated multi-agent system for advanced content processing.

Key Features:

  • Multi-agent architecture with specialized agents
  • Content analysis and entity extraction
  • Advanced summarization
  • Topic classification
  • Sentiment analysis
  • Quality assurance with retry logic
  • MCP server for standardized AI interactions
  • Cloud-ready deployment

Getting Started:

cd agentic_ai
pip install -r requirements.txt
python -m agentic_ai.mcp_server.server

See agentic_ai/README.md for detailed documentation.

🌐 Deployment

Vercel Deployment

All services can be deployed to Vercel:

# Deploy all services
vercel --prod

# Deploy individual services
cd backend && vercel --prod
cd frontend && vercel --prod
cd crawler && vercel --prod
cd newsletters && vercel --prod

Docker Deployment

Use Docker Compose for local or production deployment:

# Start all services
docker-compose up -d

# Build images
docker-compose build

# Stop services
docker-compose down
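For orientation, a compose file for this kind of stack typically wires each service to a shared MongoDB container. The fragment below is illustrative only — service names, ports, and volumes are assumptions; see the repository's actual docker-compose.yml for the real definitions:

```yaml
# Illustrative fragment; names and ports are assumptions, not the
# project's actual compose configuration.
services:
  backend:
    build: ./backend
    ports:
      - "3000:3000"
    env_file: .env
    depends_on:
      - mongo
  mongo:
    image: mongo:7
    volumes:
      - mongo-data:/data/db
volumes:
  mongo-data:
```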

AWS/Azure Deployment

The Agentic AI Pipeline includes production configurations for AWS Lambda and Azure Functions. See agentic_ai/aws/ and agentic_ai/azure/ directories for deployment scripts.

πŸ”§ CLI Tool

The aicc CLI provides a unified interface for managing the entire monorepo:

Installation

npm install
npm link

Usage

Workspace Management:

aicc dev              # Start all services
aicc dev backend      # Start backend only
aicc build            # Build all services
aicc start            # Start in production mode
aicc lint             # Lint all code

Crawling:

aicc crawl            # Run the crawler

Article CRUD:

aicc article create --title "Title" --content "Content"
aicc article get <id>
aicc article list --limit 10
aicc article update <id> --title "New Title"
aicc article delete <id>

See the CLI Documentation for more details.

πŸ§ͺ Testing

Backend Tests

cd backend
npm run test              # Run tests once
npm run test:watch        # Watch mode
npm run test:coverage     # Generate coverage report

Frontend Tests

cd frontend
npm run test:e2e          # Run E2E tests (headless)
npm run test:e2e:headed   # Run E2E tests (headed)
npm run test:e2e:report   # View test report

Crawler Tests

cd crawler
npm run test

Run All Tests

npm run test              # From root
make test                 # Using Makefile

🀝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature/your-feature)
  3. Make your changes
  4. Run tests and linting (npm run test && npm run lint)
  5. Commit your changes (git commit -m 'Add some feature')
  6. Push to the branch (git push origin feature/your-feature)
  7. Open a Pull Request

Please ensure your code follows the project's coding standards and includes appropriate tests.

πŸ“„ License

This project is licensed under the MIT License. See the LICENSE file for details.

πŸ“ž Contact

For questions, suggestions, or support:


Built with ❀️ by the SynthoraAI Team

πŸ” Back to Top

About

AI-powered system designed to aggregate, summarize, and present curated articles

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published