SynthoraAI - Synthesizing the world's news & information through AI.
The SynthoraAI - AI-Powered Article Content Publisher is a comprehensive, AI-powered system designed to aggregate, summarize, and present curated government-related articles. This multi-service monorepo is organized into five main components:
- Backend: Provides a robust RESTful API to store and serve curated articles
- Crawler: Automatically crawls and extracts article URLs and metadata from government homepages and public API sources
- Frontend: Offers an intuitive Next.js-based user interface for government staff (and potentially the public) to browse and view article details
- Newsletter: Sends daily updates to subscribers with the latest articles
- Agentic AI Pipeline: Sophisticated multi-agent system for advanced content processing using LangGraph and LangChain
Table of Contents:

- Overview
- Architecture
- Features
- Technology Stack
- Getting Started
- Services
- Deployment
- CLI Tool
- Testing
- Contributing
- License
The SynthoraAI - AI-Powered Article Content Publisher system is designed to provide government staff with up-to-date, summarized content from trusted government sources and reputable news outlets. By leveraging AI (Google Generative AI / Gemini) for summarization and using modern web technologies, this solution ensures that users receive concise, accurate, and timely information.
- Data Ingestion: The system aggregates article URLs from multiple sources (government homepages and public APIs like NewsAPI) using a decoupled crawler service
- Content Processing: The backend processes the fetched articles by generating concise summaries via Google Generative AI with robust retry mechanisms (see the sketch after this list)
- Data Storage & API Serving: Articles are stored in MongoDB and exposed via REST endpoints built with Express.js/Next.js
- Frontend Experience: A responsive Next.js/React interface allows users to browse, filter, and view detailed articles with dark/light mode support
- Scheduled Updates: Both the backend and crawler employ scheduled serverless functions (via Vercel cron) to periodically update content
- Newsletter Subscription: Users can subscribe to daily email updates with the latest articles via Resend integration
- User Authentication: Users can create accounts, log in, and receive JWT tokens for secure access
- Favorite Articles: Authenticated users can mark articles as favorites for quick access
- AI-Powered Features:
- Article Q&A with RAG (Retrieval-Augmented Generation)
- Vector similarity search for related articles (Pinecone)
- Bias detection and analysis
- User ratings and discussions
- Client-side ML recommendations
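To make the summarization step concrete, here is a minimal sketch of a Gemini call wrapped in a simple retry loop. It assumes the @google/generative-ai SDK; the model name, backoff policy, and the summarizeArticle helper are illustrative rather than the backend's actual code.

```typescript
// Illustrative only: a Gemini summarization call wrapped in a retry loop.
// Model name, backoff policy, and helper name are assumptions, not the
// backend's actual implementation.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_AI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

export async function summarizeArticle(content: string, maxRetries = 3): Promise<string> {
  const instructions = process.env.AI_INSTRUCTIONS ?? "Summarize the article concisely";
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await model.generateContent(`${instructions}:\n\n${content}`);
      return result.response.text();
    } catch (err) {
      if (attempt === maxRetries) throw err;
      // Exponential backoff before the next attempt (2s, 4s, ...)
      await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 1000));
    }
  }
  throw new Error("unreachable"); // satisfies the return-type check
}
```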
This project consists of five primary services that interact with each other:
Crawler:

- Crawls government homepages and public API sources to extract article URLs and metadata
- Uses Axios and Cheerio for static HTML parsing, with Puppeteer as a fallback for dynamic content
- Scheduled to run daily at 6:00 AM UTC via a serverless function on Vercel
- Provides a basic landing page with information about the crawler
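A rough sketch of that static-first, dynamic-fallback strategy is shown below, assuming Axios, Cheerio, and Puppeteer as listed above; the helper names, timeout, and link limit are illustrative, not the crawler's actual implementation.

```typescript
// Illustrative sketch of a static-first crawl with a Puppeteer fallback.
// Names and limits are assumptions, not the crawler's actual code.
import axios from "axios";
import * as cheerio from "cheerio";
import puppeteer from "puppeteer";

async function fetchHtml(url: string, timeoutMs = 30000): Promise<string> {
  try {
    // Fast path: plain HTTP fetch for static pages
    const { data } = await axios.get<string>(url, { timeout: timeoutMs });
    return data;
  } catch {
    // Fallback: render the page with a headless browser for dynamic content
    const browser = await puppeteer.launch();
    try {
      const page = await browser.newPage();
      await page.goto(url, { waitUntil: "networkidle2", timeout: timeoutMs });
      return await page.content();
    } finally {
      await browser.close();
    }
  }
}

export async function extractArticleLinks(url: string, maxLinks = 50): Promise<string[]> {
  const $ = cheerio.load(await fetchHtml(url));
  const links = new Set<string>();
  $("a[href]").each((_, el) => {
    const href = $(el).attr("href");
    if (href?.startsWith("http")) links.add(href);
  });
  return [...links].slice(0, maxLinks);
}
```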
Backend:

- Built with Express.js and Next.js, serving as a RESTful API for the frontend
- Integrates Google Generative AI (Gemini) for content summarization
- Stores articles in MongoDB using Mongoose
- Scheduled serverless function to fetch and process new articles daily at 6:00 AM UTC
- Supports user authentication with JWT
- Provides endpoints for articles, favorites, comments, ratings, and more
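As a hedged sketch of how such articles might be modeled with Mongoose (the field names are assumptions, not the backend's real schema):

```typescript
// Hypothetical Article model; field names are illustrative only.
import mongoose, { Schema, type InferSchemaType } from "mongoose";

const articleSchema = new Schema(
  {
    title: { type: String, required: true },
    url: { type: String, required: true, unique: true },
    source: String,
    content: String,
    summary: String, // AI-generated summary from Gemini
    publishedAt: Date,
  },
  { timestamps: true } // adds createdAt / updatedAt automatically
);

export type Article = InferSchemaType<typeof articleSchema>;

// Reuse the compiled model across serverless invocations / hot reloads
export const ArticleModel =
  mongoose.models.Article ?? mongoose.model("Article", articleSchema);
```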
Newsletter:

- Allows users to subscribe to a newsletter for daily updates on the latest articles
- Integrated with Resend API for managing subscriptions and sending emails
- By default, the newsletter is sent daily at 9:00 AM UTC
- Deployed on Vercel as a serverless function
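A minimal sketch of the daily send using the Resend SDK could look like the following; the sender address, recipients, and HTML template are placeholders rather than the service's actual code.

```typescript
// Illustrative daily-send sketch using the Resend SDK.
// Sender address, recipients, and HTML markup are placeholders.
import { Resend } from "resend";

const resend = new Resend(process.env.RESEND_API_KEY);

export async function sendDailyNewsletter(
  subscribers: string[],
  articles: { title: string; url: string; summary: string }[]
) {
  const html = articles
    .map((a) => `<h3><a href="${a.url}">${a.title}</a></h3><p>${a.summary}</p>`)
    .join("");

  // One email per subscriber keeps unsubscribe handling simple
  for (const to of subscribers) {
    await resend.emails.send({
      from: process.env.RESEND_FROM ?? "newsletter@example.com",
      to,
      subject: "Your daily SynthoraAI digest",
      html,
    });
  }
}
```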
Frontend:

- Built with Next.js and React, providing a modern, mobile-responsive UI
- Fetches and displays a paginated list of articles from the backend API
- Dedicated pages for full article content, AI-generated summaries, and source information
- User authentication for marking favorites, commenting, and upvoting/downvoting
- Dark mode support for improved readability
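As an illustration, a Next.js server component might fetch a page of articles roughly like this; the endpoint path, query parameters, and response shape are assumptions based on the API described later, not the real frontend code.

```tsx
// Illustrative server component fetching a page of articles.
// Endpoint path, query params, and response shape are assumptions.
type Article = { _id: string; title: string; summary: string; url: string };

async function getArticles(page = 1): Promise<Article[]> {
  const base = process.env.NEXT_PUBLIC_API_URL ?? "http://localhost:3000";
  const res = await fetch(`${base}/api/articles?page=${page}&limit=10`, {
    next: { revalidate: 300 }, // re-fetch at most every 5 minutes
  });
  if (!res.ok) throw new Error(`Failed to load articles: ${res.status}`);
  const data = await res.json();
  return data.articles ?? data; // tolerate wrapped or bare arrays
}

export default async function ArticlesPage() {
  const articles = await getArticles();
  return (
    <ul>
      {articles.map((a) => (
        <li key={a._id}>
          <a href={`/articles/${a._id}`}>{a.title}</a>
          <p>{a.summary}</p>
        </li>
      ))}
    </ul>
  );
}
```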
Agentic AI Pipeline:

- Multi-agent system built with LangGraph and LangChain
- Specialized agents for content analysis, summarization, classification, sentiment analysis, and quality assurance
- MCP (Model Context Protocol) server for standardized AI interactions
- Cloud-ready with production configs for AWS Lambda and Azure Functions
This monorepo-based microservices architecture is designed to be modular and scalable, allowing for easy updates and maintenance. Each component can be developed, tested, and deployed independently.
- AI-Powered Summarization: Google Generative AI (Gemini) generates concise article summaries
- Multi-Source Aggregation: Crawls multiple government and news sources
- RESTful API: Comprehensive API for article management
- User Authentication: JWT-based secure authentication
- Favorite Articles: Users can save articles for quick access
- Newsletter Subscription: Daily email updates with latest articles
- Dark Mode: Toggle between light and dark themes
- Responsive Design: Optimized for desktop and mobile devices
- Article Q&A: Ask questions about articles and receive AI-generated answers using RAG
- Related Articles: Vector similarity search powered by Pinecone (see the sketch after this list)
- Bias Detection: AI-powered article bias analysis
- Sentiment Analysis: Emotional tone and objectivity analysis
- User Ratings: Rate articles and provide feedback
- Discussions & Comments: Engage with other users through comments
- Upvote/Downvote: Highlight valuable contributions
- Client-Side ML Recommendations: Personalized article recommendations
- CLI Tool: Unified command-line interface for managing the entire monorepo
- Docker Support: Full Docker Compose configuration
- CI/CD: GitHub Actions workflows for automated testing and deployment
- Testing Infrastructure: Jest, Playwright, and Supertest for comprehensive testing
- Logging & Monitoring: Winston for logging, Prometheus for metrics
- Code Quality: ESLint, Prettier, and Husky for code consistency
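To make the Pinecone-backed related-articles feature concrete, here is a minimal sketch of a similarity query; the index name mirrors the configuration shown later, and the metadata fields are assumptions.

```typescript
// Illustrative related-articles lookup via Pinecone vector similarity.
// Index name and metadata fields are assumptions, not the production setup.
import { Pinecone } from "@pinecone-database/pinecone";

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pc.index(process.env.PINECONE_INDEX ?? "article-embeddings");

export async function findRelatedArticles(embedding: number[], topK = 5) {
  const result = await index.query({
    vector: embedding,
    topK,
    includeMetadata: true,
  });
  // Each match carries the stored article metadata plus a similarity score
  return (result.matches ?? []).map((m) => ({
    articleId: m.id,
    score: m.score,
    title: m.metadata?.title,
  }));
}
```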
Technology Stack:

Backend:

- Node.js (v18+)
- Express.js
- Next.js (API Routes)
- MongoDB + Mongoose
- Google Generative AI (Gemini)
- JWT Authentication
- Redis (Caching)
- Winston (Logging)
Frontend:

- Next.js 14+
- React 18+
- TypeScript
- TailwindCSS
- Shadcn UI
- CSS Modules
Crawler:

- Node.js
- Axios
- Cheerio
- Puppeteer
- Playwright
Newsletter:

- Resend API
- Nodemailer
- Serverless Functions
Agentic AI Pipeline:

- Python 3.11+
- LangChain
- LangGraph
- FastMCP
- Redis
- MongoDB
- Prometheus
DevOps & Tooling:

- Docker & Docker Compose
- Vercel (Deployment)
- GitHub Actions (CI/CD)
- Jest (Testing)
- Playwright (E2E Testing)
- ESLint & Prettier (Code Quality)
- Husky (Git Hooks)
- Makefile (Task Automation)
Prerequisites:

- Node.js (v18 or later)
- npm (v9 or later)
- MongoDB (local or cloud instance like MongoDB Atlas)
- Python 3.11+ (for Agentic AI Pipeline)
- Docker (optional, for containerized deployment)
- Vercel CLI (optional, for deployment)
Installation:

- Clone the Repository:

  ```bash
  git clone https://github.com/your-org/AI-Content-Publisher.git
  cd AI-Content-Publisher
  ```

- Install Dependencies:

  ```bash
  npm install
  ```

  This will install dependencies for all workspaces (backend, frontend, crawler, newsletters).

- Set Up Environment Variables:

  ```bash
  cp .env.example .env
  ```

  Edit the `.env` file with your actual configuration values (see Configuration section).
Configuration:

Create a .env file in the root directory with the following variables:
```env
# MongoDB Configuration
MONGODB_URI=your_mongodb_connection_string

# Google AI Configuration
GOOGLE_AI_API_KEY=your_google_ai_api_key
AI_INSTRUCTIONS=Summarize the articles concisely and naturally

# News API Configuration
NEWS_API_KEY=your_news_api_key

# Server Configuration
PORT=3000
AICC_API_URL=http://localhost:3000

# Crawler Configuration
CRAWL_URLS=https://www.state.gov/press-releases/,https://www.bbc.com/news
CRAWL_MAX_LINKS=50
CRAWL_TIMEOUT=30000

# Resend Email Configuration
RESEND_API_KEY=your_resend_api_key
RESEND_FROM=AI Curator <[email protected]>

# JWT Configuration
JWT_SECRET=your_jwt_secret_key
JWT_EXPIRES_IN=7d

# Pinecone Configuration
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_ENVIRONMENT=your_pinecone_environment
PINECONE_INDEX=article-embeddings

# Frontend Configuration
NEXT_PUBLIC_API_URL=http://localhost:3000
```

See .env.example for a complete list of configuration options.
Running the Application:

To start all services in development mode:

```bash
npm run dev
```

This will start the backend, frontend, and crawler concurrently.

To start services individually:

```bash
# Backend only
npm run dev:backend

# Frontend only
npm run dev:frontend

# Crawler only
npm run dev:crawler

# Newsletter only
npm run dev:newsletters
```

Using the Makefile:

```bash
# Start all services
make dev

# Start individual services
make dev-backend
make dev-frontend
make dev-crawler
```

Once running, the services are available at:

- Frontend: http://localhost:3000
- Backend API: http://localhost:3000/api
- Crawler: http://localhost:3002
- Newsletter: http://localhost:3003
The backend provides a RESTful API for managing articles, users, and more.
Key Features:
- Article CRUD operations
- User authentication with JWT
- AI-powered summarization
- Favorite articles management
- Comments and ratings
- Scheduled article fetching
API Endpoints:
- `GET /api/articles` - Get paginated list of articles
- `GET /api/articles/:id` - Get article details
- `POST /api/articles` - Create new article
- `PUT /api/articles/:id` - Update article
- `DELETE /api/articles/:id` - Delete article
- `POST /api/auth/register` - User registration
- `POST /api/auth/login` - User login
- `GET /api/favorites` - Get user favorites
- `POST /api/favorites/:id` - Add to favorites
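For illustration, a client could call these endpoints roughly as follows (the login response shape and the Bearer-token convention are assumptions, not a documented contract):

```typescript
// Illustrative API usage; response shapes and headers are assumptions.
const API = process.env.AICC_API_URL ?? "http://localhost:3000";

async function login(email: string, password: string): Promise<string> {
  const res = await fetch(`${API}/api/auth/login`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ email, password }),
  });
  if (!res.ok) throw new Error(`Login failed: ${res.status}`);
  const { token } = await res.json(); // JWT returned by the backend
  return token;
}

async function addFavorite(articleId: string, token: string): Promise<void> {
  const res = await fetch(`${API}/api/favorites/${articleId}`, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`Could not favorite article: ${res.status}`);
}
```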
Running the Backend:
```bash
cd backend
npm install
npm run dev
```

See backend/README.md for detailed documentation.
The frontend provides a modern, responsive UI for browsing and viewing articles.
Key Features:
- Article listing with pagination
- Article detail view with AI summary
- User authentication (login/register)
- Favorite articles
- Dark mode support
- Newsletter subscription
- Article Q&A
- Related articles
- Bias analysis
- Comments and ratings
Running the Frontend:
```bash
cd frontend
npm install
npm run dev
```

See frontend/README.md for detailed documentation.
The crawler automatically fetches articles from configured sources.
Key Features:
- Multi-source crawling
- Static and dynamic content support
- Scheduled execution via Vercel cron
- Error handling and retry logic
- Metadata extraction
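Metadata extraction can be sketched as below, reading common Open Graph and standard tags with Cheerio; the exact fields the crawler collects may differ.

```typescript
// Illustrative metadata extraction from fetched HTML.
// The selected tags are common conventions, not the crawler's exact field list.
import * as cheerio from "cheerio";

export interface ArticleMetadata {
  title?: string;
  description?: string;
  image?: string;
  publishedAt?: string;
}

export function extractMetadata(html: string): ArticleMetadata {
  const $ = cheerio.load(html);
  const meta = (name: string) =>
    $(`meta[property="${name}"], meta[name="${name}"]`).attr("content");

  const titleTag = $("title").text().trim();
  return {
    title: meta("og:title") ?? (titleTag || undefined),
    description: meta("og:description") ?? meta("description"),
    image: meta("og:image"),
    publishedAt: meta("article:published_time"),
  };
}
```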
Running the Crawler:
```bash
cd crawler
npm install
npm run crawl
```

See crawler/README.md for detailed documentation.
The newsletter service sends daily email updates to subscribers.
Key Features:
- Email subscription management
- Daily newsletter sending
- Unsubscribe functionality
- Integration with Resend API
Running the Newsletter:
```bash
cd newsletters
npm install
npm run dev
```

See newsletters/README.md for detailed documentation.
A sophisticated multi-agent system for advanced content processing.
Key Features:
- Multi-agent architecture with specialized agents
- Content analysis and entity extraction
- Advanced summarization
- Topic classification
- Sentiment analysis
- Quality assurance with retry logic
- MCP server for standardized AI interactions
- Cloud-ready deployment
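The real pipeline is implemented in Python with LangGraph, but its control flow can be pictured with a conceptual sketch like the one below; the agent names, quality threshold, and retry count are illustrative only.

```typescript
// Conceptual control-flow sketch only. The actual pipeline is Python +
// LangGraph; agent names, threshold, and retry count are assumptions.
type AgentResult = { output: string; qualityScore: number };
type Agent = (input: string) => Promise<AgentResult>;

interface Agents {
  analyze: Agent;       // entity extraction / content analysis
  summarize: Agent;     // advanced summarization
  classify: Agent;      // topic classification
  reviewQuality: Agent; // quality assurance
}

export async function runPipeline(article: string, agents: Agents, maxRetries = 2) {
  const analysis = await agents.analyze(article);
  let summary = await agents.summarize(analysis.output);
  let review = await agents.reviewQuality(summary.output);

  // QA loop: re-summarize until the review passes or retries run out
  for (let i = 0; i < maxRetries && review.qualityScore < 0.8; i++) {
    summary = await agents.summarize(analysis.output);
    review = await agents.reviewQuality(summary.output);
  }

  const topics = await agents.classify(summary.output);
  return { summary: summary.output, topics: topics.output, qualityScore: review.qualityScore };
}
```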
Getting Started:
```bash
cd agentic_ai
pip install -r requirements.txt
python -m agentic_ai.mcp_server.server
```

See agentic_ai/README.md for detailed documentation.
All services can be deployed to Vercel:
```bash
# Deploy all services
vercel --prod

# Deploy individual services
cd backend && vercel --prod
cd frontend && vercel --prod
cd crawler && vercel --prod
cd newsletters && vercel --prod
```

Use Docker Compose for local or production deployment:
```bash
# Start all services
docker-compose up -d

# Build images
docker-compose build

# Stop services
docker-compose down
```

The Agentic AI Pipeline includes production configurations for AWS Lambda and Azure Functions. See agentic_ai/aws/ and agentic_ai/azure/ directories for deployment scripts.
The aicc CLI provides a unified interface for managing the entire monorepo:
```bash
npm install
npm link
```

Workspace Management:

```bash
aicc dev           # Start all services
aicc dev backend   # Start backend only
aicc build         # Build all services
aicc start         # Start in production mode
aicc lint          # Lint all code
```

Crawling:

```bash
aicc crawl         # Run the crawler
```

Article CRUD:

```bash
aicc article create --title "Title" --content "Content"
aicc article get <id>
aicc article list --limit 10
aicc article update <id> --title "New Title"
aicc article delete <id>
```

See the CLI Documentation for more details.
Backend Tests:

```bash
cd backend
npm run test            # Run tests once
npm run test:watch      # Watch mode
npm run test:coverage   # Generate coverage report
```

Frontend E2E Tests:

```bash
cd frontend
npm run test:e2e          # Run E2E tests (headless)
npm run test:e2e:headed   # Run E2E tests (headed)
npm run test:e2e:report   # View test report
```

Crawler Tests:

```bash
cd crawler
npm run test
```

All Tests:

```bash
npm run test   # From root
make test      # Using Makefile
```

Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a new branch (`git checkout -b feature/your-feature`)
- Make your changes
- Run tests and linting (`npm run test && npm run lint`)
- Commit your changes (`git commit -m 'Add some feature'`)
- Push to the branch (`git push origin feature/your-feature`)
- Open a Pull Request
Please ensure your code follows the project's coding standards and includes appropriate tests.
This project is licensed under the MIT License. See the LICENSE file for details.
For questions, suggestions, or support:
- Email: [email protected]
- GitHub: @hoangsonww
- Website: sonnguyenhoang.com
Built with ❤️ by the SynthoraAI Team
Back to Top