SynthoraAI - Synthesizing the world's news & information through AI.
The SynthoraAI - AI-Powered Article Content Publisher is a comprehensive, AI-powered system designed to aggregate, summarize, and present curated government-related articles. This multi-service monorepo is organized into five main components:
- Backend: Provides a robust RESTful API to store and serve curated articles
- Crawler: Automatically crawls and extracts article URLs and metadata from government homepages and public API sources
- Frontend: Offers an intuitive Next.js-based user interface for government staff (and potentially the public) to browse and view article details
- Newsletter: Sends daily updates to subscribers with the latest articles
- Agentic AI Pipeline: Sophisticated multi-agent system for advanced content processing using LangGraph and LangChain
Table of Contents:

- Overview
- Architecture
- Features
- Technology Stack
- Getting Started
- Services
- Deployment
- CLI Tool
- Testing
- Contributing
- License
The SynthoraAI - AI-Powered Article Content Publisher system is designed to provide government staff with up-to-date, summarized content from trusted government sources and reputable news outlets. By leveraging AI (Google Generative AI / Gemini) for summarization and using modern web technologies, this solution ensures that users receive concise, accurate, and timely information.
- Data Ingestion: The system aggregates article URLs from multiple sources (government homepages and public APIs like NewsAPI) using a decoupled crawler service
- Content Processing: The backend processes the fetched articles by generating concise summaries via Google Generative AI with robust retry mechanisms (see the sketch after this list)
- Data Storage & API Serving: Articles are stored in MongoDB and exposed via REST endpoints built with Express.js/Next.js
- Frontend Experience: A responsive Next.js/React interface allows users to browse, filter, and view detailed articles with dark/light mode support
- Scheduled Updates: Both the backend and crawler employ scheduled serverless functions (via Vercel cron) to periodically update content
- Newsletter Subscription: Users can subscribe to daily email updates with the latest articles via Resend integration
- User Authentication: Users can create accounts, log in, and receive JWT tokens for secure access
- Favorite Articles: Authenticated users can mark articles as favorites for quick access
- AI-Powered Features:
- Article Q&A with RAG (Retrieval-Augmented Generation)
- Vector similarity search for related articles (Pinecone)
- Bias detection and analysis
- User ratings and discussions
- Client-side ML recommendations
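To make the summarization step concrete, here is a minimal sketch of a Gemini call wrapped in a simple retry loop. It assumes the @google/generative-ai SDK; the model name, backoff policy, and the summarizeArticle helper are illustrative rather than the backend's actual code.

```typescript
// Illustrative only: a Gemini summarization call wrapped in a retry loop.
// Model name, backoff policy, and helper name are assumptions, not the
// backend's actual implementation.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_AI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

export async function summarizeArticle(content: string, maxRetries = 3): Promise<string> {
  const instructions = process.env.AI_INSTRUCTIONS ?? "Summarize the article concisely";
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await model.generateContent(`${instructions}:\n\n${content}`);
      return result.response.text();
    } catch (err) {
      if (attempt === maxRetries) throw err;
      // Exponential backoff before the next attempt (2s, 4s, ...)
      await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 1000));
    }
  }
  throw new Error("unreachable"); // satisfies the return-type check
}
```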
This project consists of five primary services that interact with each other:
Crawler:

- Crawls government homepages and public API sources to extract article URLs and metadata
- Uses Axios and Cheerio for static HTML parsing, with Puppeteer as a fallback for dynamic content
- Scheduled to run daily at 6:00 AM UTC via a serverless function on Vercel
- Provides a basic landing page with information about the crawler
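A rough sketch of that static-first, dynamic-fallback strategy is shown below, assuming Axios, Cheerio, and Puppeteer as listed above; the helper names, timeout, and link limit are illustrative, not the crawler's actual implementation.

```typescript
// Illustrative sketch of a static-first crawl with a Puppeteer fallback.
// Names and limits are assumptions, not the crawler's actual code.
import axios from "axios";
import * as cheerio from "cheerio";
import puppeteer from "puppeteer";

async function fetchHtml(url: string, timeoutMs = 30000): Promise<string> {
  try {
    // Fast path: plain HTTP fetch for static pages
    const { data } = await axios.get<string>(url, { timeout: timeoutMs });
    return data;
  } catch {
    // Fallback: render the page with a headless browser for dynamic content
    const browser = await puppeteer.launch();
    try {
      const page = await browser.newPage();
      await page.goto(url, { waitUntil: "networkidle2", timeout: timeoutMs });
      return await page.content();
    } finally {
      await browser.close();
    }
  }
}

export async function extractArticleLinks(url: string, maxLinks = 50): Promise<string[]> {
  const $ = cheerio.load(await fetchHtml(url));
  const links = new Set<string>();
  $("a[href]").each((_, el) => {
    const href = $(el).attr("href");
    if (href?.startsWith("http")) links.add(href);
  });
  return [...links].slice(0, maxLinks);
}
```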
Backend:

- Built with Express.js and Next.js, serving as a RESTful API for the frontend
- Integrates Google Generative AI (Gemini) for content summarization
- Stores articles in MongoDB using Mongoose
- Scheduled serverless function to fetch and process new articles daily at 6:00 AM UTC
- Supports user authentication with JWT
- Provides endpoints for articles, favorites, comments, ratings, and more
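As a hedged sketch of how such articles might be modeled with Mongoose (the field names are assumptions, not the backend's real schema):

```typescript
// Hypothetical Article model; field names are illustrative only.
import mongoose, { Schema, type InferSchemaType } from "mongoose";

const articleSchema = new Schema(
  {
    title: { type: String, required: true },
    url: { type: String, required: true, unique: true },
    source: String,
    content: String,
    summary: String, // AI-generated summary from Gemini
    publishedAt: Date,
  },
  { timestamps: true } // adds createdAt / updatedAt automatically
);

export type Article = InferSchemaType<typeof articleSchema>;

// Reuse the compiled model across serverless invocations / hot reloads
export const ArticleModel =
  mongoose.models.Article ?? mongoose.model("Article", articleSchema);
```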
Newsletter:

- Allows users to subscribe to a newsletter for daily updates on the latest articles
- Integrated with Resend API for managing subscriptions and sending emails
- By default, the newsletter is sent daily at 9:00 AM UTC
- Deployed on Vercel as a serverless function
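A minimal sketch of the daily send using the Resend SDK could look like the following; the sender address, recipients, and HTML template are placeholders rather than the service's actual code.

```typescript
// Illustrative daily-send sketch using the Resend SDK.
// Sender address, recipients, and HTML markup are placeholders.
import { Resend } from "resend";

const resend = new Resend(process.env.RESEND_API_KEY);

export async function sendDailyNewsletter(
  subscribers: string[],
  articles: { title: string; url: string; summary: string }[]
) {
  const html = articles
    .map((a) => `<h3><a href="${a.url}">${a.title}</a></h3><p>${a.summary}</p>`)
    .join("");

  // One email per subscriber keeps unsubscribe handling simple
  for (const to of subscribers) {
    await resend.emails.send({
      from: process.env.RESEND_FROM ?? "newsletter@example.com",
      to,
      subject: "Your daily SynthoraAI digest",
      html,
    });
  }
}
```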
Frontend:

- Built with Next.js and React, providing a modern, mobile-responsive UI
- Fetches and displays a paginated list of articles from the backend API
- Dedicated pages for full article content, AI-generated summaries, and source information
- User authentication for marking favorites, commenting, and upvoting/downvoting
- Dark mode support for improved readability
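As an illustration, a Next.js server component might fetch a page of articles roughly like this; the endpoint path, query parameters, and response shape are assumptions based on the API described later, not the real frontend code.

```tsx
// Illustrative server component fetching a page of articles.
// Endpoint path, query params, and response shape are assumptions.
type Article = { _id: string; title: string; summary: string; url: string };

async function getArticles(page = 1): Promise<Article[]> {
  const base = process.env.NEXT_PUBLIC_API_URL ?? "http://localhost:3000";
  const res = await fetch(`${base}/api/articles?page=${page}&limit=10`, {
    next: { revalidate: 300 }, // re-fetch at most every 5 minutes
  });
  if (!res.ok) throw new Error(`Failed to load articles: ${res.status}`);
  const data = await res.json();
  return data.articles ?? data; // tolerate wrapped or bare arrays
}

export default async function ArticlesPage() {
  const articles = await getArticles();
  return (
    <ul>
      {articles.map((a) => (
        <li key={a._id}>
          <a href={`/articles/${a._id}`}>{a.title}</a>
          <p>{a.summary}</p>
        </li>
      ))}
    </ul>
  );
}
```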
Agentic AI Pipeline:

- Multi-agent system built with LangGraph and LangChain
- Specialized agents for content analysis, summarization, classification, sentiment analysis, and quality assurance
- MCP (Model Context Protocol) server for standardized AI interactions
- Cloud-ready with production configs for AWS Lambda and Azure Functions
This monorepo-based microservices architecture is designed to be modular and scalable, allowing for easy updates and maintenance. Each component can be developed, tested, and deployed independently.
- AI-Powered Summarization: Google Generative AI (Gemini) generates concise article summaries
- Multi-Source Aggregation: Crawls multiple government and news sources
- RESTful API: Comprehensive API for article management
- User Authentication: JWT-based secure authentication
- Favorite Articles: Users can save articles for quick access
- Newsletter Subscription: Daily email updates with latest articles
- Dark Mode: Toggle between light and dark themes
- Responsive Design: Optimized for desktop and mobile devices
- Article Q&A: Ask questions about articles and receive AI-generated answers using RAG
- Related Articles: Vector similarity search powered by Pinecone (see the sketch after this list)
- Bias Detection: AI-powered article bias analysis
- Sentiment Analysis: Emotional tone and objectivity analysis
- User Ratings: Rate articles and provide feedback
- Discussions & Comments: Engage with other users through comments
- Upvote/Downvote: Highlight valuable contributions
- Client-Side ML Recommendations: Personalized article recommendations
- CLI Tool: Unified command-line interface for managing the entire monorepo
- Docker Support: Full Docker Compose configuration
- CI/CD: GitHub Actions workflows for automated testing and deployment
- Testing Infrastructure: Jest, Playwright, and Supertest for comprehensive testing
- Logging & Monitoring: Winston for logging, Prometheus for metrics
- Code Quality: ESLint, Prettier, and Husky for code consistency
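To make the Pinecone-backed related-articles feature concrete, here is a minimal sketch of a similarity query; the index name mirrors the configuration shown later, and the metadata fields are assumptions.

```typescript
// Illustrative related-articles lookup via Pinecone vector similarity.
// Index name and metadata fields are assumptions, not the production setup.
import { Pinecone } from "@pinecone-database/pinecone";

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pc.index(process.env.PINECONE_INDEX ?? "article-embeddings");

export async function findRelatedArticles(embedding: number[], topK = 5) {
  const result = await index.query({
    vector: embedding,
    topK,
    includeMetadata: true,
  });
  // Each match carries the stored article metadata plus a similarity score
  return (result.matches ?? []).map((m) => ({
    articleId: m.id,
    score: m.score,
    title: m.metadata?.title,
  }));
}
```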
Technology Stack:

Backend:

- Node.js (v18+)
- Express.js
- Next.js (API Routes)
- MongoDB + Mongoose
- Google Generative AI (Gemini)
- JWT Authentication
- Redis (Caching)
- Winston (Logging)
Frontend:

- Next.js 14+
- React 18+
- TypeScript
- TailwindCSS
- Shadcn UI
- CSS Modules
Crawler:

- Node.js
- Axios
- Cheerio
- Puppeteer
- Playwright
Newsletter:

- Resend API
- Nodemailer
- Serverless Functions
Agentic AI Pipeline:

- Python 3.11+
- LangChain
- LangGraph
- FastMCP
- Redis
- MongoDB
- Prometheus
DevOps & Tooling:

- Docker & Docker Compose
- Vercel (Deployment)
- GitHub Actions (CI/CD)
- Jest (Testing)
- Playwright (E2E Testing)
- ESLint & Prettier (Code Quality)
- Husky (Git Hooks)
- Makefile (Task Automation)
Prerequisites:

- Node.js (v18 or later)
- npm (v9 or later)
- MongoDB (local or cloud instance like MongoDB Atlas)
- Python 3.11+ (for Agentic AI Pipeline)
- Docker (optional, for containerized deployment)
- Vercel CLI (optional, for deployment)
Installation:

- Clone the Repository:

  ```bash
  git clone https://github.com/your-org/AI-Content-Publisher.git
  cd AI-Content-Publisher
  ```

- Install Dependencies:

  ```bash
  npm install
  ```

  This will install dependencies for all workspaces (backend, frontend, crawler, newsletters).

- Set Up Environment Variables:

  ```bash
  cp .env.example .env
  ```

  Edit the `.env` file with your actual configuration values (see Configuration section).
Configuration:

Create a .env file in the root directory with the following variables:
```env
# MongoDB Configuration
MONGODB_URI=your_mongodb_connection_string

# Google AI Configuration
GOOGLE_AI_API_KEY=your_google_ai_api_key
AI_INSTRUCTIONS=Summarize the articles concisely and naturally

# News API Configuration
NEWS_API_KEY=your_news_api_key

# Server Configuration
PORT=3000
AICC_API_URL=http://localhost:3000

# Crawler Configuration
CRAWL_URLS=https://www.state.gov/press-releases/,https://www.bbc.com/news
CRAWL_MAX_LINKS=50
CRAWL_TIMEOUT=30000

# Resend Email Configuration
RESEND_API_KEY=your_resend_api_key
RESEND_FROM=AI Curator <[email protected]>

# JWT Configuration
JWT_SECRET=your_jwt_secret_key
JWT_EXPIRES_IN=7d

# Pinecone Configuration
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_ENVIRONMENT=your_pinecone_environment
PINECONE_INDEX=article-embeddings

# Frontend Configuration
NEXT_PUBLIC_API_URL=http://localhost:3000
```

See .env.example for a complete list of configuration options.
Running the Application:

To start all services in development mode:

```bash
npm run dev
```

This will start the backend, frontend, and crawler concurrently.

To start services individually:

```bash
# Backend only
npm run dev:backend

# Frontend only
npm run dev:frontend

# Crawler only
npm run dev:crawler

# Newsletter only
npm run dev:newsletters
```

Using the Makefile:

```bash
# Start all services
make dev

# Start individual services
make dev-backend
make dev-frontend
make dev-crawler
```

Once running, the services are available at:

- Frontend: http://localhost:3000
- Backend API: http://localhost:3000/api
- Crawler: http://localhost:3002
- Newsletter: http://localhost:3003
The backend provides a RESTful API for managing articles, users, and more.
Key Features:
- Article CRUD operations
- User authentication with JWT
- AI-powered summarization
- Favorite articles management
- Comments and ratings
- Scheduled article fetching
API Endpoints:
- `GET /api/articles` - Get paginated list of articles
- `GET /api/articles/:id` - Get article details
- `POST /api/articles` - Create new article
- `PUT /api/articles/:id` - Update article
- `DELETE /api/articles/:id` - Delete article
- `POST /api/auth/register` - User registration
- `POST /api/auth/login` - User login
- `GET /api/favorites` - Get user favorites
- `POST /api/favorites/:id` - Add to favorites
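For illustration, a client could call these endpoints roughly as follows (the login response shape and the Bearer-token convention are assumptions, not a documented contract):

```typescript
// Illustrative API usage; response shapes and headers are assumptions.
const API = process.env.AICC_API_URL ?? "http://localhost:3000";

async function login(email: string, password: string): Promise<string> {
  const res = await fetch(`${API}/api/auth/login`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ email, password }),
  });
  if (!res.ok) throw new Error(`Login failed: ${res.status}`);
  const { token } = await res.json(); // JWT returned by the backend
  return token;
}

async function addFavorite(articleId: string, token: string): Promise<void> {
  const res = await fetch(`${API}/api/favorites/${articleId}`, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`Could not favorite article: ${res.status}`);
}
```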
Running the Backend:
```bash
cd backend
npm install
npm run dev
```

See backend/README.md for detailed documentation.
The frontend provides a modern, responsive UI for browsing and viewing articles.
Key Features:
- Article listing with pagination
- Article detail view with AI summary
- User authentication (login/register)
- Favorite articles
- Dark mode support
- Newsletter subscription
- Article Q&A
- Related articles
- Bias analysis
- Comments and ratings
Running the Frontend:
```bash
cd frontend
npm install
npm run dev
```

See frontend/README.md for detailed documentation.
The crawler automatically fetches articles from configured sources.
Key Features:
- Multi-source crawling
- Static and dynamic content support
- Scheduled execution via Vercel cron
- Error handling and retry logic
- Metadata extraction
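Metadata extraction can be sketched as below, reading common Open Graph and standard tags with Cheerio; the exact fields the crawler collects may differ.

```typescript
// Illustrative metadata extraction from fetched HTML.
// The selected tags are common conventions, not the crawler's exact field list.
import * as cheerio from "cheerio";

export interface ArticleMetadata {
  title?: string;
  description?: string;
  image?: string;
  publishedAt?: string;
}

export function extractMetadata(html: string): ArticleMetadata {
  const $ = cheerio.load(html);
  const meta = (name: string) =>
    $(`meta[property="${name}"], meta[name="${name}"]`).attr("content");

  const titleTag = $("title").text().trim();
  return {
    title: meta("og:title") ?? (titleTag || undefined),
    description: meta("og:description") ?? meta("description"),
    image: meta("og:image"),
    publishedAt: meta("article:published_time"),
  };
}
```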
Running the Crawler:
```bash
cd crawler
npm install
npm run crawl
```

See crawler/README.md for detailed documentation.
The newsletter service sends daily email updates to subscribers.
Key Features:
- Email subscription management
- Daily newsletter sending
- Unsubscribe functionality
- Integration with Resend API
Running the Newsletter:
```bash
cd newsletters
npm install
npm run dev
```

See newsletters/README.md for detailed documentation.
A sophisticated multi-agent system for advanced content processing.
Key Features:
- Multi-agent architecture with specialized agents
- Content analysis and entity extraction
- Advanced summarization
- Topic classification
- Sentiment analysis
- Quality assurance with retry logic
- MCP server for standardized AI interactions
- Cloud-ready deployment
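The real pipeline is implemented in Python with LangGraph, but its control flow can be pictured with a conceptual sketch like the one below; the agent names, quality threshold, and retry count are illustrative only.

```typescript
// Conceptual control-flow sketch only. The actual pipeline is Python +
// LangGraph; agent names, threshold, and retry count are assumptions.
type AgentResult = { output: string; qualityScore: number };
type Agent = (input: string) => Promise<AgentResult>;

interface Agents {
  analyze: Agent;       // entity extraction / content analysis
  summarize: Agent;     // advanced summarization
  classify: Agent;      // topic classification
  reviewQuality: Agent; // quality assurance
}

export async function runPipeline(article: string, agents: Agents, maxRetries = 2) {
  const analysis = await agents.analyze(article);
  let summary = await agents.summarize(analysis.output);
  let review = await agents.reviewQuality(summary.output);

  // QA loop: re-summarize until the review passes or retries run out
  for (let i = 0; i < maxRetries && review.qualityScore < 0.8; i++) {
    summary = await agents.summarize(analysis.output);
    review = await agents.reviewQuality(summary.output);
  }

  const topics = await agents.classify(summary.output);
  return { summary: summary.output, topics: topics.output, qualityScore: review.qualityScore };
}
```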
Getting Started:
```bash
cd agentic_ai
pip install -r requirements.txt
python -m agentic_ai.mcp_server.server
```

See agentic_ai/README.md for detailed documentation.
All services can be deployed to Vercel:
```bash
# Deploy all services
vercel --prod

# Deploy individual services
cd backend && vercel --prod
cd frontend && vercel --prod
cd crawler && vercel --prod
cd newsletters && vercel --prod
```

Use Docker Compose for local or production deployment:
```bash
# Start all services
docker-compose up -d

# Build images
docker-compose build

# Stop services
docker-compose down
```

The Agentic AI Pipeline includes production configurations for AWS Lambda and Azure Functions. See agentic_ai/aws/ and agentic_ai/azure/ directories for deployment scripts.
The aicc CLI provides a unified interface for managing the entire monorepo:
```bash
npm install
npm link
```

Workspace Management:

```bash
aicc dev           # Start all services
aicc dev backend   # Start backend only
aicc build         # Build all services
aicc start         # Start in production mode
aicc lint          # Lint all code
```

Crawling:

```bash
aicc crawl         # Run the crawler
```

Article CRUD:

```bash
aicc article create --title "Title" --content "Content"
aicc article get <id>
aicc article list --limit 10
aicc article update <id> --title "New Title"
aicc article delete <id>
```

See the CLI Documentation for more details.
Backend Tests:

```bash
cd backend
npm run test            # Run tests once
npm run test:watch      # Watch mode
npm run test:coverage   # Generate coverage report
```

Frontend E2E Tests:

```bash
cd frontend
npm run test:e2e          # Run E2E tests (headless)
npm run test:e2e:headed   # Run E2E tests (headed)
npm run test:e2e:report   # View test report
```

Crawler Tests:

```bash
cd crawler
npm run test
```

All Tests:

```bash
npm run test   # From root
make test      # Using Makefile
```

Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a new branch (`git checkout -b feature/your-feature`)
- Make your changes
- Run tests and linting (`npm run test && npm run lint`)
- Commit your changes (`git commit -m 'Add some feature'`)
- Push to the branch (`git push origin feature/your-feature`)
- Open a Pull Request
Please ensure your code follows the project's coding standards and includes appropriate tests.
This project is licensed under the MIT License. See the LICENSE file for details.
For questions, suggestions, or support:
- Email: [email protected]
- GitHub: @hoangsonww
- Website: sonnguyenhoang.com
Built with ❤️ by the SynthoraAI Team
Back to Top