This document provides a comprehensive overview of all MassGen case studies, organized by category and version.
MassGen is focused on case-driven development. Each case study demonstrates real-world multi-agent collaboration on complex tasks, with actual session logs and outcomes. All case studies follow the PLANNING → TESTING → EVALUATION cycle and include video demonstrations where available.
| Title | Version | Short Description | Status | Link |
|---|---|---|---|---|
| Session Management & Computer Use Tools | v0.1.9 | Complete session state tracking and restoration for multi-turn conversations, computer use automation tools (Claude/Gemini/OpenAI) for browser and desktop control, enhanced config builder with fuzzy model matching | ✅ Ready | 📄 Guide |
| Automation Mode Enables Meta Self-Analysis | v0.1.8 | Automation infrastructure with --automation flag providing clean structured output, enabling agents to run nested MassGen experiments and analyze results for meta-level self-analysis |
✅ Ready | 📄 Case Study · 🎥 Video |
| Agent Task Planning & Background Execution | v0.1.7 | MCP-based task management with dependency tracking, background shell execution for long-running commands, and preemption-based coordination for improved multi-agent workflows | ✅ Ready | Documentation in v0.1.7 changelog |
| Persistent Memory with Semantic Retrieval | v0.1.5 | Research-to-implementation workflow demonstrating memory system with automatic fact extraction, vector storage, and semantic retrieval across multi-turn sessions | ✅ Ready | 📄 Case Study · 🎥 Video |
| Multimodal Video Analysis | v0.1.3 | Meta-level demonstration where agents autonomously download and analyze their own case study videos to identify improvements and automation opportunities | ✅ Ready | 📄 Case Study · 🎥 Video |
| Custom Tools with GitHub Issue Market Analysis | v0.1.1 | Self-evolution through market analysis using custom Python tools combined with web search to analyze GitHub issues, research trends, and drive feature prioritization | ✅ Ready | 📄 Case Study |
| Universal Code Execution via MCP | v0.0.31 | Universal code execution capabilities through MCP enabling agents across all backends to run commands, execute tests, and validate code (pytest, uv run, npm test) | ✅ Ready | 📄 Case Study |
| MCP Planning Mode for Safe Tool Coordination | v0.0.29 | Strategic coordination approach allowing agents to plan MCP tool usage without execution during collaboration, preventing irreversible actions until consensus | ✅ Ready | 📄 Case Study |
| AG2 Framework Integration | v0.0.28 | External agent adapter system enabling MassGen to orchestrate agents from AG2 framework with code execution capabilities while maintaining consensus architecture | ✅ Ready | 📄 Case Study · 🎥 Video |
| Multi-Turn Filesystem Support | v0.0.25 | Multi-turn filesystem support with persistent context enabling agents to build websites iteratively (Bob Dylan tribute site example) | ✅ Ready | 📄 Case Study |
| Advanced Filesystem with User Context Path Support | v0.0.21-v0.0.22 | Advanced filesystem permissions with user context paths, copy MCP integration, and selective path exposure for secure multi-agent workspace collaboration | ✅ Ready | 📄 Case Study · 🎥 Video |
| Unified Filesystem Support with MCP Integration | v0.0.16 | Unified filesystem capabilities demonstrating cross-workspace coordination, conflict-free development with per-agent versioning, and final workspace snapshots | ✅ Ready | 📄 Case Study |
| Gemini MCP Notion Integration | v0.0.15 | Integration with Notion via MCP demonstrating seamless third-party tool integration for knowledge management and documentation workflows | ✅ Ready | 📄 Case Study |
| Enhanced Logging and Workspace Management | v0.0.12-v0.0.14 | Enhanced logging capabilities and workspace management for better debugging, session tracking, and coordination history analysis | ✅ Ready | 📄 Case Study |
| Title | Version | Short Description | Status | Link |
|---|---|---|---|---|
| Berkeley Agentic AI Summit Summary | v0.0.3 | Agents handle specialized research queries with strict source constraints, demonstrating precise adherence to academic standards and framework-specific talk analysis | ✅ Ready | 📄 Case Study · 🎥 Video |
| AI News Synthesis | v0.0.3 | Cross-verification and content aggregation excellence demonstrating how agents synthesize diverse AI news sources with fact-checking and consensus building | ✅ Ready | 📄 Case Study |
| Grok-4 HLE Benchmark Cost Analysis | v0.0.3 | Unanimous expert consensus on complex pricing calculations through iterative refinement, demonstrating collaborative validation for technical analysis | ✅ Ready | 📄 Case Study |
| Title | Version | Short Description | Status | Link |
|---|---|---|---|---|
| Stockholm Travel Guide | v0.0.3 | Extended intelligence sharing and comprehensive convergence where agents collaborate to create detailed travel recommendations with diverse perspectives | ✅ Ready | 📄 Case Study |
| Title | Version | Short Description | Status | Link |
|---|---|---|---|---|
| Super Intelligence Approaches | v0.0.4 | Complex philosophical and technical question exploration leveraging different reasoning capacities (minimal, medium, high) for comprehensive analysis | ✅ Ready | 📄 Case Study · 🎥 Video |
| Comprehensive Algorithm Enumeration | v0.0.4 | Technical analysis demonstrating how agents collaboratively enumerate and compare different algorithmic approaches (Fibonacci algorithms) | ✅ Ready | 📄 Case Study |
| IMO 2025 AI Winners | v0.0.3 | Agents tackle International Mathematical Olympiad problems demonstrating collaborative mathematical reasoning and problem-solving capabilities | ✅ Ready | 📄 Case Study |
| Collaborative Creative Writing | v0.0.3 | Multi-agent creative writing collaboration showcasing diverse narrative perspectives and consensus-driven story development | ✅ Ready | 📄 Case Study |
| Title | Version | Short Description | Status | Link |
|---|---|---|---|---|
| Agent Adapter System | Future | Unified agent interface for easier backend integration, enabling seamless integration of new agent frameworks | 🔄 In Progress | 📄 Case Study · ROADMAP.md |
| Human-in-the-Loop Safety for Irreversible Actions | Future | Human approval mechanism for dangerous operations (file deletion, system commands, API calls), preventing accidental damage while maintaining agent autonomy | 🔄 In Progress | 📄 Case Study · ROADMAP.md |
| Automatic MCP Tool Selection & NLIP | v0.1.13 | Automatic MCP tool selection based on task requirements, dynamic tool refinement during execution, NLIP integration for enhanced agent coordination with hierarchy initialization | 📋 Planned | 📄 Case Study · Target: Nov 17, 2025 |
| Terminal Evaluation & Automated Case Study Generation | v0.1.14 | MassGen terminal evaluation and self-improvement, terminal session recording using asciinema, automated case study generation from terminal recordings, video editing integration | 📋 Planned | 📄 Case Study · Target: Nov 19, 2025 |
| Parallel File Operations & Docker Isolation | v0.1.15 | Parallel file operations for improved performance, standard efficiency evaluation and benchmarking methodology, custom tools running in isolated Docker containers for enhanced security and portability | 📋 Planned | 📄 Case Study · Target: Nov 21, 2025 |
| Web UI Development - Collaborative Design | v0.1.x | Three agents competitively build complete dashboard implementations with peer review and voting. Demonstrated production-ready output in 12 minutes with unanimous consensus | ✅ Completed | 📄 Draft PR · Session: log_20251025_222849 |
| Interactive Course Generator | v0.1.x | 5-agent sequential pipeline transforming PDFs/textbooks into interactive courses with Q&A, drag-and-match exercises, flowcharts, and code examples | 📝 Planning | 📄 Quick Reference |
| Codebase Architecture Analysis | v0.1.x | Multi-agent collaborative analysis of large codebases (FastAPI) creating comprehensive architecture documentation by reading 30+ files | 🧪 In Testing | 📄 Case Study · Config: tools/memory/gpt5mini_gemini_codebase_analysis_memory.yaml |
| Revert Feature After Final Agent Failure | v0.1.1 | Automated rollback mechanism when final agent execution fails, ensuring safe multi-agent operations | ⏸️ Blocked | 📄 Issue #325 |
| Twitter Integration Case Study | v0.x | Multi-agent Twitter posting and engagement with MCP integration | ⏸️ Blocked | Blocked by Twitter rate limits, will revisit |
| Title | Version | Short Description | Status | Link |
|---|---|---|---|---|
| Advanced Orchestration Patterns | v0.2.0+ | Task decomposition, parallel coordination, adaptive agent assignment for complex multi-agent workflows | 📋 Planned | 📄 Case Study · ROADMAP.md |
| Visual Workflow Designer | v0.2.0+ | No-code multi-agent workflow creation with drag-and-drop interface for building complex agent interactions | 📋 Planned | 📄 Case Study · ROADMAP.md |
| Enterprise Features | v0.2.0+ | RBAC, audit logs, compliance tracking, multi-user collaboration for enterprise deployments | 📋 Planned | 📄 Case Study · ROADMAP.md |
| Framework Integrations | v0.2.0+ | Seamless integration with LangChain, CrewAI, and custom framework adapters for ecosystem compatibility | 📋 Planned | 📄 Case Study · ROADMAP.md |
| Complete Multimodal Pipeline | v0.2.0+ | End-to-end audio and video understanding with generation capabilities for full multimodal workflows | 📋 Planned | 📄 Case Study · ROADMAP.md |
| Multi-Agent Marketing Automation | Future | Parallel analysis and engagement: Find 200 Twitter accounts (VCs, customers), analyze historical data per account, automated replies to followers. Competitor activity analysis across Twitter, Discord, GitHub with key datapoint extraction. One agent per data point for parallel processing | 📋 Planned | 📄 Case Study |
| Web Agent Browsing | Future | Agents autonomously browse and interact with web applications using Gemini 2.5 Computer Use and OpenAI Operator for complex web tasks | 📋 Planned | 📄 Case Study · Target: Mind2Web Leaderboard |
| Map-Reduce Document Processing | Future | Assign one document per agent to process, vote on which documents require manager attention. Applications: meeting notes prioritization, paper selection, email triage | 📋 Planned | 📄 Case Study |
| Website Creation from Scratch | Future | Produce high-quality website better than existing tools (e.g., Manus.im) with multi-agent collaboration | 📋 Planned | 📄 Case Study |
| MassGen Video Recording and Editing | Future | Auto-generate case study videos: run command, record, edit (speed up, captions, log highlights), produce 1-min demo videos automatically | 📋 Planned | 📄 Case Study |
| Paper Reviewing | Future | Provide detailed academic paper feedback competing with tools like Refine.ink | 📋 Planned | 📄 Case Study |
| Priority-Based Document Ranking | Future | Vote on document importance for busy managers/researchers: meeting notes, conference papers, stock news, emails | 📋 Planned | 📄 Case Study |
For contributors who want to create their own case studies:
| Title | Description | Status | Link |
|---|---|---|---|
| Case Study Template | Comprehensive template with PLANNING → TESTING → EVALUATION structure, including baseline analysis, success criteria, and status tracking | 📝 Template | 📄 Template |
- Total Case Studies: 47
- ✅ Ready: 22 (completed with documentation)
- ✅ Completed: 1 (Web UI Development)
- 📝 Planning: 1 (Interactive Course Generator)
- 🧪 In Testing: 1 (Codebase Analysis)
- 🔄 In Progress: 2 (Agent Adapter, Irreversible Actions Safety)
- ⏸️ Blocked: 2 (Revert Feature, Twitter Integration)
- 📋 Planned: 17 (future backlog: 3 upcoming releases + 5 v0.2.0+ features + 7 existing planned + 2 in development tracks)
- 📝 Template: 1
- With Video Demonstrations: 9
- Release Versions Covered: v0.0.3 to v0.1.15 (planned)
- Categories: 6 (Release Features, Research, Travel, Creative, In Development, Planned)
- ✅ Session state tracking and restoration (v0.1.9)
- ✅ Computer use automation (browser & desktop control) (v0.1.9)
- ✅ Fuzzy model name matching and discovery (v0.1.9)
- ✅ Automation mode for meta-analysis (v0.1.8)
- ✅ DSPy question paraphrasing for diversity (v0.1.8)
- ✅ Agent task planning with dependencies (v0.1.7)
- ✅ Background shell execution (v0.1.7)
- ✅ Preemption coordination (v0.1.7)
- ✅ Multi-turn conversations with persistent memory
- ✅ Multimodal understanding (images, audio, video, PDFs)
- ✅ Custom Python tools integration
- ✅ MCP (Model Context Protocol) integration
- ✅ External framework adapters (AG2)
- ✅ Universal code execution across all backends
- ✅ Planning mode for safe tool coordination
- ✅ Advanced filesystem operations with permissions
- ✅ Cross-workspace coordination
- ✅ Semantic fact extraction and retrieval
- ✅ Multi-agent consensus building
- ✅ Voting and refinement cycles
- ✅ Representative agent selection
- ✅ Diverse reasoning capacities
- ✅ Cross-verification and validation
- ✅ Iterative improvement workflows
- ✅ Research and analysis
- ✅ Code development and testing
- ✅ Content creation and documentation
- ✅ Market analysis and feature prioritization
- ✅ Self-evolution and meta-analysis (v0.1.8)
- ✅ Multi-turn conversation continuation (v0.1.9)
- ✅ Browser and desktop automation (v0.1.9)
- ✅ External system integration (Discord, Notion, GitHub)
- 🔄 Web UI design and competitive development
- 🔄 Educational content generation (interactive courses)
- 📋 Web browsing and automation
- 📋 Document processing at scale
- 📋 Video creation and editing
- 📋 Academic paper reviewing
-
Issues First: Submit a GitHub issue for case studies, add label
case study -
Planning Phase:
- Write prompt
- Define config file (agents, hyperparameters)
- Specify command
- Document current vs. expected behavior
- Explain how it ties to self-improvement goals
-
Development Phase:
- Feature created by dev team
- Iterate with development team
- Test early to address issues revealed by case study
-
Evaluation Phase:
- Improve the prompt (ideally in same domain as planning)
- Run prompt with provided cmd and config
- Record video demonstration
- Write report using Claude Code
- Push to GitHub repo by release
Goal: Have case study planned the day after releasing previous feature
- Release n-1 feature: Morning
- Plan case study for feature n: Night sync (Tuesday, Thursday, Monday)
We want this to be a community document:
- Submit your own case studies
- Share interesting use cases
- Contribute to backlog of goals
-
Browse by Category: Find case studies relevant to your use case
- Release Features: Production-ready capabilities and upcoming releases (17 total: 14 ready + 3 planned)
- In Development: Active development with PRs/testing (7 case studies)
- Planned: Future roadmap items (12 in backlog)
-
Check Status Icons:
- ✅ Ready: Complete with documentation and session logs
- ✅ Completed: Executed successfully, documentation in progress
- 📝 Planning: Design complete, ready for execution
- 🧪 In Testing: Active testing and iteration
- ⏸️ Blocked: Waiting on external dependencies
- 📋 Planned: In backlog, not yet started
-
Watch Videos: Click video links (🎥) to see live demonstrations
-
Read Documentation: Click "📄 Case Study" links for detailed technical documentation
-
Track Progress: Use GitHub issues and PRs to follow development
-
Create Your Own: Use the template to contribute your own case studies
- Session management with restoration (v0.1.9) ✅
- Computer use automation tools (v0.1.9) ✅
- Fuzzy model matching (v0.1.9) ✅
- Automation mode for LLM agents (v0.1.8) ✅
- DSPy question paraphrasing (v0.1.8) ✅
- Agent task planning (v0.1.7) ✅
- Background shell execution (v0.1.7) ✅
- Multi-turn conversations with persistent memory ✅
- Multimodal understanding (images, audio, video, PDFs) ✅
- Custom Python tools integration ✅
- MCP integration ✅
- Universal code execution ✅
- Web UI competitive development ✅
- Sequential agent pipelines (course generation) 📝
- Large codebase analysis with memory 🧪
- Automatic MCP tool selection (v0.1.13) 📋
- Terminal evaluation and automated case studies (v0.1.14) 📋
- Parallel file operations and Docker isolation (v0.1.15) 📋
- Advanced video understanding and editing 📋
- Map-reduce document processing 📋
- User-defined voting criteria 📋
- Scalable multi-agent backends 📋
- Advanced orchestration patterns (v0.2.0+) 📋
- Visual workflow designer (v0.2.0+) 📋
- Enterprise features (v0.2.0+) �
Website Creation from Scratch
- Goal: Produce websites better than existing tools (e.g., Manus.im)
- Requirements: Playwright/Computer Use, image understanding, filesystem memory, multi-turn memory
- Timeline: After Computer Use integration complete
MassGen Video Recording and Editing
- Goal: Auto-generate case study videos (1-min demos with editing)
- Requirements: Video understanding, cmd execution, video editing MCP, planning mode
- Timeline: Could start soon, needs efficient handling of >10min videos
Paper Reviewing
- Goal: Compete with tools like Refine.ink for detailed paper feedback
- Requirements: Multi-agent review, quality assessment, research knowledge
- Timeline: Research-focused development phase
Interactive Course Generation
- Goal: Transform PDFs/textbooks into interactive courses automatically
- Requirements: 5-agent pipeline, component generation, quality review
- Status: Planning complete, ready for execution
We welcome community contributions! To create your own case study:
-
Submit GitHub Issue: Use the
case studylabel -
Planning Phase:
- Define prompt and expected behavior
- Create config file
- Explain connection to self-improvement goals
-
Run MassGen: Save session logs and outputs
-
Record Demo: Use OBS Studio or similar tools
-
Write Documentation: Follow the case study template
-
Submit PR: Include case study doc and video (Claude Code can help)
See the Contributing Guidelines for submission instructions.
- Make it easier to share MassGen sessions online
- Auto-logging for users who opt in (currently just contributors)
- Better community organization for user-submitted case studies
- Automated case study video generation pipeline (see v0.1.14 in Release Features)
- Case study quality metrics and benchmarking
For detailed development roadmap and upcoming features, see ROADMAP.md.
- Documentation: https://docs.massgen.ai
- Case Studies Online: https://docs.massgen.ai/en/latest/examples/case_studies.html
- GitHub Repository: https://github.com/massgen/MassGen
- Website: https://massgen.ai
- Case Studies Portal: https://case.massgen.ai
This summary covers case studies from MassGen v0.0.3 (initial release) through v0.1.12 (latest), with planned releases through v0.1.15 and long-term vision for v0.2.0+. For detailed development roadmap, see ROADMAP.md.
v0.1.12 (November 14, 2025) - System Prompt Architecture & Multi-Agent Computer Use
- Complete system prompt refactoring with hierarchical structure
- Semtools (semantic search) and Serena (LSP symbol understanding) skills
- Enhanced Gemini computer use with Docker integration
- Multi-agent computer use coordination with VNC visualization
v0.1.11 (November 12, 2025) - Rate Limiting & Bug Fixes
- Multi-dimensional rate limiting for Gemini models
- Model-specific limits with sliding window tracking
- Various bug fixes and stability improvements
v0.1.10 (November 10, 2025) - Framework Streaming & Handbook
- Real-time streaming for LangGraph and SmoLAgent
- MassGen Handbook at https://massgen.github.io/Handbook/
- Enhanced debugging for external framework tools
v0.1.9 (November 7, 2025) - Session Management & Computer Use Tools
- Complete session state tracking and restoration for multi-turn conversations
- Computer use automation tools (Claude, Gemini, OpenAI) for browser and desktop control
- Enhanced config builder with fuzzy model matching
- Expanded backend support (Cerebras, Together, Fireworks, Groq, OpenRouter, Moonshot)
v0.1.8 (November 5, 2025) - Automation Mode & DSPy Integration
- Automation infrastructure with
--automationflag for LLM agents - Real-time
status.jsonmonitoring for programmatic workflows - DSPy-powered question paraphrasing for multi-agent diversity
- Meta-coordination capabilities (MassGen running MassGen)
v0.1.7 (November 3, 2025) - Agent Task Planning & Background Execution
- MCP-based task management with dependency tracking
- Background shell execution for long-running commands
- Preemption coordination for multi-agent workflows
v0.1.6 (November 1, 2025) - Additional improvements and bug fixes
- Status: ✅ Completed October 25, 2025
- Duration: 12 minutes, 25 seconds
- Session:
log_20251025_222849 - Agents: GPT-5 (architecture), Claude 3.5 Haiku (technical), Gemini 2.5 Flash (visual)
- Outcome: Gemini won with unanimous 3/3 votes, 97KB production-ready code
- Key Achievement: Zero bugs, zero post-processing, professional dashboard in <15 minutes
- Draft PR: massgen/MassGen#375
- Status: 📝 Planning Complete, Ready for Execution
- Pipeline: 5-agent sequential (parser → structurer → builder → reviewer → publisher)
- Expected Duration: ~30 minutes from PDF to course
- Components: Q&A, drag-and-match, flowcharts, info bubbles, code examples
- Quick Reference:
docs/case_studies/interactive-course-generator-QUICKREF.md
- Status: 🧪 In Testing
- Target: FastAPI repository analysis
- Goal: Read 30+ files, create comprehensive architecture documentation
- Config:
tools/memory/gpt5mini_gemini_codebase_analysis_memory.yaml - Command: Clone FastAPI, analyze components, interactions, design patterns, request flows
- Status: ⏸️ Blocked
- Issue: massgen/MassGen#325
- Goal: Automated rollback when final agent execution fails
- Use Case: Safe multi-agent operations with failure recovery
- Status: ⏸️ Blocked by Twitter API rate limits
- Goal: Multi-agent Twitter posting and engagement
- Will Revisit: After API access issues resolved
- Status: 📋 Planned
- Goal: Scalable social media marketing and competitor intelligence
- Key Features:
- Account Discovery: Find 200+ target Twitter accounts (VCs, potential customers)
- Parallel Analysis: One agent per account for historical data analysis
- Automated Engagement: Intelligent automated replies to followers
- Competitor Monitoring: Track competitor activity across Twitter, Discord, GitHub
- Key Datapoint Extraction: Each agent focuses on specific metrics
- Parallel Processing: Simultaneous analysis across all platforms
- Technical Requirements:
- Twitter MCP for API access
- Discord MCP for server monitoring
- GitHub MCP for repository analysis
- Parallel agent execution framework
- Data aggregation and reporting tools
- Rate limiting and API quota management
- Use Cases:
- VC outreach campaigns
- Customer engagement at scale
- Competitive intelligence gathering
- Market sentiment analysis
- Lead generation and qualification
- Architecture: Map-reduce pattern with one agent per data point, coordinator agent for aggregation
- Status: 📋 Planned
- Goal: Compete on Online Mind2Web Leaderboard
- Requirements: Gemini 2.5 Computer Use, OpenAI Operator support
- Use Case: Autonomous web navigation and interaction
- Target: https://huggingface.co/spaces/osunlp/Online_Mind2Web_Leaderboard
- Status: 📋 Planned
- Goal: Information-to-attention prediction for busy managers
- Applications:
- Meeting notes prioritization
- Conference paper selection (100+ papers)
- Stock news analysis for traders
- Email triage (top 10 important from past 10 days)
- Requirements:
- Separate document processing per agent
- Scalable multi-agent backend
- User-defined voting criteria
- Key Innovation: Vote on which document requires attention, not which answer is better
- Status: 📋 Planned
- Goal: Produce websites better than Manus.im
- Example: https://manus.im/share/lFvPbN1H9dtI1zXIbuYAiF?replay=1
- Technical Requirements:
- Playwright/Computer Use for browser automation
- Image understanding (v0.1.3) for visual design
- Filesystem memory to reduce re-reading
- Multi-turn memory for follow-up queries
- OpenAI Operator and Gemini 2.5 Computer Use
- Status: 📋 Planned (could start soon)
- Goal: Auto-generate 1-min case study videos
- Workflow:
- Run command with full args
- Record entire session
- Edit: speed up, add captions, highlight logs
- Cut unnecessary parts
- Produce final demo video
- Technical Requirements:
- Video understanding (v0.1.3+)
- Command execution (v0.0.31)
- Video editing MCP (need to find existing tool)
- Challenge: Handle >10min command execution + video watching/editing time
- Approach: May need planning mode, coordination power of MassGen
- Status: 📋 Planned
- Goal: Compete with Refine.ink for detailed paper feedback
- Target: https://www.refine.ink/
- Requirements: Multi-agent review, academic quality assessment
- Use Case: Provide constructive, thorough academic paper reviews
Last Updated: November 2, 2025