Skip to content

calesthio/OpenMontage

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

OpenMontage

OpenMontage

The first open-source, agentic video production system.

Quick Start  ·  Try These Prompts  ·  Pipelines  ·  How It Works  ·  Providers  ·  Agent Guide

License


Turn your AI coding assistant into a full video production studio. Describe what you want in plain language — your agent handles research, scripting, asset generation, editing, and final composition.

signal-from-tomorrow_final_with_music_upload_v2.mp4

"SIGNAL FROM TOMORROW" — a cinematic sci-fi trailer fully produced through OpenMontage: concept, script, scene plan, Veo-generated motion clips, soundtrack, and Remotion composition.

void-linkedin.mp4

"VOID — Neural Interface" — a product ad produced with just one API key (OpenAI). 4 AI-generated images (gpt-image-1), TTS narration, auto-sourced royalty-free music, word-level subtitles via WhisperX, and Remotion data visualizations. Total cost: $0.69. Zero manual asset work.

candyland.mp4

"Afternoon in Candyland" — a Ghibli-style anime animation. A little girl's whimsical afternoon adventure through candy gates, gumdrop rivers, and lollipop gardens. 12 FLUX-generated images with multi-image crossfade, cinematic camera motion (zoom, pan, Ken Burns), sparkle/petal/firefly particle overlays, and ambient music with auto-detected energy offset. Total cost: $0.15. No video generation, no manual editing.

mori-no-seishin.mp4

"Mori no Seishin" — a Ghibli-style anime animation of a forest spirit's journey through ancient woods. 12 FLUX-generated images with parallax crossfade, drift and pan camera motion, firefly and petal particles, cinematic vignette lighting, and ambient forest soundtrack. Total cost: $0.15. Still images brought to life through Remotion's animation engine.

deep-ocean.mp4

"Into the Abyss" — a deep ocean exploration rendered in anime style. Bioluminescent gardens, coral cathedrals, and creatures of light — 12 FLUX-generated images with sparkle and mist particle overlays, light-ray effects, smooth camera motion, and ambient oceanic soundtrack. Total cost: $0.15. Zero video generation APIs needed.

Works with Claude Code, Cursor, Copilot, Windsurf, Codex — any AI coding assistant that can read files and run code.


Quick Start

Prerequisites

  • Python 3.10+python.org
  • FFmpegbrew install ffmpeg / sudo apt install ffmpeg / ffmpeg.org
  • Node.js 18+nodejs.org
  • An AI coding assistant — Claude Code, Cursor, Copilot, Windsurf, or Codex

Install & Run

git clone https://github.com/calesthio/OpenMontage.git
cd OpenMontage
make setup

Open the project in your AI coding assistant and tell it what you want:

"Make a 60-second animated explainer about how neural networks learn"

That's it. The agent researches your topic with live web search, generates AI images, writes and narrates the script with voice direction, finds royalty-free background music automatically, burns in word-level subtitles, validates everything before rendering — then reviews its own output by extracting frames and transcribing audio to catch issues before you even see them. Every creative decision gets your approval.

No make? Run manually: pip install -r requirements.txt && cd remotion-composer && npm install && cd .. && pip install piper-tts && cp .env.example .env

Windows: If npm install fails with ERR_INVALID_ARG_TYPE, use npx --yes npm install instead.

Add API Keys (optional — more keys = more tools)

# .env — every key is optional, add what you have

# Best bang for buck — one key unlocks 5 tools:
FAL_KEY=your-key               # FLUX images + Google Veo, Kling, MiniMax video + Recraft images

# Free stock media:
PEXELS_API_KEY=your-key        # Free — stock footage and images
PIXABAY_API_KEY=your-key       # Free — stock footage and images

# Music:
SUNO_API_KEY=your-key          # Full songs, instrumentals, any genre

# Voice & images:
ELEVENLABS_API_KEY=your-key    # Premium TTS, AI music, sound effects
OPENAI_API_KEY=your-key        # OpenAI TTS, DALL-E 3 images
GOOGLE_API_KEY=your-key        # Google Imagen images, Google TTS (700+ voices)

# More video providers:
HEYGEN_API_KEY=your-key        # HeyGen — VEO, Sora, Runway, Kling via single gateway
RUNWAY_API_KEY=your-key        # Runway Gen-4 direct
Have a GPU? Unlock free local video generation
make install-gpu

# Then add to .env:
VIDEO_GEN_LOCAL_ENABLED=true
VIDEO_GEN_LOCAL_MODEL=wan2.1-1.3b  # or wan2.1-14b, hunyuan-1.5, ltx2-local, cogvideo-5b

What You Get With Zero API Keys

You don't need any API keys to make real videos. Out of the box, make setup gives you:

Capability Free Tool What It Does
Narration Piper TTS Free offline text-to-speech — real human-sounding narration
Visuals Pexels + Pixabay Free stock images and footage (API keys are free to get)
Composition Remotion Turns still images into animated video with spring physics, transitions, typography, and TikTok-style captions
Post-production FFmpeg Encoding, subtitle burn-in, audio mixing, color grading
Subtitles Built-in Auto-generated captions with word-level timing

The zero-key path: Piper narrates your script, stock images provide the visuals, and Remotion animates everything into a polished video with transitions, text overlays, and synced captions. Add API keys later to unlock AI-generated images, video clips, premium voices, and music.


Try These Prompts

Copy any of these into your AI coding assistant after setup. Each one runs a full production pipeline.

Zero keys needed

"Make a 45-second animated explainer about why the sky is blue"

"Create a 60-second video about the history of the internet, with narration and captions"

"Make a data-driven explainer about coffee consumption around the world"

With FAL_KEY (~$0.15–$1.50)

"Create a 30-second Ghibli-style animated video of a magical floating library in the clouds at golden hour"

"Make a 30-second anime-style animation of an underwater temple with bioluminescent coral and ancient ruins"

"Create an animated explainer about how CRISPR gene editing works, using AI-generated visuals"

"Make a product launch teaser for a fictional smart water bottle called AquaPulse"

Full setup (~$1–$3)

"Create a cinematic 30-second trailer for a sci-fi concept: humanity receives a warning from 1000 years in the future"

"Make a 90-second animated explainer about quantum computing for middle school students, with a fun narrator voice and custom soundtrack"

Want more? See the full Prompt Gallery for tested prompts with expected costs and output examples, or run make demo to render zero-key demo videos instantly.


Pipelines

Each pipeline is a complete production workflow, from idea to finished video.

Pipeline What It Produces Best For
Animated Explainer AI-generated explainer with research, narration, visuals, music Educational content, tutorials, topic breakdowns
Animation Motion graphics, kinetic typography, animated sequences Social media, product demos, abstract concepts
Avatar Spokesperson Avatar-driven presenter videos Corporate comms, training, announcements
Cinematic Trailer, teaser, and mood-driven edits Brand films, teasers, promotional content
Clip Factory Batch of ranked short-form clips from one long source Repurposing long content for social media
Hybrid Source footage + AI-generated support visuals Enhancing existing footage with graphics
Localization & Dub Subtitle, dub, and translate existing video Multi-language distribution
Podcast Repurpose Podcast highlights to video Podcast marketing, audiogram videos
Screen Demo Polished software screen recordings and walkthroughs Product demos, tutorials, documentation
Talking Head Footage-led speaker videos Presentations, vlogs, interviews

Every pipeline follows the same structured flow:

research -> proposal -> script -> scene_plan -> assets -> edit -> compose

Each stage has a dedicated director skill — a markdown instruction file that teaches the agent exactly how to execute that stage. The agent reads the skill, uses the tools, self-reviews, checkpoints state, and asks for human approval at creative decision points.

Web research is a first-class stage. Before writing a single word of script, the agent searches YouTube, Reddit, Hacker News, news sites, and academic sources. It gathers data points, audience questions, trending angles, and visual references — then cites everything in a structured research brief. Your videos are grounded in real, current information, not hallucinated facts.


Why OpenMontage?

Most AI video tools give you a single clip from a prompt. OpenMontage gives you an end-to-end production pipeline — the same structured process a real production team follows, automated by your AI agent.

Edit your own talking-head footage. Generate a fully animated explainer from scratch. Cut a 2-hour podcast into a dozen social clips. Translate and dub your content into 10 languages. Build a cinematic brand teaser from stock footage and AI-generated scenes. If a production team can make it, OpenMontage can orchestrate it.

  • 11 production pipelines — explainers, talking heads, screen demos, cinematic trailers, animations, podcasts, localization, and more
  • 49 production tools — spanning video generation, image creation, text-to-speech, music, audio mixing, subtitles, enhancement, and analysis
  • 400+ agent skills — production skills, pipeline directors, creative techniques, quality checklists, and deep technology knowledge packs that teach the agent how to use every tool like an expert
  • Live web research built in — before writing a single word of script, the agent runs 15-25+ web searches across YouTube, Reddit, news sites, and academic sources to ground your video in real, current data
  • Both free/local AND cloud providers — every capability supports open-source local alternatives alongside premium APIs. Use what you have.
  • No vendor lock-in — swap providers freely. The selector pattern auto-routes to whatever's available on your machine.
  • Budget governance built in — cost estimation before execution, spend caps, per-action approval thresholds. No surprise bills.

How It Works

OpenMontage uses an agent-first architecture. There is no code orchestrator. Your AI coding assistant IS the orchestrator.

You: "Make an explainer video about how black holes form"
 |
 v
Agent reads pipeline manifest (YAML) -- what stages to run, what tools to use
 |
 v
Agent reads stage director skill (Markdown) -- HOW to execute each stage
 |
 v
Agent calls Python tools -- TTS, image gen, video gen, FFmpeg, etc.
 |
 v
Agent self-reviews using reviewer skill -- quality checks
 |
 v
Agent checkpoints state (JSON) -- resumable if interrupted
 |
 v
Agent presents for your approval -- you stay in control
 |
 v
Final video output

Python provides tools and persistence. All creative decisions, orchestration logic, review criteria, and quality standards live in readable instruction files (YAML manifests + Markdown skills) that you can inspect and customize.


Architecture

OpenMontage/
├── tools/              # 48 Python tools (the agent's hands)
│   ├── video/          # 12 video gen providers + compose, stitch, trim
│   ├── audio/          # 4 TTS providers + Suno/ElevenLabs music, mixing, enhancement
│   ├── graphics/       # 8 image gen providers + diagrams, code snippets, math
│   ├── enhancement/    # Upscale, bg remove, face enhance, color grade
│   ├── analysis/       # Transcription, scene detect, frame sampling
│   ├── avatar/         # Talking head, lip sync
│   └── subtitle/       # SRT/VTT generation
│
├── pipeline_defs/      # 11 YAML pipeline manifests (the agent's playbook)
├── skills/             # 124 Markdown skill files (the agent's knowledge)
│   ├── pipelines/      # Per-pipeline stage director skills
│   ├── creative/       # Creative technique skills
│   ├── core/           # Core tool skills
│   └── meta/           # Reviewer, checkpoint protocol
│
├── schemas/            # 15 JSON Schemas (contract validation)
├── styles/             # Visual style playbooks (YAML)
├── remotion-composer/  # React/Remotion video composition engine
├── lib/                # Core infrastructure (config, checkpoints, pipeline loader)
└── tests/              # Contract tests, QA integration tests, eval harness

Three-Layer Knowledge Architecture

Layer 1: tools/ + pipeline_defs/     "What exists" — executable capabilities + orchestration
Layer 2: skills/                     "How to use it" — OpenMontage conventions and quality bars
Layer 3: .agents/skills/             "How it works" — 47 external technology knowledge packs

Each tool declares which Layer 3 skills it relies on. The agent reads Layer 1 to know what's available, Layer 2 to know how OpenMontage wants it used, and Layer 3 for deep technical knowledge when needed.


Supported Providers

Full setup guide with pricing and free tiers: docs/PROVIDERS.md

Video Generation — 12 providers
Provider Type Notes
Kling Cloud API High quality, fast
Runway Gen-4 Cloud API Cinematic quality
Google Veo 3 Cloud API Long-form, cinematic. Via fal.ai or HeyGen.
MiniMax Cloud API Cost-effective
HeyGen Cloud API Multi-model gateway
WAN 2.1 Local GPU Free, 1.3B and 14B variants
Hunyuan Local GPU Free, high quality
CogVideo Local GPU Free, 2B and 5B variants
LTX-Video Local GPU / Modal Free locally, or self-hosted cloud
Pexels Stock Free stock footage
Pixabay Stock Free stock footage
Image Generation — 8 providers
Provider Type Notes
FLUX Cloud API State-of-the-art quality
Google Imagen Cloud API Imagen 4 — high-quality, multiple aspect ratios
DALL-E 3 Cloud API OpenAI's image model
Recraft Cloud API Design-focused generation
Local Diffusion Local GPU Stable Diffusion, free
Pexels Stock Free stock images
Pixabay Stock Free stock images
ManimCE Local Mathematical animations
Text-to-Speech — 4 providers
Provider Type Notes
ElevenLabs Cloud API Premium voice quality
Google TTS Cloud API 700+ voices, 50+ languages — best for localization
OpenAI TTS Cloud API Fast, affordable
Piper Local Completely free, offline
Music, Sound & Post-Production

Music & Sound:

Provider Type Notes
Suno AI Cloud API Full song generation with vocals, lyrics, any genre. Up to 8 minutes.
ElevenLabs Music Cloud API AI music generation
ElevenLabs SFX Cloud API Sound effect generation

Post-Production (always available, always free):

Tool What It Does
FFmpeg Video composition, encoding, subtitle burn-in, audio muxing
Video Stitch Multi-clip assembly, crossfades, picture-in-picture, spatial layouts
Video Trimmer Precision cutting and extraction
Audio Mixer Multi-track mixing, ducking, fades
Audio Enhance Noise reduction, normalization
Color Grade LUT-based color grading
Subtitle Gen SRT/VTT generation from timestamps

Enhancement:

Tool What It Does
Upscale Real-ESRGAN image/video upscaling
Background Remove rembg / U2Net background removal
Face Enhance Face quality enhancement
Face Restore CodeFormer / GFPGAN face restoration

Analysis:

Tool What It Does
Transcriber WhisperX speech-to-text with word-level timestamps
Scene Detect Automatic scene boundary detection
Frame Sampler Intelligent frame extraction
Video Understand CLIP/BLIP-2 vision-language analysis

Avatar & Lip Sync:

Tool What It Does
Talking Head SadTalker / MuseTalk avatar animation
Lip Sync Wav2Lip audio-driven lip synchronization

Composition & Rendering:

Engine Type What It Does
Remotion Local (Node.js) React-based programmatic video — spring-animated image scenes, stat reveals, section titles, hero cards, TikTok-style word-by-word captions, scene transitions (fade/slide/wipe/flip), Google Fonts, and audio with fade curves. When no video generation providers are configured, the agent generates still images and Remotion turns them into fully animated video.
FFmpeg Local Core video assembly, encoding, subtitle burn, audio muxing, color grading

Style System

Style playbooks define the visual language for your productions:

Playbook Best For
Clean Professional Corporate, educational, SaaS
Flat Motion Graphics Social media, TikTok, startups
Minimalist Diagram Technical deep-dives, architecture

Playbooks control typography, color palettes, motion styles, audio profiles, and quality rules. The agent reads the playbook and applies it consistently across all generated assets.


Platform Output Profiles

Built-in render profiles for every major platform:

Profile Resolution Aspect Ratio
YouTube Landscape 1920x1080 16:9
YouTube 4K 3840x2160 16:9
YouTube Shorts 1080x1920 9:16
Instagram Reels 1080x1920 9:16
Instagram Feed 1080x1080 1:1
TikTok 1080x1920 9:16
LinkedIn 1920x1080 16:9
Cinematic 2560x1080 21:9

Budget Governance

Every paid API call goes through the cost tracker:

  • Estimate before execution — see what it will cost
  • Reserve budget — lock funds before the call
  • Reconcile after — record actual spend
  • Configurable modesobserve (track only), warn (log overruns), cap (hard limit)
  • Per-action approval — pause for confirmation above a threshold (default: $0.50)
  • Total budget cap — default $10, fully configurable

No surprise bills. The agent tells you what it will cost before it spends.


Agent Compatibility

OpenMontage works with any AI coding assistant that can read files and execute Python. Dedicated instruction files are included for:

Platform Config File
Claude Code CLAUDE.md
Cursor CURSOR.md + .cursor/rules/
GitHub Copilot COPILOT.md + .github/copilot-instructions.md
Codex CODEX.md
Windsurf .windsurfrules

All platform files point to the shared AGENT_GUIDE.md (operating guide and agent contract) and PROJECT_CONTEXT.md (architecture reference).

Coming soon: Local LLM support via Ollama and LM Studio — run the full production pipeline without any cloud LLM.


Contributing

OpenMontage is built to be extended. The two most common contributions:

Adding a New Tool

  1. Create a Python file in the appropriate tools/ subdirectory
  2. Inherit from BaseTool and implement the tool contract
  3. The registry auto-discovers it — no manual registration needed
  4. Add a skill file if the tool needs usage guidance

Adding a New Pipeline

  1. Create a YAML manifest in pipeline_defs/
  2. Create stage director skills in skills/pipelines/<your-pipeline>/
  3. Reference existing tools — or add new ones if needed

See docs/ARCHITECTURE.md for the full technical reference, docs/PROVIDERS.md for the complete provider guide (setup, pricing, free tiers), and AGENT_GUIDE.md for the agent contract.

Join the Community

We use GitHub Discussions to share work and ideas:

  • Show and Tell — Share videos you've made, prompts that worked well, or creative workflows you've discovered
  • Ideas — Suggest new pipelines, tools, style playbooks, or integrations
  • Q&A — Ask questions about setup, pipelines, or troubleshooting

Made something cool? Post it in Show and Tell — we'd love to see what you build.


Testing

# Run contract tests (no API keys needed)
make test-contracts

# Run all tests
make test

License

GNU AGPLv3


OpenMontage — Production-grade video, orchestrated by your AI assistant.

If this project looks useful to you, a star would really mean a lot — it helps others discover it too.

About

World's first open-source, agentic video production system. 11 pipelines, 49 tools, 400+ agent skills. Turn your AI coding assistant into a full video production studio.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors