New Pipeline: Talking Head — Turn raw footage into polished social videos #2

calesthio (Maintainer) · 2026-04-01T17:01:49Z

Talking Head Pipeline is here

We just shipped the talking-head pipeline: a complete system that takes raw footage of you speaking to camera (webcam, phone, camera) and turns it into a polished, social-ready video with animated captions, face and eye enhancement, background music, and multi-clip assembly.

Think: record yourself talking for 3 minutes, paste a prompt, and get back a finished Instagram Reel / TikTok / YouTube Short — with jump cuts, eye enhancement, face-tracked reframing, word-by-word animated captions, and background music that fades in only during your speech sections.

What can you do with it?

Single talking-head video:

Take my raw footage, remove the dead air, enhance my face and eyes, add animated captions, and export for Instagram Reels.

Multi-clip showcase reel:

Build a reel from my intro clip, 4 project demos, and an outro. Add letterboxed showcase cards with titles for each demo, crossfade transitions, and background music only during the talking parts.

Podcast clip extraction:

Take this 45-minute recording, cut the silence, speed it up 1.25x, add captions with word highlighting, and export vertical clips for Shorts.

8 New Tools Built

| Tool | What it does |
| --- | --- |
| silence_cutter | Detects silence via FFmpeg and removes it (or speeds it up). Configurable threshold, padding to avoid clipped words. Modes: `remove`, `speed_up`, `mark`. |
| face_tracker | MediaPipe/OpenCV face detection: per-frame bounding boxes for smart reframing decisions. Falls back to Haar cascade. |
| auto_reframe | Face-tracked smart crop for aspect ratio conversion (16:9 → 9:16). Smooth camera panning that keeps the speaker centered. Presets: portrait, square, landscape, cinematic. |
| eye_enhance | Under-eye dark circle removal + eye brightening via MediaPipe Face Mesh landmarks. Subtle enhancement that makes a real difference on webcam footage. |
| remotion_caption_burn | Animated word-by-word captions rendered via Remotion. Active word gets highlighted (like TikTok/Instagram captions). Falls back to FFmpeg subtitle burn. Supports ASR correction dict. |
| showcase_card | Creates letterboxed 9:16 presentation cards from source videos: bold title at top, subtitle at bottom, dark background padding. For reel showcase segments. |
| visual_qa | Automated quality checks: extracts frames at timestamps for visual review, validates video probe data, checks audio levels. The agent inspects its own output before presenting to you. |
| segmented_music (audio_mixer) | New operation on audio_mixer: plays background music only during specified time segments with smooth fades at boundaries. Music during speech, silence during demos. |
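
To make the silence_cutter row more concrete, here is a minimal sketch of how silence removal with padding can work. It assumes the silence timestamps come from FFmpeg's `silencedetect` audio filter (for example `ffmpeg -i raw.mp4 -af silencedetect=noise=-30dB:d=0.5 -f null -`), which logs `silence_start:` and `silence_end:` lines. The function names, the threshold, and the padding default are illustrative assumptions, not the tool's actual interface.

```python
import re

# Patterns for the lines FFmpeg's silencedetect filter writes to its log.
SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silences(log_text):
    """Return (start, end) pairs found in silencedetect log output."""
    starts = [float(s) for s in SILENCE_START.findall(log_text)]
    ends = [float(e) for e in SILENCE_END.findall(log_text)]
    return list(zip(starts, ends))


def keep_ranges(silences, duration, padding=0.1):
    """Invert silence ranges into the ranges worth keeping.

    `padding` seconds of every silence are left in place on both sides,
    so the first and last syllables around a pause are not clipped.
    (Hypothetical helper; the default is an assumption.)
    """
    ranges, cursor = [], 0.0
    for start, end in silences:
        cut_start, cut_end = start + padding, end - padding
        if cut_end <= cut_start:
            continue  # pause is too short to cut once padded
        if cut_start > cursor:
            ranges.append((cursor, cut_start))
        cursor = cut_end
    if cursor < duration:
        ranges.append((cursor, duration))
    return ranges
```

The resulting ranges could then be trimmed and concatenated (the `remove` mode) or sped up instead of dropped (the `speed_up` mode); how silence_cutter does this internally is not described here.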

subtitle_gen has also been updated with a corrections dictionary for fixing common ASR mistakes (e.g. "cloud" → "Claude").
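
A corrections dictionary of this kind can be applied to word-level transcript data roughly as follows. This is a sketch under assumptions: the word format (`text`, `start`, `end` keys) and the function name are hypothetical, and subtitle_gen's real data structures may differ.

```python
import re

# Splits a token into leading punctuation, the word itself, and trailing punctuation.
TOKEN = re.compile(r"^(\W*)(.*?)(\W*)$", re.DOTALL)


def apply_corrections(words, corrections):
    """Replace misrecognized words, matching case-insensitively.

    Surrounding punctuation and word timings are preserved, so caption
    timing stays aligned with the audio.
    """
    lookup = {wrong.lower(): right for wrong, right in corrections.items()}
    fixed = []
    for word in words:
        lead, core, trail = TOKEN.match(word["text"]).groups()
        replacement = lookup.get(core.lower(), core)
        fixed.append({**word, "text": lead + replacement + trail})
    return fixed


words = [
    {"text": "Ask", "start": 0.0, "end": 0.2},
    {"text": "cloud,", "start": 0.2, "end": 0.6},
]
corrected = apply_corrections(words, {"cloud": "Claude"})
```

Note that a plain dictionary replaces every occurrence, so an entry like "cloud" would also rewrite legitimate uses of the word; entries are best kept specific to the vocabulary of the video.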

The Enhancement Chain

The pipeline runs enhancements in this order:

raw footage
  → silence_cutter (remove dead air, roughly 20-30% shorter runtime)
  → face_enhance (sharpen faces)
  → eye_enhance (dark circles + brightening)
  → color_grade (apply color profile)
  → speed adjustment (1.25x for tighter pacing)
  → auto_reframe (16:9 → 9:16 with face tracking)
  → remotion_caption_burn (animated word-by-word captions)
  → showcase_card (letterboxed cards for demo clips)
  → video_stitch (crossfade/fadeblack transitions)
  → segmented_music (music during speech only)
  → visual_qa (self-review before delivery)

Every step is optional — the agent checks what tools are available and adapts.
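
The "every step is optional" behavior can be pictured as a chain runner that skips any step whose tool is missing. The sketch below is an illustration only: `run_chain`, the step callables, and the `available` set are hypothetical names, and the real agent decides what to run in its own way.

```python
def run_chain(source, steps, available):
    """Run each available step in order and skip the rest.

    steps: list of (name, callable) pairs; each callable takes a path
           and returns the path of its output.
    available: set of tool names that can run in this environment.
    Returns the final path plus the names that were applied and skipped.
    """
    current, applied, skipped = source, [], []
    for name, step in steps:
        if name not in available:
            skipped.append(name)  # adapt instead of failing
            continue
        current = step(current)
        applied.append(name)
    return current, applied, skipped


# Stand-in steps that only rename the path, to show the control flow.
steps = [
    ("silence_cutter", lambda p: p + "+cut"),
    ("eye_enhance", lambda p: p + "+eyes"),
    ("auto_reframe", lambda p: p + "+9x16"),
]
result = run_chain("raw.mp4", steps, {"silence_cutter", "auto_reframe"})
```

Returning the skipped names alongside the output makes it easy to report which enhancements were left out of a given render.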

Try It — Sample Prompt

Drop your raw footage into a project folder and use this prompt:

I have raw talking-head footage at projects/my-video/assets/raw.mp4.
Make it into a polished Instagram Reel:
- Remove all dead air and long pauses
- Enhance my face and eyes (keep it subtle)
- Speed up to 1.25x for tighter pacing
- Reframe to 9:16 portrait, keep my face centered
- Add animated word-by-word captions at the bottom
- Find upbeat royalty-free background music and fade it in during my speech sections
- Export at 1080x1920 for Instagram Reels
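
For the reframing request in the prompt above, the underlying arithmetic is a full-height crop whose width is 9/16 of the source height, centered on the face and clamped to the frame. The sketch below shows that calculation for a single frame; the function name is hypothetical, and the smoothing over time that auto_reframe applies is deliberately left out.

```python
def portrait_crop(src_w, src_h, face_cx, target_ratio=9 / 16):
    """Return (x, y, w, h) for a full-height crop centered on face_cx.

    target_ratio is width / height of the output (9:16 by default).
    The width is rounded to an even number, which many video encoders
    require, and the window is clamped so it never leaves the frame.
    """
    crop_w = int(round(src_h * target_ratio / 2)) * 2
    crop_w = min(crop_w, src_w)
    x = int(round(face_cx - crop_w / 2))
    x = max(0, min(x, src_w - crop_w))
    return x, 0, crop_w, src_h


# A 1920x1080 source with the face at the horizontal center.
centered = portrait_crop(1920, 1080, face_cx=960)
```

With a 1920x1080 source this gives a 608x1080 window, so exporting at 1080x1920 means upscaling by a factor of about 1.8; higher-resolution source footage leaves more headroom.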

Or for a multi-clip reel:

I have an intro clip (intro.mp4), 3 project demo videos (demo1.mp4, demo2.mp4, demo3.mp4),
and an outro clip (outro.mp4) in projects/showcase/assets/.

Build a showcase reel for Instagram:
- Process intro and outro: silence cut, face enhance, captions
- Create letterboxed showcase cards for each demo with titles
- Assemble with crossfade transitions between talking and demos
- Add background music only during the talking head sections
- Export as one continuous 9:16 video
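
"Music only during the talking head sections" comes down to a gain envelope: full volume inside each speech segment, silence outside, and a short ramp at every boundary. The following sketch computes that envelope with linear fades; the function name and the fade length are assumptions, and the actual segmented_music operation may shape its fades differently.

```python
def music_gain(t, segments, fade=0.5):
    """Background-music gain in [0, 1] at time t (seconds).

    segments: list of (start, end) ranges where music should play.
    The gain ramps linearly over `fade` seconds just inside each
    boundary, so the music fades in and out instead of cutting.
    """
    for start, end in segments:
        if end <= start:
            continue  # ignore empty segments
        if start <= t <= end:
            # Shorten the ramp for segments too brief to fit two full fades.
            ramp = min(fade, (end - start) / 2)
            return min(1.0, (t - start) / ramp, (end - t) / ramp)
    return 0.0


# Music under the intro (0-10 s) and the outro (20-30 s), silent during demos.
speech = [(0.0, 10.0), (20.0, 30.0)]
envelope = [music_gain(t, speech, fade=1.0) for t in (0.0, 0.5, 5.0, 9.5, 15.0, 20.5)]
```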

What you need

- No API keys required: face enhancement, silence cutting, reframing, and FFmpeg captions all run locally.
- Optional: FAL_KEY or OPENAI_API_KEY unlocks AI-generated B-roll and premium TTS narration.
- FFmpeg installed (most tools use it under the hood).
- Node.js for Remotion animated captions (falls back to FFmpeg captions if unavailable).
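
If you want to check your setup before running the pipeline, a quick script along these lines reports what is available. It is a sketch, not part of the project: the function name and the returned keys are made up for illustration, and it only looks for the `ffmpeg` and `node` executables and the two environment variables mentioned above.

```python
import os
import shutil


def detect_capabilities(which=shutil.which, env=None):
    """Report which parts of the pipeline can run on this machine.

    `which` and `env` are injectable so the check can be tested without
    touching the real PATH or environment.
    """
    env = os.environ if env is None else env
    has_ffmpeg = which("ffmpeg") is not None
    has_node = which("node") is not None
    if has_node:
        captions = "remotion"  # animated word-by-word captions
    elif has_ffmpeg:
        captions = "ffmpeg"  # fallback subtitle burn
    else:
        captions = None
    return {
        "ffmpeg": has_ffmpeg,
        "captions": captions,
        "ai_extras": bool(env.get("FAL_KEY") or env.get("OPENAI_API_KEY")),
    }


if __name__ == "__main__":
    print(detect_capabilities())
```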

We'd love to see what you make with it. Share your results in Show and tell!
