The automatic movie assembler with generative AI tendencies.
This project allows you to programmatically create and manage projects that consist of multiple movies. Each movie within a project can contain several scenes, and each scene can contain multiple snippets. This documentation outlines how to structure your projects and movies to use with the API effectively.
- Movies: the final output video is called a movie; it is composed of...
- Scenes: an arrangement of...
- Scenes are built by describing one of the following arrangements and telling the program which snippet to take audio from. You could specify multiple audio sources, but that would probably be annoying.
- stacking: placing a background then stacking some smaller snippets or greenscreen snippets on top of each other in layers
- vertical/horizontal: to get a presenter type video or side by side type video
- picture in picture: like the news or putting a streamer in the bottom corner
- montage: plays a bunch of videos in a row
- Snippets: image, audio, or video assets to add to a scene.
- You tell each snippet where in the video it lives in relation to the final size (left bottom for example) and some left/down offset for fine-tuning. You'll also tell it if it has a greenscreen to mask away for stacking
- You also tell each snippet where to find the asset, or to create one using one of the machine learning extensions
  - audio: `.tiktok` for TikTok TTS or `.xi_labs` for ElevenLabs or `.whisper` for OpenAI's TTS
  - video: [Coming Soon] `.sora` for OpenAI Sora or `.did` for D-ID talking head type videos. `.rand` has OpenAI create some search terms based on a prompt, then lets you choose from a selection of stock videos to merge into a background video.
  - image: `.giphy` for selected Giphy gifs, [Coming Soon] `.sd` for Stable Diffusion, `.de` for DALL-E and `.mj` for Midjourney
- In vertical/horizontal scenes, the snippets are arranged in order from top down or left to right.
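
The `some prompt.model` convention for generated assets can be illustrated with a small Python sketch. This is not the project's parser: `split_asset` and `GENERATIVE_EXTENSIONS` are hypothetical names, and the extension set is simply collected from the ones named in this document.

```python
from typing import Optional, Tuple

# Hypothetical list, gathered from the extensions this README mentions.
GENERATIVE_EXTENSIONS = {
    ".tiktok", ".xi_labs", ".whisper", ".mu",  # audio
    ".sora", ".did", ".rand",                  # video
    ".giphy", ".sd", ".de", ".mj",             # image
}


def split_asset(asset: str) -> Tuple[str, Optional[str]]:
    """Split an asset string into (prompt_or_path, generative_extension).

    Ordinary file paths come back unchanged with None as the extension.
    """
    head, dot, tail = asset.rpartition(".")
    extension = dot + tail
    if dot and extension in GENERATIVE_EXTENSIONS:
        return head, extension
    return asset, None


print(split_asset("orange cat kitty cat.rand"))
print(split_asset("./assets/music/elevator.mp3"))
print(split_asset(".tiktok"))  # empty prompt: the text comes from "script"
```

An asset such as `.tiktok` with an empty prompt matches the pre-written script case, where the text is supplied through `script` instead.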
- title (string): The title of the movie.
- scenes (array of Scene objects): A collection of scenes that make up the movie.
- has_subtitles (boolean, optional): Indicates whether the movie includes subtitles. Defaults to false if not specified.
- final_size (tuple of integers, optional): The final resolution of the movie, specified as [width, height]. If not provided, a default size may be applied.
- duration (integer, optional): The total duration of the movie in seconds. If not provided, it might be calculated based on the content.
- snippets (array of Snippet objects): A collection of snippets that are part of the scene.
- arrangement (string): Defines how snippets are arranged within the scene. Must be one of:
  - "stack": places a background image or video, then stacks additional snippets or greenscreen snippets on top of each other in layers toward the viewer
  - "vertical": places snippets starting at the top in a top-to-bottom arrangement
  - "horizontal": places snippets starting at the left in a left-to-right arrangement
  - "montage": cuts one or more snippets together to be played sequentially
- use_audio (array of integers): Specifies which snippets' audio tracks should be used in the scene. The integers represent the index of the snippets within the snippets array.
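
A quick pre-flight check of a scene against the rules above could look like the sketch below. It is only an illustration: `scene_problems` is a hypothetical helper, and the project itself may validate scenes differently or not at all.

```python
ARRANGEMENTS = {"stack", "vertical", "horizontal", "montage"}


def scene_problems(scene: dict) -> list:
    """Return human-readable problems found in a scene dict (empty if none)."""
    problems = []
    snippets = scene.get("snippets", [])
    if scene.get("arrangement") not in ARRANGEMENTS:
        problems.append(f"unknown arrangement: {scene.get('arrangement')!r}")
    # use_audio holds positions in the snippets array, so each must be in range.
    for index in scene.get("use_audio", []):
        if not 0 <= index < len(snippets):
            problems.append(f"use_audio index {index} is out of range")
    return problems


scene = {
    "arrangement": "stack",
    "use_audio": [0, 3],
    "snippets": [{"audio": {"asset": "./assets/music/elevator.mp3"}}],
}
print(scene_problems(scene))
```

The check does not require the referenced snippet to have an `audio` entry, since a video snippet may carry its own audio track.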
A snippet is a grouping of one or more of the following types: audio, video, or caption.
{
  "audio": {
    "asset": "write a funny script about kitty cats doing orange cat activities.whisper",
    "voice": "echo"
  }
}

{
  "audio": {
    "asset": "./assets/music/son_of_preacher_man.mp3"
  }
}

{
  "video": {
    "asset": "./assets/videos/vincent.mp4",
    "size": [1080, 1000],
    "has_greenscreen": true,
    "anchor": ["center", "bottom"]
  }
}

{
  "audio": {
    "asset": "write a funny script about kitty cats doing orange cat activities.whisper",
    "voice": "echo"
  },
  "video": {
    "asset": "./assets/videos/compilation.mp4",
    "size": [1080, 1920]
  }
}
- video:
  - asset (string): The identifier or path to the media asset (e.g. `path/to/video.mp4`). You can use one of the following generative extensions. Generative extensions are written as `some prompt for the ai.model`.
    - [Coming Soon] `.sora` for OpenAI Sora or `.did` for D-ID talking head type video.
  - duration (integer, optional): The duration of the snippet in seconds.
    - Required for image snippets where there's no audio or other snippet to determine how long a scene should be.
    - Duration cannot exceed the provided asset's duration. If you specify a 10s duration on a 5s video, the video will halt after 5s.
  - has_greenscreen (boolean, optional): Indicates if the snippet features a green screen that should be keyed out.
  - size (tuple of integers, optional): The resolution of the snippet, specified as [width, height].
    - In situations like `arrangement: "horizontal"`, some defaults are assigned so snippets scale properly.
  - location (tuple of integers): The on-screen location of the snippet, specified as [left from start, down from start].
    - This is relative to the `anchor` and gives the offset in pixels, measured from the left and downward, applied to the snippet.
  - anchor (tuple of strings): The anchor point for the snippet's position, specified as [("left" | "center" | "right"), ("top" | "center" | "bottom")]. Defaults to ["left", "top"].
- audio:
  - asset (string): The identifier or path to the media asset (e.g. `path/to/audio.mp3`). You can use one of the following generative extensions. Generative extensions are written as `some prompt for the ai.model`.
    - `.tiktok` for TikTok TTS or `.xi_labs` for ElevenLabs or `.whisper` for OpenAI's TTS
    - [Coming Soon] `.mu` for Mubert AI-generated music
  - script (string, optional): A script or text to be used with the snippet, if applicable. Also accepts the path to a `.txt` file. If going for a simple TTS without a generative script, put your pre-written script here.
  - voice (string, optional): The voice identifier for text-to-speech synthesis, if applicable.
    - See `voices.py` for a list of usable voices (or the ElevenLabs API for a list of those voices).
  - duration (integer, optional): The duration of the snippet in seconds.
    - Duration cannot exceed the provided asset's duration. If you specify a 10s duration on a 5s asset, playback will halt after 5s.
- caption:
  - asset (string): The text to display in the caption
  - has_background (boolean, optional, default false): Gives the text a semi-transparent background to make it more readable without a full background
  - size (tuple of integers, optional): The resolution of the snippet, specified as [width, height].
    - In situations like `arrangement: "horizontal"`, some defaults are assigned so snippets scale properly.
  - location (tuple of integers): The on-screen location of the snippet, specified as [left from start, down from start].
    - This is relative to the `anchor` and gives the offset in pixels, measured from the left and downward, applied to the snippet.
  - anchor (tuple of strings): The anchor point for the snippet's position, specified as [("left" | "center" | "right"), ("top" | "center" | "bottom")]. Defaults to ["left", "top"].
  - font (string, optional): Path to a font file or one of MoviePy's default fonts
  - color (string, optional, default white): Text color
  - method (string, optional, default caption): See MoviePy TextClip documentation
  - align (string, optional, default center): Text alignment (see MoviePy TextClip documentation)
  - fontsize (integer, optional, default 70): Font size
  - stroke_width (integer, optional, default 3): Stroke width
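
To make the interplay of `anchor`, `location`, and `size` concrete, here is a minimal Python sketch. It rests on an assumption that is not confirmed by the project: the anchor aligns the snippet with the matching edge or center of the final frame, and `location` then shifts it right and down by that many pixels. `top_left` is a hypothetical name, not a function from this codebase.

```python
def top_left(final_size, size, anchor=("left", "top"), location=(0, 0)):
    """Return the (x, y) pixel of the snippet's top-left corner in the frame.

    Assumed semantics: anchor picks the alignment, location adds an offset.
    """
    frame_w, frame_h = final_size
    snip_w, snip_h = size
    # Share of the free space placed before the snippet on each axis.
    fraction = {"left": 0.0, "top": 0.0, "center": 0.5, "right": 1.0, "bottom": 1.0}
    x = (frame_w - snip_w) * fraction[anchor[0]] + location[0]
    y = (frame_h - snip_h) * fraction[anchor[1]] + location[1]
    return round(x), round(y)


# A 1080x1000 greenscreen clip anchored center/bottom in a 1080x1920 movie.
print(top_left((1080, 1920), (1080, 1000), ("center", "bottom")))
# An 800x200 caption anchored center/top and pushed 400 pixels down.
print(top_left((1080, 1920), (800, 200), ("center", "top"), (0, 400)))
```

Under these assumptions the first call places the clip flush with the bottom edge, and the second centers the caption horizontally 400 pixels below the top.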
- There's no background music volume setting
- There's no scene transition; it's just a jump cut
- The types of videos this can make are very limited, as custom animations or motions just don't exist right now
- No Brady Bunch or PIP arrangement (PIP is on the way) or side by side arrangements of stacked scenes (but you can make this by combining 2 or 3 runs of simpler videos)
  - Actually, for Brady Bunch, you'd do one run of 3 movies vertically or horizontally arranged with 3 snippets each, then a second run using whatever the opposite arrangement was (so if you made 3 horizontally arranged 3-snippet movies, just vertically arrange them in a final movie)
  - It takes 2 runs of the tool because you need to write the first 3 movies to disk before proceeding. The design philosophy here is simplicity, flexibility, and extensibility. Making every type of video arrangement would be a Herculean effort, which is why we only provide simple formats that can be combined into more complex videos.
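
The two-run Brady Bunch workaround can be sketched as a pair of generated manifests. Everything named here is hypothetical: `row_movie`, `grid_manifests`, the clip paths, and especially the `./output/<title>.mp4` location, since this README does not say where rendered movies are written. Treat it as an outline of the idea rather than a working recipe.

```python
def row_movie(title, assets, size):
    """One movie with a single horizontal scene, one video snippet per asset."""
    snippets = [{"video": {"asset": asset, "size": list(size)}} for asset in assets]
    return {
        "title": title,
        # use_audio [0] is an arbitrary choice for this sketch.
        "scenes": [{"arrangement": "horizontal", "use_audio": [0], "snippets": snippets}],
    }


def grid_manifests(grid, cell_size, rendered_path):
    """Build the manifests for both runs of a grid layout.

    grid is a list of rows, each a list of asset paths. rendered_path maps a
    movie title to wherever the first run wrote that movie (an assumption).
    """
    first_run = [row_movie(f"row_{i}", row, cell_size) for i, row in enumerate(grid)]
    row_size = [cell_size[0] * len(grid[0]), cell_size[1]]
    stacked = [
        {"video": {"asset": rendered_path(movie["title"]), "size": row_size}}
        for movie in first_run
    ]
    second_run = [{
        "title": "grid",
        "scenes": [{"arrangement": "vertical", "use_audio": [0], "snippets": stacked}],
    }]
    return first_run, second_run


grid = [[f"./clips/{row}{col}.mp4" for col in "abc"] for row in "123"]
first, second = grid_manifests(grid, (640, 360), lambda title: f"./output/{title}.mp4")
print(len(first), "row movies, then", second[0]["scenes"][0]["arrangement"])
```

Write `first` to `manifest.json` and run the tool, then do the same with `second` once the row movies exist on disk.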
- explainer with a talking head presenter
- narrated/musical montages
- masked foreground on background (memes, presenters, etc)
- [Coming Soon] picture in picture (news casts, react videos, explainers)
To create a virtual environment
python3 -m venv .venv
To activate it and get a nice clean package context
source .venv/bin/activate
To install packages for this project
pip3 install -r requirements.txt
To add packages to this project (You should be in the virtual environment or you're going to have a bloated requirements.txt)
pip3 install <package>
pip3 freeze > requirements.txt
To get back to your home package context
deactivate
Make sure you have all the keys exported to the environment from where you run this. You'll need the following
OPENAI_API_KEY=
ASSEMBLY_AI_API_KEY=
PEXELS_API_KEY=
ELEVEN_API_KEY=
GIPHY_API_KEY=

- OpenAI: generating scripts or using their TTS (whisper)
- AssemblyAI: transcribing spoken word in a video and creating subtitle .srt files
- Pexels: getting compilations of stock videos to use in a background
- ElevenLabs: better TTS than TikTok
- Giphy: source for gifs and stickers
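
Before a long render it can help to confirm the keys are actually exported. The sketch below uses only the standard library; `missing_keys` is a hypothetical helper, and whether every key is needed on every run (rather than only for the extensions a manifest uses) is an assumption.

```python
import os

REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "ASSEMBLY_AI_API_KEY",
    "PEXELS_API_KEY",
    "ELEVEN_API_KEY",
    "GIPHY_API_KEY",
]


def missing_keys(environ=None):
    """Return the required keys that are unset or empty in the environment."""
    environ = os.environ if environ is None else environ
    return [key for key in REQUIRED_KEYS if not environ.get(key)]


missing = missing_keys()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All API keys are set.")
```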
Right now it's just keyed to read a manifest.json file. You can use the JSON below to make your own and see it in action.
Run your handy python main.py from the /app directory (this one) and video generator go brrrrr.
[
  {
    "title": "audio_only",
    "scenes": [
      {
        "arrangement": "stack",
        "use_audio": [0],
        "snippets": [
          {
            "audio": {
              "asset": "write a funny script about kitty cats doing orange cat activities.whisper",
              "voice": "echo"
            }
          }
        ]
      }
    ]
  },
  {
    "title": "including audio track",
    "final_size": [1080, 1920],
    "scenes": [
      {
        "arrangement": "stack",
        "use_audio": [0],
        "snippets": [
          {
            "video": {
              "asset": "./assets/background/code.jpg",
              "size": [1080, 1920]
            },
            "audio": {
              "asset": "./assets/music/elevator.mp3",
              "duration": 7
            }
          },
          {
            "video": {
              "asset": "./assets/videos/vincent.mp4",
              "size": [1080, 1000],
              "has_greenscreen": true,
              "anchor": ["center", "bottom"]
            }
          },
          {
            "caption": {
              "asset": "looking for the api documentation but it keeps sending you to the marketing site",
              "anchor": ["center", "top"],
              "location": [0, 400]
            }
          }
        ]
      }
    ]
  },
  {
    "title": "adding 'background music'",
    "final_size": [1080, 1920],
    "scenes": [
      {
        "arrangement": "stack",
        "use_audio": [0, 1],
        "snippets": [
          {
            "video": {
              "asset": "./assets/background/bed.jpg",
              "size": [1080, 1920]
            },
            "audio": {
              "asset": "./assets/music/elevator.mp3"
            }
          },
          {
            "video": {
              "asset": "./assets/videos/cat-snore.mp4",
              "size": [1080, 1400],
              "has_greenscreen": true,
              "anchor": ["center", "bottom"]
            }
          },
          {
            "caption": {
              "asset": "sleeping through a pager dookie because I'm dropping my notice tomorrow",
              "anchor": ["center", "top"],
              "location": [0, 400],
              "font": "./fonts/montserrat_bold.ttf",
              "color": "white",
              "has_background": true,
              "align": "center",
              "fontsize": 50,
              "stroke_width": 5
            }
          }
        ]
      }
    ]
  },
  {
    "title": "narrated with generated script and caption",
    "final_size": [1080, 1920],
    "has_subtitles": true,
    "scenes": [
      {
        "arrangement": "stack",
        "use_audio": [0],
        "snippets": [
          {
            "video": {
              "asset": "orange cat kitty cat.rand",
              "size": [1080, 1920]
            },
            "audio": {
              "asset": "write a funny script about kitty cats doing orange cat activities.tiktok",
              "voice": "en_us_006"
            }
          },
          {
            "caption": {
              "asset": "orange cat story time",
              "anchor": ["center", "top"],
              "location": [0, 400]
            }
          }
        ]
      }
    ]
  },
  {
    "title": "narrated with pre-made script",
    "final_size": [1080, 1920],
    "has_subtitles": true,
    "scenes": [
      {
        "arrangement": "stack",
        "use_audio": [0],
        "snippets": [
          {
            "video": {
              "asset": "office pizza rainforest friends clouds jungle.rand",
              "size": [1080, 1920]
            },
            "audio": {
              "asset": ".tiktok",
              "script": "./assets/scripts/meeting.txt",
              "voice": "en_male_funny"
            }
          }
        ]
      }
    ]
  },
  {
    "title": "vertically arranged snippets",
    "scenes": [
      {
        "arrangement": "vertical",
        "use_audio": [1],
        "snippets": [
          {
            "video": {
              "asset": "./assets/videos/cat-drama.mp4",
              "size": [1080, 640]
            }
          },
          {
            "video": {
              "asset": "./assets/videos/toothless.mp4",
              "size": [1080, 640]
            },
            "audio": {
              "asset": "./assets/music/elevator.mp3",
              "duration": 7
            }
          },
          {
            "video": {
              "asset": "./assets/videos/stare.mp4",
              "size": [1080, 640]
            }
          }
        ]
      }
    ]
  },
  {
    "title": "multi scene",
    "final_size": [1080, 1920],
    "scenes": [
      {
        "arrangement": "stack",
        "use_audio": [1],
        "snippets": [
          {
            "video": {
              "asset": "./assets/background/code.jpg",
              "size": [1080, 1920]
            }
          },
          {
            "video": {
              "asset": "./assets/videos/toothless.mp4",
              "duration": 7,
              "size": [1080, 1500],
              "has_greenscreen": true,
              "anchor": ["center", "bottom"]
            }
          },
          {
            "caption": {
              "asset": "when the pipeline works",
              "anchor": ["center", "top"],
              "location": [0, 400]
            }
          }
        ]
      },
      {
        "arrangement": "stack",
        "use_audio": [1],
        "snippets": [
          {
            "video": {
              "asset": "./assets/background/office.jpg",
              "size": [1080, 1920]
            }
          },
          {
            "video": {
              "asset": "./assets/videos/stare.mp4",
              "duration": 7,
              "size": [1080, 1100],
              "has_greenscreen": true,
              "anchor": ["center", "bottom"]
            }
          },
          {
            "caption": {
              "asset": "when it dont works",
              "anchor": ["center", "top"],
              "location": [0, 400]
            }
          }
        ]
      }
    ]
  },
  {
    "title": "side by side same snippet",
    "final_size": [1920, 1080],
    "scenes": [
      {
        "arrangement": "stack",
        "use_audio": [1],
        "snippets": [
          {
            "video": {
              "asset": "./assets/background/code.jpg",
              "size": [1920, 1080]
            }
          },
          {
            "video": {
              "asset": "./assets/videos/toothless.mp4",
              "size": [960, 540],
              "has_greenscreen": true,
              "anchor": ["left", "bottom"]
            }
          },
          {
            "video": {
              "asset": "./assets/videos/toothless.mp4",
              "size": [960, 540],
              "has_greenscreen": true,
              "anchor": ["right", "bottom"]
            }
          }
        ]
      }
    ]
  }
]
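
Since the manifest is plain JSON (a list of movies), it is easy to inspect from Python before handing it to the tool. The sketch below is not part of the project; `summarize` is a hypothetical helper that lists each movie's title, scene count, and the arrangements it uses.

```python
import json


def summarize(manifest):
    """Return one line per movie: title, scene count, and arrangements used."""
    lines = []
    for movie in manifest:
        scenes = movie.get("scenes", [])
        arrangements = sorted({scene["arrangement"] for scene in scenes})
        lines.append(f"{movie['title']}: {len(scenes)} scene(s), {', '.join(arrangements)}")
    return lines


# A one-movie manifest inlined for the example. For a real file, use:
#   with open("manifest.json") as handle:
#       manifest = json.load(handle)
example = json.loads("""
[
  {"title": "audio_only",
   "scenes": [{"arrangement": "stack", "use_audio": [0], "snippets": []}]}
]
""")
for line in summarize(example):
    print(line)
```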