Process video URLs with AI-powered transcription, video description, and translation. Accepts single or bulk video URLs for batch processing.
- Audio Transcription - Extract speech from videos using RunPod's Faster Whisper (large-v3 model)
- Video Description - Generate visual descriptions using Novita.ai's Qwen3-VL vision models
- Translation - Translate transcriptions using:
  - Whisper's built-in translation (fast, to English only)
  - LLM-based translation (flexible, any language)
  - Both methods for comparison
- Bulk Processing - Process single videos or arrays of video URLs
- Flexible Configuration - All AI features are optional and independently configurable
- videoUrls (string | array) - Single video URL or array of URLs
  - Must be publicly accessible
  - Supported formats: MP4, WebM, etc.
  - Example: "https://example.com/video.mp4" or ["url1", "url2"]
- transcribeAudio (boolean) - Enable audio transcription (default: false)
- describeVideo (boolean) - Enable video description (default: false)
- translateTranscription (boolean) - Enable translation (default: false)
- translationMethod (string) - Choose the translation approach:
  - "whisper" - Use Whisper's built-in translation to English (fast, free)
  - "llm" - Use an LLM API for translation to any language (flexible, paid)
  - "both" - Run both methods for comparison
- targetLanguage (string) - Target language for LLM translation (e.g., "Spanish", "French")
- runpodApiKey (string, secret) - Your RunPod API key (required if transcribeAudio is enabled)
- runpodEndpointId (string) - Your RunPod Faster Whisper endpoint ID
- novitaApiKey (string, secret) - Your Novita.ai API key (required if describeVideo is enabled)
- qwenModel (string) - Qwen3-VL vision model (default: "qwen/qwen3-vl-8b-instruct")
  - "qwen/qwen3-vl-8b-instruct" - Most affordable ($0.08/$0.50 per million tokens)
  - "qwen/qwen3-vl-30b-a3b-instruct" - Balanced quality/cost
  - "qwen/qwen3-vl-30b-a3b-thinking" - Shows reasoning process
  - "qwen/qwen3-vl-235b-a22b-instruct" - Best quality
  - "qwen/qwen3-vl-235b-a22b-thinking" - Premium reasoning
- maxTokens (integer) - Max output tokens for descriptions (default: 512, range: 100-2048)
- maxVideoLength (integer) - Max video duration in seconds (default: 120, range: 5-600); videos longer than this are skipped
- videoDescriptionPrompt (string) - Custom prompt for video description
Each processed video returns:

```json
{
  "videoUrl": "https://example.com/video.mp4",
  "status": "success",
  "transcription": "Original transcription text...",
  "description": "AI-generated video description...",
  "whisper_translation": "English translation from Whisper...",
  "llm_translation": "Translation to target language...",
  "processingTime": 45.23,
  "error": null
}
```

- videoUrl (string) - The video URL that was processed
- status (string) - Processing status: "success" or "failed"
- transcription (string) - Original audio transcription (if enabled)
- description (string) - AI-generated visual description (if enabled)
- whisper_translation (string | null) - Whisper's English translation (if enabled)
- llm_translation (string | null) - LLM translation to target language (if enabled)
- processingTime (number) - Total processing time in seconds
- error (string | null) - Error message if processing failed
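Since every record has the same fields, a batch of results flattens naturally into a table. This sketch uses only Python's standard `csv` module and the field names listed above; how you obtain the list of records (for example, downloaded from the run's output) is left open.

```python
import csv
import io
from typing import Any

# Output fields in the order documented above.
FIELDS = [
    "videoUrl", "status", "transcription", "description",
    "whisper_translation", "llm_translation", "processingTime", "error",
]


def results_to_csv(results: list[dict[str, Any]]) -> str:
    """Flatten result records into CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    # restval fills absent keys with ""; extra keys are ignored.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="", extrasaction="ignore")
    writer.writeheader()
    for record in results:
        # csv writes None (JSON null) as an empty string.
        writer.writerow(record)
    return buffer.getvalue()


def split_by_status(results: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate successful records from everything else, using the status field."""
    ok = [r for r in results if r.get("status") == "success"]
    failed = [r for r in results if r.get("status") != "success"]
    return ok, failed
```

Disabled features simply produce empty columns, so runs with different feature sets can still share one spreadsheet layout.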
Transcription only:

```json
{
  "videoUrls": "https://example.com/video.mp4",
  "transcribeAudio": true,
  "runpodApiKey": "your-runpod-api-key",
  "runpodEndpointId": "abc123xyz456"
}
```

Video description only:

```json
{
  "videoUrls": "https://example.com/video.mp4",
  "describeVideo": true,
  "novitaApiKey": "your-novita-api-key",
  "qwenModel": "qwen/qwen3-vl-8b-instruct",
  "maxTokens": 512
}
```

Transcription with both Whisper and LLM translation (Spanish):

```json
{
  "videoUrls": "https://example.com/video.mp4",
  "transcribeAudio": true,
  "translateTranscription": true,
  "translationMethod": "both",
  "targetLanguage": "Spanish",
  "runpodApiKey": "your-runpod-api-key",
  "runpodEndpointId": "abc123xyz456",
  "novitaApiKey": "your-novita-api-key"
}
```

Bulk processing with all features (LLM translation to French):

```json
{
  "videoUrls": [
    "https://example.com/video1.mp4",
    "https://example.com/video2.mp4",
    "https://example.com/video3.mp4"
  ],
  "transcribeAudio": true,
  "describeVideo": true,
  "translateTranscription": true,
  "translationMethod": "llm",
  "targetLanguage": "French",
  "runpodApiKey": "your-runpod-api-key",
  "runpodEndpointId": "abc123xyz456",
  "novitaApiKey": "your-novita-api-key",
  "qwenModel": "qwen/qwen3-vl-30b-a3b-instruct",
  "maxTokens": 512,
  "maxVideoLength": 120
}
```

- Sign up at https://runpod.io
- Go to Settings > API Keys
- Create a new API key
- Deploy a Faster Whisper serverless endpoint
- Copy the endpoint ID from your endpoint dashboard
- Sign up at https://novita.ai
- Go to Key Management section
- Create a new API key
- Copy the API key
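Once you have the keys, you can start a run programmatically instead of pasting secrets into an input file. The sketch below is hedged on several points: the environment variable names (`RUNPOD_API_KEY`, `RUNPOD_ENDPOINT_ID`, `NOVITA_API_KEY`, `APIFY_TOKEN`) are hypothetical choices, the Actor ID you pass in is a placeholder for this Actor's real ID, and it assumes the Actor stores one record per video in its default dataset. The `ApifyClient`, `actor(...).call(run_input=...)`, and `dataset(...).iterate_items()` calls come from the `apify-client` Python package.

```python
import os
from typing import Any


def build_run_input(video_urls: list[str], target_language: str = "French") -> dict[str, Any]:
    """Mirror the bulk example above, reading secrets from (hypothetical) env vars."""
    return {
        "videoUrls": video_urls,
        "transcribeAudio": True,
        "describeVideo": True,
        "translateTranscription": True,
        "translationMethod": "llm",
        "targetLanguage": target_language,
        "runpodApiKey": os.environ["RUNPOD_API_KEY"],
        "runpodEndpointId": os.environ["RUNPOD_ENDPOINT_ID"],
        "novitaApiKey": os.environ["NOVITA_API_KEY"],
        "qwenModel": "qwen/qwen3-vl-30b-a3b-instruct",
        "maxTokens": 512,
        "maxVideoLength": 120,
    }


def run_actor(actor_id: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start the Actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so build_run_input works without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    # Assumption: results are stored in the run's default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call would look like `run_actor("your-username/your-actor-name", build_run_input(["https://example.com/video1.mp4"]))`, where the Actor ID is a placeholder. Keeping the keys in the environment means they never end up in saved input files or version control.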
- Transcription: Costs depend on your RunPod endpoint configuration
- Video Description: Costs vary by model:
  - 8B Instruct: $0.08/$0.50 per million tokens (most affordable)
  - 30B Instruct: $0.20/$0.70 per million tokens
  - 235B Instruct: $0.30/$1.50 per million tokens
- Translation:
  - Whisper translation: Free (included in transcription)
  - LLM translation: Additional API costs
Use maxVideoLength to control costs by skipping long videos.
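The per-model prices above translate into a per-video estimate with simple arithmetic. This sketch assumes that the first figure in each pair is the input-token price and the second is the output-token price. It also assumes the 30B Instruct price applies to `qwen/qwen3-vl-30b-a3b-instruct`. The number of input tokens a video consumes is not documented here, so it is a parameter you supply. The "thinking" variants have no listed price and are omitted.

```python
# USD per million tokens, from the cost list above.
# Assumption: (input price, output price) for each model.
PRICES_PER_MILLION = {
    "qwen/qwen3-vl-8b-instruct": (0.08, 0.50),
    "qwen/qwen3-vl-30b-a3b-instruct": (0.20, 0.70),
    "qwen/qwen3-vl-235b-a22b-instruct": (0.30, 1.50),
}


def estimate_description_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single video description request."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def estimate_batch_cost(model: str, videos: int, input_tokens: int, max_tokens: int = 512) -> float:
    """Pessimistic batch estimate: assumes every response uses all max_tokens."""
    return videos * estimate_description_cost(model, input_tokens, max_tokens)
```

With a hypothetical 10,000 input tokens and the default `maxTokens` of 512, the 8B model works out to (10,000 × 0.08 + 512 × 0.50) / 1,000,000 = $0.001056 per video. The 235B model comes to $0.003768 under the same assumptions.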
- Videos that exceed maxVideoLength are skipped with the message "[Skipped: Video too long]"
- Failed API calls return error messages in the format "[Error: error details]"
- The Actor continues processing the remaining videos even if some fail
- Check the status field to identify failed videos
- Processing time varies based on:
  - Video duration
  - Selected models
  - Network speed
  - API response times
- Average processing time: 30-60 seconds per video (with all features enabled)
- Bulk processing handles videos sequentially to avoid API rate limits
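The sequential, fail-and-continue behavior described above can be illustrated with a generic loop. This is a sketch of the pattern, not the Actor's source code. The `process_one` callback is a hypothetical stand-in for the transcription, description, and translation steps. The loop times each video, records the result in the documented output shape, and converts an exception into a "failed" record so the batch keeps going.

```python
import time
from typing import Any, Callable


def process_sequentially(
    urls: list[str],
    process_one: Callable[[str], dict[str, Any]],
) -> list[dict[str, Any]]:
    """Process URLs one at a time, recording failures instead of aborting."""
    results: list[dict[str, Any]] = []
    for url in urls:
        start = time.perf_counter()
        record: dict[str, Any] = {"videoUrl": url, "status": "success", "error": None}
        try:
            # process_one returns fields such as transcription or description.
            record.update(process_one(url))
        except Exception as exc:  # one bad video should not stop the batch
            record["status"] = "failed"
            record["error"] = str(exc)
        record["processingTime"] = round(time.perf_counter() - start, 2)
        results.append(record)
    return results
```

Because videos run one after another, total time grows linearly with the batch size. At the quoted 30-60 seconds per video, a batch of 20 videos would take roughly 10-20 minutes.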
- Video URLs must be publicly accessible (no authentication required)
- Maximum video length: 600 seconds (10 minutes)
- Whisper translation only outputs English
- LLM translation quality depends on the selected model
- Processing time increases with video duration and enabled features
For issues or questions:
- Check the Actor logs for detailed error messages
- Verify API keys are correct and have sufficient credits
- Ensure video URLs are publicly accessible
- Review the input schema for correct parameter format
- Initial release
  - Audio transcription with RunPod Faster Whisper
  - Video description with Novita.ai Qwen3-VL
  - Translation with both Whisper and LLM methods
  - Single and bulk video processing