Video Generation AI 2025: Sora 2 vs Veo 3 vs Runway Complete Guide

A comprehensive guide to AI video generation in 2025—Sora 2, Veo 3, Runway Gen-4, Kling, and more. Capabilities, pricing, API access, and practical implementation.


The Video Generation Revolution

2025 has been the breakthrough year for AI video generation. What was impossible in 2023 is now routine: generating photorealistic videos from text prompts, with consistent characters, physics simulation, and even native audio.

Why video generation is fundamentally harder than image generation: Images are static—each pixel is independent of time. Video adds a temporal dimension: objects must move consistently, physics must be respected, and visual coherence must persist across hundreds of frames. A minor glitch that would be invisible in a single image becomes jarring when repeated across 30 frames per second. This is why video generation lagged image generation by 2+ years despite similar underlying architectures.
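
A quick back-of-the-envelope calculation makes the scale jump concrete:

Python
# Rough scale comparison: one 1080p image vs. a short 1080p clip
width, height, channels = 1920, 1080, 3
fps, seconds = 30, 10

values_per_image = width * height * channels          # ~6.2 million values
values_per_clip = values_per_image * fps * seconds    # ~1.9 billion values

print(f"Single frame: {values_per_image:,} values")
print(f"10-second clip: {values_per_clip:,} values "
      f"across {fps * seconds} frames that must stay mutually consistent")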

The commercial inflection point: What changed in 2025 wasn't just quality—it was consistency and controllability. Early models produced impressive clips but couldn't maintain character identity, follow physics accurately, or handle complex prompts reliably. The new generation (Sora 2, Veo 3) achieves "production quality" for many use cases: ads, social content, prototyping. The Disney deal signals that Hollywood sees these tools as complementary to human creators, not replacements.

The scale of investment tells the story: Disney's $1 billion deal with OpenAI for Sora 2 character rights, Google's Veo 3 integration into YouTube Shorts, and Runway powering Hollywood productions.

This guide covers everything you need to know about AI video generation in 2025—from consumer tools to API integration.

Quick Comparison

| Tool | Max Length | Resolution | Audio | Price | Best For |
|------|------------|------------|-------|-------|----------|
| Sora 2 | 60 seconds | 1080p | No | $20-200/month | Realism, storytelling |
| Veo 3 | 8 seconds | 4K | Native | $20-250/month | Audio sync, YouTube |
| Runway Gen-4 | 16 seconds | 4K | Add-on | $15-95/month | Editing, filmmaking |
| Kling 2.0 | 5 minutes | 1080p | Via Kling Audio | Free-$66/month | Long-form, faces |
| Minimax Hailuo | 6 seconds | 720p | No | Free | Quick experiments |
| Pika 2.0 | 15 seconds | 1080p | Yes | $8-58/month | Motion effects |

OpenAI Sora 2

Overview

Sora 2, released September 2025, is OpenAI's second-generation video model. It produces the most photorealistic videos with remarkable understanding of physics and object permanence.

Key capabilities:

  • Up to 60-second clips at 1080p
  • Text-to-video and image-to-video
  • Excellent physics simulation
  • Character consistency across scenes
  • Support for 200+ Disney characters (licensed)

Technical Details

Sora uses a diffusion transformer architecture:

Code
Text Prompt → Text Encoder → Diffusion Transformer → Video Frames
                                    ↑
                              Noise Schedule
                              (iterative denoising)

Architecture highlights:

  • Spacetime patches (3D video tokens; see the sketch below)
  • Variable duration, resolution, aspect ratio
  • Trained on internet-scale video data
  • Recaptioning with detailed descriptions
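
To make "spacetime patches" concrete, here is a minimal NumPy sketch of cutting a video tensor into 3D tokens. The patch sizes are arbitrary and the real model pairs patchification with a learned latent encoder, so treat this as a conceptual illustration rather than Sora's actual tokenizer.

Python
import numpy as np

def to_spacetime_patches(video, patch_t=4, patch_h=16, patch_w=16):
    """Split a video tensor (T, H, W, C) into flattened spacetime patches."""
    T, H, W, C = video.shape
    # Truncate so each dimension divides evenly into patches
    T, H, W = T - T % patch_t, H - H % patch_h, W - W % patch_w
    video = video[:T, :H, :W]
    patches = video.reshape(
        T // patch_t, patch_t,
        H // patch_h, patch_h,
        W // patch_w, patch_w, C
    ).transpose(0, 2, 4, 1, 3, 5, 6)
    # Each token is one flattened 3D block of pixels
    return patches.reshape(-1, patch_t * patch_h * patch_w * C)

# A 2-second, 24 fps, 256x256 RGB clip becomes a sequence of 3,072 tokens
clip = np.random.rand(48, 256, 256, 3).astype(np.float32)
print(to_spacetime_patches(clip).shape)  # (3072, 3072)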

Using Sora

Web Interface (ChatGPT)

Code
Prompt: "A serene underwater scene of a coral reef. Colorful
tropical fish swim lazily through crystal-clear water. Sunlight
filters down from the surface, creating dancing light patterns
on the sandy bottom. A sea turtle glides gracefully through the
frame from left to right."

Settings:
- Duration: 10 seconds
- Aspect ratio: 16:9
- Resolution: 1080p (Pro users)

Prompt Engineering for Sora

Why video prompts differ from image prompts: Image prompts describe a static scene—composition, lighting, style. Video prompts must also describe motion: what moves, how fast, in what direction. They need camera instructions: is the camera static, panning, tracking a subject? And they need temporal structure: does the scene evolve? The best video prompts read like mini screenplays, specifying what the viewer should experience over time.

The anatomy of an effective video prompt: Top-performing prompts have five elements: (1) scene description (what's visible), (2) camera movement (how we view it), (3) visual style (aesthetic treatment), (4) mood (emotional tone), and (5) motion description (what happens over time). Missing any element leaves the model to guess—and guesses create inconsistency.

Python
def create_sora_prompt(
    scene: str,
    camera: str = None,
    style: str = None,
    mood: str = None,
    motion: str = None
) -> str:
    """
    Structure prompts for best Sora results.

    Key elements:
    1. Scene description (what's in frame)
    2. Camera movement (if any)
    3. Visual style
    4. Mood/atmosphere
    5. Motion/action
    """
    prompt_parts = []

    # Scene is required
    prompt_parts.append(scene)

    # Add camera movement
    if camera:
        camera_terms = {
            "static": "Static camera shot",
            "pan_left": "Camera slowly pans from right to left",
            "pan_right": "Camera slowly pans from left to right",
            "zoom_in": "Camera gradually zooms in",
            "zoom_out": "Camera pulls back slowly",
            "dolly": "Camera moves forward through the scene",
            "crane": "Camera rises upward revealing the scene",
            "handheld": "Slight handheld camera movement",
            "drone": "Aerial drone shot moving forward",
            "tracking": "Camera tracks alongside the subject"
        }
        prompt_parts.append(camera_terms.get(camera, camera))

    # Add style
    if style:
        style_terms = {
            "cinematic": "Cinematic quality, film grain, dramatic lighting",
            "documentary": "Documentary style, natural lighting",
            "anime": "Anime style animation",
            "photorealistic": "Photorealistic, high detail",
            "vintage": "Vintage film look, warm colors, soft focus",
            "noir": "Film noir style, high contrast, dramatic shadows"
        }
        prompt_parts.append(style_terms.get(style, style))

    # Add mood
    if mood:
        prompt_parts.append(f"The mood is {mood}")

    # Add motion description
    if motion:
        prompt_parts.append(motion)

    return ". ".join(prompt_parts)


# Example usage
prompt = create_sora_prompt(
    scene="A lone astronaut walks across the surface of Mars, red dust swirling around their boots",
    camera="tracking",
    style="cinematic",
    mood="isolated and contemplative",
    motion="The astronaut moves slowly, deliberately, pausing to look at the horizon"
)

print(prompt)
# Output: "A lone astronaut walks across the surface of Mars, red dust
# swirling around their boots. Camera tracks alongside the subject.
# Cinematic quality, film grain, dramatic lighting. The mood is isolated
# and contemplative. The astronaut moves slowly, deliberately, pausing
# to look at the horizon."

Sora Storyboard Mode

Create connected scenes:

Python
storyboard = [
    {
        "scene": 1,
        "prompt": "A woman sits alone at a cafe table in Paris, looking pensively out the window at the rain",
        "duration": 8,
        "camera": "static"
    },
    {
        "scene": 2,
        "prompt": "Close-up of the woman's hands holding a coffee cup, rain visible through window reflection",
        "duration": 5,
        "camera": "static",
        "transition": "cut"
    },
    {
        "scene": 3,
        "prompt": "The woman stands and walks toward the cafe door, putting on her coat",
        "duration": 7,
        "camera": "tracking",
        "transition": "dissolve"
    },
    {
        "scene": 4,
        "prompt": "Wide shot of the woman stepping out into the rainy Paris street, Eiffel Tower visible in distance",
        "duration": 10,
        "camera": "crane",
        "transition": "cut"
    }
]

# Character consistency is maintained across scenes through:
# 1. Consistent character descriptions
# 2. Reference images (image-to-video for first scene)
# 3. Sora's internal character tracking

Disney Character Access

With the Disney partnership:

Code
# Licensed characters available:
- Disney Animation: Mickey, Elsa, Moana, etc.
- Pixar: Woody, Buzz, Nemo, etc.
- Marvel: Iron Man, Spider-Man, etc.
- Star Wars: Darth Vader, Yoda, etc.

# Prompt example:
"Buzz Lightyear flying through space, stars streaking past,
dramatic lighting, Pixar animation style"

Pricing

| Plan | Price | Features |
|------|-------|----------|
| ChatGPT Plus | $20/month | 720p, 5-sec limit, 50 videos/month |
| ChatGPT Pro | $200/month | 1080p, 20-sec limit, unlimited |

Limitations

  • No audio generation (must add separately)
  • Sometimes struggles with text in videos
  • Human hands and complex interactions can be glitchy
  • Generation time: 30 seconds to 5 minutes per clip
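
Because a single clip can take minutes, production code usually submits the generation job and polls asynchronously instead of blocking. A minimal sketch, assuming a hypothetical client with a get_status call rather than any specific SDK:

Python
import asyncio

async def wait_for_video(client, job_id: str, timeout_s: int = 600, poll_s: int = 10):
    """Poll a (hypothetical) video-generation job until it completes or times out."""
    elapsed = 0
    while elapsed < timeout_s:
        status = await client.get_status(job_id)  # hypothetical API call
        if status.state == "completed":
            return status.video_url
        if status.state == "failed":
            raise RuntimeError(f"Generation failed: {status.error}")
        await asyncio.sleep(poll_s)
        elapsed += poll_s
    raise TimeoutError(f"Job {job_id} did not finish within {timeout_s}s")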

Google Veo 3

Overview

Veo 3, released December 2025, is Google's flagship video model. Its standout feature is native audio generation—synchronized sound effects, ambient audio, and even music.

Key capabilities:

  • Up to 8-second clips (extendable by stitching segments together; see the sketch after this list)
  • Native 4K resolution
  • Native audio generation (unique feature)
  • YouTube Shorts integration
  • Veo 3 Fast mode for quick iteration
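
Because individual clips top out at 8 seconds, longer sequences are usually produced by generating segments separately and stitching them together. A minimal sketch using moviepy (file names are placeholders):

Python
import moviepy.editor as mpe

def stitch_clips(paths, output_path="stitched.mp4", crossfade_s=0.5):
    """Concatenate generated segments into one video with short crossfades."""
    clips = [mpe.VideoFileClip(p) for p in paths]
    # Crossfade every clip after the first for smoother transitions
    faded = [clips[0]] + [c.crossfadein(crossfade_s) for c in clips[1:]]
    final = mpe.concatenate_videoclips(faded, method="compose", padding=-crossfade_s)
    final.write_videofile(output_path)
    return output_path

stitch_clips(["segment_1.mp4", "segment_2.mp4", "segment_3.mp4"])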

Native Audio Generation

Veo 3's killer feature is synchronized audio:

Python
# Veo 3 generates both video AND matching audio

prompt = """
A thunderstorm over a mountain lake. Lightning illuminates
the peaks. Rain falls heavily on the water surface.
Thunder rumbles in the distance.
"""

# Output includes:
# - Video: The visual scene
# - Audio: Rain sounds, thunder, ambient wind
# - Perfectly synchronized to visual events

Using Veo 3

Google AI Studio

Python
# Note: the calls below are illustrative; check the current Google GenAI SDK
# documentation for the exact video-generation method names.
import google.generativeai as genai
from google.generativeai import types

# Configure
genai.configure(api_key="YOUR_API_KEY")

# Generate video
response = genai.generate_video(
    model="veo-3",
    prompt="""
    A cozy coffee shop interior. Soft jazz plays in the background.
    Steam rises from a freshly poured latte. Rain patters against
    the window. A barista moves in the background, preparing drinks.
    """,
    config=types.GenerateVideoConfig(
        duration_seconds=8,
        aspect_ratio="16:9",
        resolution="1080p",
        generate_audio=True,
        audio_style="ambient"
    )
)

# Download the generated video
with open("coffee_shop.mp4", "wb") as f:
    f.write(response.video_bytes)

print(f"Video generated: {response.duration}s")
print(f"Audio included: {response.has_audio}")

Veo 3 Fast Mode

For quick iterations (YouTube Shorts):

Python
# Fast mode: ~10 seconds generation, lower quality
response = genai.generate_video(
    model="veo-3-fast",
    prompt="A cute puppy running through autumn leaves",
    config=types.GenerateVideoConfig(
        duration_seconds=8,
        fast_mode=True  # Enables Veo 3 Fast
    )
)

Prompt Techniques for Veo 3

Python
class Veo3PromptBuilder:
    """Builder for optimized Veo 3 prompts."""

    def __init__(self):
        self.visual_elements = []
        self.audio_elements = []
        self.camera = None
        self.style = None

    def add_visual(self, element: str):
        """Add visual element to scene."""
        self.visual_elements.append(element)
        return self

    def add_sound(self, sound: str):
        """Add sound element (Veo 3 will generate matching audio)."""
        self.audio_elements.append(sound)
        return self

    def set_camera(self, movement: str):
        """Set camera movement."""
        self.camera = movement
        return self

    def set_style(self, style: str):
        """Set visual style."""
        self.style = style
        return self

    def build(self) -> str:
        """Build the final prompt."""
        parts = []

        # Visual scene
        if self.visual_elements:
            parts.append(" ".join(self.visual_elements))

        # Audio cues (Veo 3 understands these)
        if self.audio_elements:
            audio_desc = "Sounds include: " + ", ".join(self.audio_elements)
            parts.append(audio_desc)

        # Camera
        if self.camera:
            parts.append(f"Camera: {self.camera}")

        # Style
        if self.style:
            parts.append(f"Style: {self.style}")

        return ". ".join(parts)


# Example usage
prompt = (
    Veo3PromptBuilder()
    .add_visual("A busy Tokyo street at night")
    .add_visual("Neon signs reflect on wet pavement")
    .add_visual("People with umbrellas walk past")
    .add_sound("City ambiance")
    .add_sound("Rain on umbrellas")
    .add_sound("Distant traffic")
    .add_sound("Japanese pop music from a nearby store")
    .set_camera("Slow tracking shot following pedestrians")
    .set_style("Cinematic, Blade Runner aesthetic")
    .build()
)

YouTube Shorts Integration

Veo 3 is integrated into YouTube Create:

Code
YouTube Create App:
1. Open YouTube Create
2. Select "AI Video"
3. Enter prompt
4. Choose "Veo 3 Fast" for quick generation
5. Edit in timeline
6. Publish directly to Shorts

Features:
- SynthID watermarking (imperceptible watermark identifying AI-generated content)
- Direct upload to channel
- Built-in editing tools
- Music library integration

Pricing

| Plan | Price | Features |
|------|-------|----------|
| Google AI Pro | $20/month | 1,000 credits, watermarked |
| Google AI Ultra | $250/month | 12,500 credits, no watermark |
| API | Pay-per-use | $0.50/second generated |

Runway Gen-4

Overview

Runway has been the creative professional's choice since Gen-1. Gen-4, along with their Aleph model, offers the most comprehensive editing toolkit alongside generation.

Key capabilities:

  • Up to 16-second clips
  • 4K resolution
  • Advanced editing tools (inpainting, outpainting)
  • Motion brush for precise control
  • Multi-clip projects with transitions

Runway's Tool Suite

Runway isn't just a generator—it's a complete video AI platform:

Code
Generation Tools:
├── Gen-4 (Text-to-Video)
├── Gen-4 Turbo (Fast generation)
├── Aleph (Editing & transformation)
└── Image-to-Video

Editing Tools:
├── Inpainting (remove/replace objects)
├── Outpainting (extend frame)
├── Motion Brush (control movement)
├── Super Resolution (upscale)
├── Frame Interpolation (slow motion)
└── Background Removal

Audio Tools:
├── Audio Sync (lip sync)
├── Sound Effects
└── Music Generation

Using Runway API

Python
# Note: illustrative client usage; check Runway's developer docs for the
# current SDK package and method names.
import runway

# Initialize client
client = runway.Client(api_key="YOUR_API_KEY")

# Text-to-Video
task = client.text_to_video.create(
    prompt="""
    A majestic eagle soars over snow-capped mountains.
    Golden hour lighting. Dramatic clouds in background.
    Camera follows the eagle in flight.
    """,
    model="gen4",
    duration=10,
    aspect_ratio="16:9",
    resolution="1080p"
)

# Wait for completion
result = task.wait()
print(f"Video URL: {result.url}")

# Download
video_bytes = client.download(result.url)
with open("eagle.mp4", "wb") as f:
    f.write(video_bytes)

Image-to-Video

Start from an image for more control:

Python
# Upload reference image
image = client.uploads.create(
    file=open("hero_character.png", "rb")
)

# Generate video from image
task = client.image_to_video.create(
    image=image.id,
    prompt="The character turns and walks toward the camera, confident stride",
    model="gen4",
    duration=8,
    motion_strength=0.7  # 0.0-1.0, higher = more motion
)

result = task.wait()

Motion Brush

Precise control over what moves:

Python
# Motion brush defines regions and their movement

motion_config = {
    "regions": [
        {
            "mask": "mask_clouds.png",  # Mask image
            "direction": "right",
            "speed": 0.3
        },
        {
            "mask": "mask_water.png",
            "direction": "oscillate",
            "speed": 0.5
        },
        {
            "mask": "mask_character.png",
            "direction": "forward",
            "speed": 0.8
        }
    ],
    "static_regions": ["mask_buildings.png"]  # These don't move
}

task = client.image_to_video.create(
    image=image.id,
    prompt="Scene comes to life",
    motion_config=motion_config,
    duration=8
)

Aleph: Advanced Editing

Runway's Aleph model specializes in video transformation:

Python
# Remove object from video
task = client.aleph.inpaint(
    video="input_video.mp4",
    mask="object_mask.mp4",  # Mask video marking object to remove
    prompt="Clean background, seamless removal"
)

# Style transfer
task = client.aleph.style_transfer(
    video="input_video.mp4",
    style_image="anime_style_reference.jpg",
    strength=0.8
)

# Extend video (outpainting in time)
task = client.aleph.extend(
    video="short_clip.mp4",
    direction="forward",  # or "backward"
    duration=5,  # seconds to add
    prompt="Continue the scene naturally"
)

Pricing

| Plan | Price | Credits/month | Features |
|------|-------|---------------|----------|
| Free | $0 | 125 (one-time) | Watermarked |
| Standard | $15/month | 625 | Gen-4, no watermark |
| Pro | $35/month | 2,250 | Priority, 4K |
| Unlimited | $95/month | Unlimited | All features |

Kling 2.0

Overview

Kling (by Kuaishou) excels at long-form generation and realistic human faces. It's the go-to for character-driven content.

Key capabilities:

  • Up to 5 minutes per clip (industry-leading)
  • Excellent facial consistency
  • Lip sync with Kling Audio
  • Powerful motion control

Long-Form Generation

Python
# Kling handles long narratives
story_scenes = [
    {
        "prompt": "A young woman wakes up in a small apartment, morning sunlight streaming through curtains",
        "duration": 20
    },
    {
        "prompt": "She makes coffee, looking thoughtfully out the window at the city below",
        "duration": 15,
        "character_ref": "scene_1"  # Maintain character consistency
    },
    {
        "prompt": "Close-up of her face as she receives a surprising phone call",
        "duration": 10,
        "character_ref": "scene_1"
    },
    {
        "prompt": "She rushes to get ready, putting on a coat and grabbing her keys",
        "duration": 20,
        "character_ref": "scene_1"
    },
    {
        "prompt": "She runs down busy city streets, weaving through crowds",
        "duration": 25,
        "character_ref": "scene_1"
    }
]

# Total: 90 seconds of consistent narrative

Lip Sync with Kling Audio

Python
# Generate video with matching lip sync
# (illustrative pseudocode for Kling's API; API access is available on the
# Enterprise plan and the exact calls may differ)

# 1. Create the visual
video_task = kling.create_video(
    prompt="A news anchor delivers breaking news in a professional studio",
    duration=30,
    character_style="realistic",
    lip_sync_ready=True  # Prepares for audio overlay
)

# 2. Add voice and lip sync
audio_task = kling.add_audio(
    video_id=video_task.id,
    audio_source="tts",  # or "upload" for custom audio
    text="""
    Good evening. Tonight's top story: Scientists have made
    a breakthrough discovery that could change everything we
    know about renewable energy. Our correspondent has more.
    """,
    voice="news_anchor_female",
    lip_sync=True  # Adjusts mouth movements to match
)

Pricing

| Plan | Price | Features |
|------|-------|----------|
| Free | $0 | 6-sec clips, watermarked |
| Standard | $8/month | 30-sec clips, no watermark |
| Pro | $28/month | 2-min clips |
| Enterprise | $66/month | 5-min clips, API access |

Practical Implementation

Video Generation Pipeline

Python
import os
from dataclasses import dataclass
from enum import Enum
from typing import Optional
import asyncio

class VideoProvider(Enum):
    SORA = "sora"
    VEO = "veo"
    RUNWAY = "runway"
    KLING = "kling"

@dataclass
class VideoRequest:
    prompt: str
    duration: int = 8
    resolution: str = "1080p"
    aspect_ratio: str = "16:9"
    style: Optional[str] = None
    audio: bool = False
    reference_image: Optional[str] = None

@dataclass
class VideoResult:
    url: str
    duration: float
    resolution: str
    has_audio: bool
    provider: VideoProvider
    cost: float

class VideoGenerator:
    """
    Unified interface for multiple video generation providers.
    """

    def __init__(self):
        self.providers = {}
        self._init_providers()

    def _init_providers(self):
        """Initialize available providers."""
        if os.getenv("OPENAI_API_KEY"):
            from openai import OpenAI
            self.providers[VideoProvider.SORA] = OpenAI()

        if os.getenv("GOOGLE_API_KEY"):
            import google.generativeai as genai
            genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
            self.providers[VideoProvider.VEO] = genai

        if os.getenv("RUNWAY_API_KEY"):
            import runway
            self.providers[VideoProvider.RUNWAY] = runway.Client(
                api_key=os.getenv("RUNWAY_API_KEY")
            )

    def select_provider(self, request: VideoRequest) -> VideoProvider:
        """
        Select best provider based on request requirements.
        """
        # Need audio? Veo 3 is the only native option
        if request.audio and VideoProvider.VEO in self.providers:
            return VideoProvider.VEO

        # Long duration? Kling excels
        if request.duration > 30 and VideoProvider.KLING in self.providers:
            return VideoProvider.KLING

        # Need editing/control? Runway
        if request.reference_image and VideoProvider.RUNWAY in self.providers:
            return VideoProvider.RUNWAY

        # Default to Sora for quality
        if VideoProvider.SORA in self.providers:
            return VideoProvider.SORA

        # Fall back to whatever is available
        return list(self.providers.keys())[0]

    async def generate(
        self,
        request: VideoRequest,
        provider: Optional[VideoProvider] = None
    ) -> VideoResult:
        """Generate video using specified or auto-selected provider."""

        if provider is None:
            provider = self.select_provider(request)

        if provider == VideoProvider.SORA:
            return await self._generate_sora(request)
        elif provider == VideoProvider.VEO:
            return await self._generate_veo(request)
        elif provider == VideoProvider.RUNWAY:
            return await self._generate_runway(request)
        elif provider == VideoProvider.KLING:
            # Kling isn't wired up in _init_providers above; add initialization
            # and a _generate_kling method to enable this branch.
            return await self._generate_kling(request)
        raise ValueError(f"No generator available for provider: {provider}")

    async def _generate_sora(self, request: VideoRequest) -> VideoResult:
        """Generate with Sora."""
        client = self.providers[VideoProvider.SORA]

        response = await client.video.generate(
            model="sora-2",
            prompt=request.prompt,
            duration=min(request.duration, 20),  # Sora max
            resolution=request.resolution
        )

        return VideoResult(
            url=response.url,
            duration=response.duration,
            resolution=request.resolution,
            has_audio=False,
            provider=VideoProvider.SORA,
            cost=self._calculate_cost(VideoProvider.SORA, response.duration)
        )

    async def _generate_veo(self, request: VideoRequest) -> VideoResult:
        """Generate with Veo 3."""
        genai = self.providers[VideoProvider.VEO]

        response = genai.generate_video(
            model="veo-3",
            prompt=request.prompt,
            config={
                "duration_seconds": min(request.duration, 8),
                "resolution": request.resolution,
                "generate_audio": request.audio
            }
        )

        return VideoResult(
            url=response.url,
            duration=response.duration,
            resolution=request.resolution,
            has_audio=request.audio,
            provider=VideoProvider.VEO,
            cost=self._calculate_cost(VideoProvider.VEO, response.duration)
        )

    async def _generate_runway(self, request: VideoRequest) -> VideoResult:
        """Generate with Runway Gen-4."""
        client = self.providers[VideoProvider.RUNWAY]

        if request.reference_image:
            task = client.image_to_video.create(
                image=request.reference_image,
                prompt=request.prompt,
                model="gen4",
                duration=min(request.duration, 16)
            )
        else:
            task = client.text_to_video.create(
                prompt=request.prompt,
                model="gen4",
                duration=min(request.duration, 16)
            )

        result = task.wait()

        return VideoResult(
            url=result.url,
            duration=result.duration,
            resolution=request.resolution,
            has_audio=False,
            provider=VideoProvider.RUNWAY,
            cost=self._calculate_cost(VideoProvider.RUNWAY, result.duration)
        )

    def _calculate_cost(self, provider: VideoProvider, duration: float) -> float:
        """Estimate cost for generation."""
        rates = {
            VideoProvider.SORA: 0.10,  # ~$0.10 per second
            VideoProvider.VEO: 0.50,   # $0.50 per second (API)
            VideoProvider.RUNWAY: 0.05, # ~$0.05 per second
            VideoProvider.KLING: 0.03   # ~$0.03 per second
        }
        return rates.get(provider, 0.10) * duration


# Usage example
async def main():
    generator = VideoGenerator()

    # Auto-select provider
    result = await generator.generate(VideoRequest(
        prompt="A serene Japanese garden with koi fish swimming in a pond",
        duration=10,
        audio=True  # Will select Veo 3
    ))

    print(f"Generated with {result.provider.value}")
    print(f"URL: {result.url}")
    print(f"Cost: ${result.cost:.2f}")

asyncio.run(main())

Batch Processing

Python
import asyncio
from typing import List

async def batch_generate(
    prompts: List[str],
    generator: VideoGenerator,
    max_concurrent: int = 3
) -> List[VideoResult]:
    """
    Generate multiple videos with concurrency control.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def generate_one(prompt: str) -> VideoResult:
        async with semaphore:
            return await generator.generate(VideoRequest(prompt=prompt))

    tasks = [generate_one(prompt) for prompt in prompts]
    return await asyncio.gather(*tasks)


# Example: Generate a video series
async def create_video_series():
    generator = VideoGenerator()

    prompts = [
        "Episode 1: A mysterious letter arrives at an old mansion",
        "Episode 2: The detective examines clues in the library",
        "Episode 3: A secret passage is discovered behind the bookshelf",
        "Episode 4: The truth is revealed in a dramatic confrontation"
    ]

    results = await batch_generate(prompts, generator)

    for i, result in enumerate(results):
        print(f"Episode {i+1}: {result.url}")

Adding Audio Post-Generation

For providers without native audio:

Python
import os

from elevenlabs import ElevenLabs
import moviepy.editor as mpe

class AudioAdder:
    """Add audio to generated videos."""

    def __init__(self):
        self.elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

    def add_narration(
        self,
        video_path: str,
        text: str,
        voice: str = "narrator",
        output_path: str = "output.mp4"
    ):
        """Add TTS narration to video."""

        # Generate speech
        audio = self.elevenlabs.generate(
            text=text,
            voice=voice,
            model="eleven_turbo_v2"
        )

        # Save temp audio (the SDK may return raw bytes or streamed chunks)
        audio_bytes = audio if isinstance(audio, (bytes, bytearray)) else b"".join(audio)
        with open("temp_audio.mp3", "wb") as f:
            f.write(audio_bytes)

        # Combine video and audio
        video = mpe.VideoFileClip(video_path)
        audio_clip = mpe.AudioFileClip("temp_audio.mp3")

        # Match durations
        if audio_clip.duration > video.duration:
            audio_clip = audio_clip.subclip(0, video.duration)

        final = video.set_audio(audio_clip)
        final.write_videofile(output_path)

        # Cleanup
        os.remove("temp_audio.mp3")

        return output_path

    def add_music(
        self,
        video_path: str,
        music_path: str,
        volume: float = 0.3,
        output_path: str = "output.mp4"
    ):
        """Add background music to video."""

        video = mpe.VideoFileClip(video_path)
        music = mpe.AudioFileClip(music_path)

        # Loop music if needed
        if music.duration < video.duration:
            music = mpe.concatenate_audioclips([music] * int(video.duration / music.duration + 1))

        # Trim and adjust volume
        music = music.subclip(0, video.duration).volumex(volume)

        # Mix with original audio if exists
        if video.audio:
            final_audio = mpe.CompositeAudioClip([video.audio, music])
        else:
            final_audio = music

        final = video.set_audio(final_audio)
        final.write_videofile(output_path)

        return output_path


# Usage
audio_adder = AudioAdder()

# Add narration to Sora-generated video
audio_adder.add_narration(
    video_path="sora_output.mp4",
    text="In a world where dreams become reality, one person dared to imagine the impossible...",
    voice="dramatic_narrator"
)

Best Practices

Prompt Engineering Tips

Python
# DO: Be specific about visual details
good_prompt = """
A golden retriever puppy sits in a sun-dappled garden.
Soft afternoon light filters through oak leaves.
The puppy tilts its head curiously, ears perked up.
Shallow depth of field, background softly blurred.
Shot on 35mm film, warm color grading.
"""

# DON'T: Be vague or contradictory
bad_prompt = """
A dog in a nice place looking cute.
"""

# DO: Describe camera movement explicitly
good_camera = """
Camera slowly dollies forward while tilting up,
revealing the full height of the ancient redwood tree.
"""

# DO: Include temporal descriptions
good_temporal = """
The flower blooms in accelerated time-lapse,
petals unfurling one by one over 5 seconds.
"""

Common Issues and Solutions

| Issue | Solution |
|-------|----------|
| Inconsistent characters | Use image-to-video with a reference |
| Unnatural motion | Reduce motion strength, be specific |
| Bad hands/faces | Use models optimized for humans (Kling) |
| Wrong aspect ratio | Specify explicitly in prompt |
| Artifacts | Try different seed, reduce duration |
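
When a clip comes back with artifacts, the cheapest fix is often to regenerate several takes of the same prompt and keep the best one. A small sketch reusing the VideoGenerator and batch_generate helpers defined earlier (assuming they are in scope):

Python
async def generate_variants(prompt: str, n: int = 4):
    """Generate several takes of one prompt for manual review."""
    generator = VideoGenerator()
    results = await batch_generate([prompt] * n, generator, max_concurrent=2)
    for i, result in enumerate(results, start=1):
        print(f"Variant {i}: {result.url} (${result.cost:.2f})")
    return results

# asyncio.run(generate_variants("A hummingbird hovering at a red flower, macro shot"))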

Conclusion

AI video generation in 2025 has reached production quality. The choice of tool depends on your specific needs:

  • Sora 2: Best overall quality, storytelling
  • Veo 3: Only option for native audio
  • Runway: Best editing tools, creative control
  • Kling: Best for long-form and faces

For most projects, you'll likely use multiple tools—Sora or Veo for generation, Runway for editing, and external tools for audio when needed.
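
As a closing sketch, here is how the pieces in this guide could fit together in one workflow, using the VideoGenerator and AudioAdder classes defined above (prompt, paths, and narration are placeholders):

Python
import asyncio
import urllib.request

async def produce_clip():
    generator = VideoGenerator()

    # 1. Generate the visuals (provider auto-selected; no native audio requested)
    result = await generator.generate(VideoRequest(
        prompt="A lighthouse on a rocky coast at dusk, waves crashing below",
        duration=10
    ))
    print(f"Generated with {result.provider.value}: {result.url}")

    # 2. Download the clip locally
    urllib.request.urlretrieve(result.url, "lighthouse.mp4")

    # 3. Layer narration on top with the AudioAdder helper
    AudioAdder().add_narration(
        video_path="lighthouse.mp4",
        text="At the edge of the world, the light never goes out.",
        voice="narrator",
        output_path="lighthouse_final.mp4"
    )

asyncio.run(produce_clip())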

Enrico Piovano, PhD

Co-founder & CTO at Goji AI. Former Applied Scientist at Amazon (Alexa & AGI), focused on Agentic AI and LLMs. PhD in Electrical Engineering from Imperial College London. Gold Medalist at the National Mathematical Olympiad.
