Video Generation AI 2025: Sora 2 vs Veo 3 vs Runway Complete Guide
A comprehensive guide to AI video generation in 2025—Sora 2, Veo 3, Runway Gen-4, Kling, and more. Capabilities, pricing, API access, and practical implementation.
The Video Generation Revolution
2025 has been the breakthrough year for AI video generation. What was impossible in 2023 is now routine: generating photorealistic videos from text prompts, with consistent characters, physics simulation, and even native audio.
Why video generation is fundamentally harder than image generation: Images are static—each pixel is independent of time. Video adds a temporal dimension: objects must move consistently, physics must be respected, and visual coherence must persist across hundreds of frames. A minor glitch that would be invisible in a single image becomes jarring when repeated across 30 frames per second. This is why video generation lagged image generation by 2+ years despite similar underlying architectures.
The commercial inflection point: What changed in 2025 wasn't just quality—it was consistency and controllability. Early models produced impressive clips but couldn't maintain character identity, follow physics accurately, or handle complex prompts reliably. The new generation (Sora 2, Veo 3) achieves "production quality" for many use cases: ads, social content, prototyping. The Disney deal signals that Hollywood sees these tools as complementary to human creators, not replacements.
The scale of investment tells the story: Disney's $1 billion deal with OpenAI for Sora 2 character rights, Google's Veo 3 integration into YouTube Shorts, and Runway powering Hollywood productions.
This guide covers everything you need to know about AI video generation in 2025—from consumer tools to API integration.
Quick Comparison
| Tool | Max Length | Resolution | Audio | Price | Best For |
|---|---|---|---|---|---|
| Sora 2 | 60 seconds | 1080p | No | $20-200/month | Realism, storytelling |
| Veo 3 | 8 seconds | 4K | Native | $20-250/month | Audio sync, YouTube |
| Runway Gen-4 | 16 seconds | 4K | Add-on | $15-95/month | Editing, filmmaking |
| Kling 2.0 | 5 minutes | 1080p | Via Kling Audio | Free-$66/month | Long-form, faces |
| Minimax Hailuo | 6 seconds | 720p | No | Free | Quick experiments |
| Pika 2.0 | 15 seconds | 1080p | Yes | $8-58/month | Motion effects |
OpenAI Sora 2
Overview
Sora 2, released September 2025, is OpenAI's second-generation video model. It produces the most photorealistic videos with remarkable understanding of physics and object permanence.
Key capabilities:
- Up to 60-second clips at 1080p (the ChatGPT plans currently cap clips shorter; see Pricing below)
- Text-to-video and image-to-video
- Excellent physics simulation
- Character consistency across scenes
- Support for 200+ Disney characters (licensed)
Technical Details
Sora uses a diffusion transformer architecture:
Text Prompt → Text Encoder → Diffusion Transformer → Video Frames
                                     ↑
                              Noise Schedule
                          (iterative denoising)
Architecture highlights:
- Spacetime patches (3D video tokens; see the toy sketch after this list)
- Variable duration, resolution, aspect ratio
- Trained on internet-scale video data
- Recaptioning with detailed descriptions
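To make "spacetime patches" concrete, here is a toy NumPy sketch (illustrative shapes and patch sizes, not OpenAI's actual code) of how a clip can be cut into 3D patches that the transformer then treats as tokens:
import numpy as np
# Illustrative only: a 48-frame, 256x256 RGB clip
video = np.random.rand(48, 256, 256, 3)  # (T, H, W, C)
# Hypothetical patch size: 4 frames x 16 x 16 pixels
t, p = 4, 16
T, H, W, C = video.shape
# Block the tensor into non-overlapping spacetime patches...
patches = video.reshape(T // t, t, H // p, p, W // p, p, C)
patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)  # group the patch-index dims together
# ...and flatten each patch into a single "video token"
tokens = patches.reshape(-1, t * p * p * C)
print(tokens.shape)  # (3072, 3072): 12 x 16 x 16 patches, each a 3072-dim token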
Using Sora
Web Interface (ChatGPT)
Prompt: "A serene underwater scene of a coral reef. Colorful
tropical fish swim lazily through crystal-clear water. Sunlight
filters down from the surface, creating dancing light patterns
on the sandy bottom. A sea turtle glides gracefully through the
frame from left to right."
Settings:
- Duration: 10 seconds
- Aspect ratio: 16:9
- Resolution: 1080p (Pro users)
Prompt Engineering for Sora
Why video prompts differ from image prompts: Image prompts describe a static scene—composition, lighting, style. Video prompts must also describe motion: what moves, how fast, in what direction. They need camera instructions: is the camera static, panning, tracking a subject? And they need temporal structure: does the scene evolve? The best video prompts read like mini screenplays, specifying what the viewer should experience over time.
The anatomy of an effective video prompt: Top-performing prompts have five elements: (1) scene description (what's visible), (2) camera movement (how we view it), (3) visual style (aesthetic treatment), (4) mood (emotional tone), and (5) motion description (what happens over time). Missing any element leaves the model to guess—and guesses create inconsistency.
from typing import Optional

def create_sora_prompt(
    scene: str,
    camera: Optional[str] = None,
    style: Optional[str] = None,
    mood: Optional[str] = None,
    motion: Optional[str] = None
) -> str:
"""
Structure prompts for best Sora results.
Key elements:
1. Scene description (what's in frame)
2. Camera movement (if any)
3. Visual style
4. Mood/atmosphere
5. Motion/action
"""
prompt_parts = []
# Scene is required
prompt_parts.append(scene)
# Add camera movement
if camera:
camera_terms = {
"static": "Static camera shot",
"pan_left": "Camera slowly pans from right to left",
"pan_right": "Camera slowly pans from left to right",
"zoom_in": "Camera gradually zooms in",
"zoom_out": "Camera pulls back slowly",
"dolly": "Camera moves forward through the scene",
"crane": "Camera rises upward revealing the scene",
"handheld": "Slight handheld camera movement",
"drone": "Aerial drone shot moving forward",
"tracking": "Camera tracks alongside the subject"
}
prompt_parts.append(camera_terms.get(camera, camera))
# Add style
if style:
style_terms = {
"cinematic": "Cinematic quality, film grain, dramatic lighting",
"documentary": "Documentary style, natural lighting",
"anime": "Anime style animation",
"photorealistic": "Photorealistic, high detail",
"vintage": "Vintage film look, warm colors, soft focus",
"noir": "Film noir style, high contrast, dramatic shadows"
}
prompt_parts.append(style_terms.get(style, style))
# Add mood
if mood:
prompt_parts.append(f"The mood is {mood}")
# Add motion description
if motion:
prompt_parts.append(motion)
return ". ".join(prompt_parts)
# Example usage
prompt = create_sora_prompt(
scene="A lone astronaut walks across the surface of Mars, red dust swirling around their boots",
camera="tracking",
style="cinematic",
mood="isolated and contemplative",
motion="The astronaut moves slowly, deliberately, pausing to look at the horizon"
)
print(prompt)
# Output: "A lone astronaut walks across the surface of Mars, red dust
# swirling around their boots. Camera tracks alongside the subject.
# Cinematic quality, film grain, dramatic lighting. The mood is isolated
# and contemplative. The astronaut moves slowly, deliberately, pausing
# to look at the horizon."
Sora Storyboard Mode
Create connected scenes:
storyboard = [
{
"scene": 1,
"prompt": "A woman sits alone at a cafe table in Paris, looking pensively out the window at the rain",
"duration": 8,
"camera": "static"
},
{
"scene": 2,
"prompt": "Close-up of the woman's hands holding a coffee cup, rain visible through window reflection",
"duration": 5,
"camera": "static",
"transition": "cut"
},
{
"scene": 3,
"prompt": "The woman stands and walks toward the cafe door, putting on her coat",
"duration": 7,
"camera": "tracking",
"transition": "dissolve"
},
{
"scene": 4,
"prompt": "Wide shot of the woman stepping out into the rainy Paris street, Eiffel Tower visible in distance",
"duration": 10,
"camera": "crane",
"transition": "cut"
}
]
# Character consistency is maintained across scenes through:
# 1. Consistent character descriptions
# 2. Reference images (image-to-video for first scene)
# 3. Sora's internal character tracking
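A minimal sketch of how a storyboard like this could be rendered and stitched, assuming a `generate_clip()` wrapper you supply around whichever Sora interface you use (the wrapper name and signature are placeholders):
from typing import Callable, Dict, List
import moviepy.editor as mpe
def render_storyboard(
    storyboard: List[Dict],
    generate_clip: Callable[[str, int, str], str],  # hypothetical wrapper: returns a local .mp4 path
    output_path: str = "storyboard.mp4"
) -> str:
    """Generate each scene with the supplied wrapper, then concatenate in order.
    Character consistency relies on repeating the same character description
    in every prompt (and, ideally, an image reference for the first scene)."""
    clips = [
        mpe.VideoFileClip(generate_clip(s["prompt"], s["duration"], s["camera"]))
        for s in storyboard
    ]
    final = mpe.concatenate_videoclips(clips, method="compose")
    final.write_videofile(output_path)
    return output_path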
Disney Character Access
With the Disney partnership:
# Licensed characters available:
- Disney Animation: Mickey, Elsa, Moana, etc.
- Pixar: Woody, Buzz, Nemo, etc.
- Marvel: Iron Man, Spider-Man, etc.
- Star Wars: Darth Vader, Yoda, etc.
# Prompt example:
"Buzz Lightyear flying through space, stars streaking past,
dramatic lighting, Pixar animation style"
Pricing
| Plan | Price | Features |
|---|---|---|
| ChatGPT Plus | $20/month | 720p, 5-sec limit, 50 videos/month |
| ChatGPT Pro | $200/month | 1080p, 20-sec limit, unlimited generations |
Limitations
- No audio generation (must add separately)
- Sometimes struggles with text in videos
- Human hands and complex interactions can be glitchy
- Generation time: 30 seconds to 5 minutes per clip
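Because a single clip can take anywhere from 30 seconds to several minutes, it pays to submit jobs and poll rather than block. A small sketch, assuming a hypothetical `get_status()` helper, since the exact job API depends on how you access Sora:
import asyncio
from typing import Callable, Dict
async def wait_for_video(
    job_id: str,
    get_status: Callable[[str], Dict],  # hypothetical: returns {"state": ..., "url": ...}
    poll_interval: float = 10.0,
    timeout: float = 600.0
) -> str:
    """Poll a generation job until it succeeds, fails, or times out."""
    elapsed = 0.0
    while elapsed < timeout:
        status = get_status(job_id)
        if status["state"] == "succeeded":
            return status["url"]
        if status["state"] == "failed":
            raise RuntimeError(f"Generation failed: {status.get('error')}")
        await asyncio.sleep(poll_interval)
        elapsed += poll_interval
    raise TimeoutError(f"Job {job_id} did not finish within {timeout}s")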
Google Veo 3
Overview
Veo 3, unveiled at Google I/O in May 2025, is Google's flagship video model. Its standout feature is native audio generation—synchronized sound effects, ambient audio, and even music.
Key capabilities:
- Up to 8-second clips (extendable via stitching; see the sketch after this list)
- Native 4K resolution
- Native audio generation (unique feature)
- YouTube Shorts integration
- Veo 3 Fast mode for quick iteration
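Since each Veo 3 call caps out at 8 seconds, longer sequences are usually produced as consecutive clips and stitched afterwards. A minimal moviepy sketch, assuming you have already downloaded each segment:
import moviepy.editor as mpe
def stitch_clips(paths: list, output_path: str = "stitched.mp4") -> str:
    """Concatenate consecutively generated Veo segments into one video.
    For smoother joins, prompt each segment to pick up where the previous
    one ends (or feed its last frame back in as an image-to-video reference)."""
    clips = [mpe.VideoFileClip(p) for p in paths]
    final = mpe.concatenate_videoclips(clips, method="compose")
    final.write_videofile(output_path)  # Veo's native audio is carried through
    return output_path
# stitch_clips(["segment_1.mp4", "segment_2.mp4", "segment_3.mp4"])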
Native Audio Generation
Veo 3's killer feature is synchronized audio:
# Veo 3 generates both video AND matching audio
prompt = """
A thunderstorm over a mountain lake. Lightning illuminates
the peaks. Rain falls heavily on the water surface.
Thunder rumbles in the distance.
"""
# Output includes:
# - Video: The visual scene
# - Audio: Rain sounds, thunder, ambient wind
# - Perfectly synchronized to visual events
Using Veo 3
Google AI Studio
import google.generativeai as genai
from google.generativeai import types
# Configure
genai.configure(api_key="YOUR_API_KEY")
# Generate video
response = genai.generate_video(
model="veo-3",
prompt="""
A cozy coffee shop interior. Soft jazz plays in the background.
Steam rises from a freshly poured latte. Rain patters against
the window. A barista moves in the background, preparing drinks.
""",
config=types.GenerateVideoConfig(
duration_seconds=8,
aspect_ratio="16:9",
resolution="1080p",
generate_audio=True,
audio_style="ambient"
)
)
# Download the generated video
with open("coffee_shop.mp4", "wb") as f:
f.write(response.video_bytes)
print(f"Video generated: {response.duration}s")
print(f"Audio included: {response.has_audio}")
Veo 3 Fast Mode
For quick iterations (YouTube Shorts):
# Fast mode: ~10 seconds generation, lower quality
response = genai.generate_video(
model="veo-3-fast",
prompt="A cute puppy running through autumn leaves",
config=types.GenerateVideoConfig(
duration_seconds=8,
fast_mode=True # Enables Veo 3 Fast
)
)
Prompt Techniques for Veo 3
class Veo3PromptBuilder:
"""Builder for optimized Veo 3 prompts."""
def __init__(self):
self.visual_elements = []
self.audio_elements = []
self.camera = None
self.style = None
def add_visual(self, element: str):
"""Add visual element to scene."""
self.visual_elements.append(element)
return self
def add_sound(self, sound: str):
"""Add sound element (Veo 3 will generate matching audio)."""
self.audio_elements.append(sound)
return self
def set_camera(self, movement: str):
"""Set camera movement."""
self.camera = movement
return self
def set_style(self, style: str):
"""Set visual style."""
self.style = style
return self
def build(self) -> str:
"""Build the final prompt."""
parts = []
# Visual scene
if self.visual_elements:
            parts.append(". ".join(self.visual_elements))
# Audio cues (Veo 3 understands these)
if self.audio_elements:
audio_desc = "Sounds include: " + ", ".join(self.audio_elements)
parts.append(audio_desc)
# Camera
if self.camera:
parts.append(f"Camera: {self.camera}")
# Style
if self.style:
parts.append(f"Style: {self.style}")
return ". ".join(parts)
# Example usage
prompt = (
Veo3PromptBuilder()
.add_visual("A busy Tokyo street at night")
.add_visual("Neon signs reflect on wet pavement")
.add_visual("People with umbrellas walk past")
.add_sound("City ambiance")
.add_sound("Rain on umbrellas")
.add_sound("Distant traffic")
.add_sound("Japanese pop music from a nearby store")
.set_camera("Slow tracking shot following pedestrians")
.set_style("Cinematic, blade runner aesthetic")
.build()
)
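print(prompt)
# With the builder above this prints roughly:
# "A busy Tokyo street at night. Neon signs reflect on wet pavement.
#  People with umbrellas walk past. Sounds include: City ambiance,
#  Rain on umbrellas, Distant traffic, Japanese pop music from a nearby
#  store. Camera: Slow tracking shot following pedestrians. Style:
#  Cinematic, blade runner aesthetic"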
YouTube Shorts Integration
Veo 3 is integrated into YouTube Create:
YouTube Create App:
1. Open YouTube Create
2. Select "AI Video"
3. Enter prompt
4. Choose "Veo 3 Fast" for quick generation
5. Edit in timeline
6. Publish directly to Shorts
Features:
- SynthID watermarking (an imperceptible identifier embedded in AI-generated content)
- Direct upload to channel
- Built-in editing tools
- Music library integration
Pricing
| Plan | Price | Features |
|---|---|---|
| Google AI Pro | $20/month | 1,000 credits, watermarked |
| Google AI Ultra | $250/month | 12,500 credits, no watermark |
| API | Pay-per-use | $0.50/second generated |
Runway Gen-4
Overview
Runway has been the creative professional's choice since Gen-1. Gen-4, along with their Aleph model, offers the most comprehensive editing toolkit alongside generation.
Key capabilities:
- Up to 16-second clips
- 4K resolution
- Advanced editing tools (inpainting, outpainting)
- Motion brush for precise control
- Multi-clip projects with transitions
Runway's Tool Suite
Runway isn't just a generator—it's a complete video AI platform:
Generation Tools:
├── Gen-4 (Text-to-Video)
├── Gen-4 Turbo (Fast generation)
├── Aleph (Editing & transformation)
└── Image-to-Video
Editing Tools:
├── Inpainting (remove/replace objects)
├── Outpainting (extend frame)
├── Motion Brush (control movement)
├── Super Resolution (upscale)
├── Frame Interpolation (slow motion)
└── Background Removal
Audio Tools:
├── Audio Sync (lip sync)
├── Sound Effects
└── Music Generation
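These tools chain naturally into a post-generation pipeline. The sketch below follows the same illustrative client style as the API examples in this section; the `super_resolution` and `frame_interpolation` method names are placeholders rather than confirmed SDK calls, so check Runway's current API reference for the real endpoints:
import runway
client = runway.Client(api_key="YOUR_API_KEY")
# 1. Generate a base clip with Gen-4
base = client.text_to_video.create(
    prompt="A sailboat crossing a misty fjord at dawn",
    model="gen4",
    duration=10
).wait()
# 2. Upscale it (hypothetical super-resolution endpoint)
upscaled = client.tools.super_resolution(video=base.url, target_resolution="4k").wait()
# 3. Interpolate frames for smooth slow motion (hypothetical endpoint)
slow_motion = client.tools.frame_interpolation(video=upscaled.url, factor=2).wait()
print(slow_motion.url)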
Using Runway API
import runway
# Initialize client
client = runway.Client(api_key="YOUR_API_KEY")
# Text-to-Video
task = client.text_to_video.create(
prompt="""
A majestic eagle soars over snow-capped mountains.
Golden hour lighting. Dramatic clouds in background.
Camera follows the eagle in flight.
""",
model="gen4",
duration=10,
aspect_ratio="16:9",
resolution="1080p"
)
# Wait for completion
result = task.wait()
print(f"Video URL: {result.url}")
# Download
video_bytes = client.download(result.url)
with open("eagle.mp4", "wb") as f:
f.write(video_bytes)
Image-to-Video
Start from an image for more control:
# Upload reference image
image = client.uploads.create(
file=open("hero_character.png", "rb")
)
# Generate video from image
task = client.image_to_video.create(
image=image.id,
prompt="The character turns and walks toward the camera, confident stride",
model="gen4",
duration=8,
motion_strength=0.7 # 0.0-1.0, higher = more motion
)
result = task.wait()
Motion Brush
Precise control over what moves:
# Motion brush defines regions and their movement
motion_config = {
"regions": [
{
"mask": "mask_clouds.png", # Mask image
"direction": "right",
"speed": 0.3
},
{
"mask": "mask_water.png",
"direction": "oscillate",
"speed": 0.5
},
{
"mask": "mask_character.png",
"direction": "forward",
"speed": 0.8
}
],
"static_regions": ["mask_buildings.png"] # These don't move
}
task = client.image_to_video.create(
image=image.id,
prompt="Scene comes to life",
motion_config=motion_config,
duration=8
)
Aleph: Advanced Editing
Runway's Aleph model specializes in video transformation:
# Remove object from video
task = client.aleph.inpaint(
video="input_video.mp4",
mask="object_mask.mp4", # Mask video marking object to remove
prompt="Clean background, seamless removal"
)
# Style transfer
task = client.aleph.style_transfer(
video="input_video.mp4",
style_image="anime_style_reference.jpg",
strength=0.8
)
# Extend video (outpainting in time)
task = client.aleph.extend(
video="short_clip.mp4",
direction="forward", # or "backward"
duration=5, # seconds to add
prompt="Continue the scene naturally"
)
Pricing
| Plan | Price | Credits/month | Features |
|---|---|---|---|
| Free | $0 | 125 one-time | Watermarked |
| Standard | $15/month | 625 | Gen-4, no watermark |
| Pro | $35/month | 2250 | Priority, 4K |
| Unlimited | $95/month | Unlimited | All features |
Kling 2.0
Overview
Kling (by Kuaishou) excels at long-form generation and realistic human faces. It's the go-to for character-driven content.
Key capabilities:
- Up to 5 minutes per clip (industry-leading)
- Excellent facial consistency
- Lip sync with Kling Audio
- Powerful motion control
Long-Form Generation
# Kling handles long narratives
story_scenes = [
{
"prompt": "A young woman wakes up in a small apartment, morning sunlight streaming through curtains",
"duration": 20
},
{
"prompt": "She makes coffee, looking thoughtfully out the window at the city below",
"duration": 15,
"character_ref": "scene_1" # Maintain character consistency
},
{
"prompt": "Close-up of her face as she receives a surprising phone call",
"duration": 10,
"character_ref": "scene_1"
},
{
"prompt": "She rushes to get ready, putting on a coat and grabbing her keys",
"duration": 20,
"character_ref": "scene_1"
},
{
"prompt": "She runs down busy city streets, weaving through crowds",
"duration": 25,
"character_ref": "scene_1"
}
]
# Total: 90 seconds of consistent narrative
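A sketch of how these scenes might be submitted, reusing the illustrative `kling` client from the lip-sync example below; treating `character_ref` as a pointer back to scene 1's job is an assumption about how the identity would be threaded through:
# Submit the scenes in order, pointing later scenes back at scene 1
scene_ids = {}
for i, scene in enumerate(story_scenes, start=1):
    task = kling.create_video(
        prompt=scene["prompt"],
        duration=scene["duration"],
        character_ref=scene_ids.get(scene.get("character_ref"))  # None for scene 1
    )
    scene_ids[f"scene_{i}"] = task.id
# Download the finished clips and concatenate them (see the storyboard
# stitching sketch in the Sora section) for the full 90-second narrative.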
Lip Sync with Kling Audio
# Generate video with matching lip sync
# 1. Create the visual
video_task = kling.create_video(
prompt="A news anchor delivers breaking news in a professional studio",
duration=30,
character_style="realistic",
lip_sync_ready=True # Prepares for audio overlay
)
# 2. Add voice and lip sync
audio_task = kling.add_audio(
video_id=video_task.id,
audio_source="tts", # or "upload" for custom audio
text="""
Good evening. Tonight's top story: Scientists have made
a breakthrough discovery that could change everything we
know about renewable energy. Our correspondent has more.
""",
voice="news_anchor_female",
lip_sync=True # Adjusts mouth movements to match
)
Pricing
| Plan | Price | Features |
|---|---|---|
| Free | $0 | 6 sec clips, watermarked |
| Standard | $8/month | 30 sec, no watermark |
| Pro | $28/month | 2 min clips |
| Enterprise | $66/month | 5 min clips, API access |
Practical Implementation
Video Generation Pipeline
import os
from dataclasses import dataclass
from enum import Enum
from typing import Optional
import asyncio
class VideoProvider(Enum):
SORA = "sora"
VEO = "veo"
RUNWAY = "runway"
KLING = "kling"
@dataclass
class VideoRequest:
prompt: str
duration: int = 8
resolution: str = "1080p"
aspect_ratio: str = "16:9"
style: Optional[str] = None
audio: bool = False
reference_image: Optional[str] = None
@dataclass
class VideoResult:
url: str
duration: float
resolution: str
has_audio: bool
provider: VideoProvider
cost: float
class VideoGenerator:
"""
Unified interface for multiple video generation providers.
"""
def __init__(self):
self.providers = {}
self._init_providers()
def _init_providers(self):
"""Initialize available providers."""
if os.getenv("OPENAI_API_KEY"):
from openai import OpenAI
self.providers[VideoProvider.SORA] = OpenAI()
if os.getenv("GOOGLE_API_KEY"):
import google.generativeai as genai
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
self.providers[VideoProvider.VEO] = genai
if os.getenv("RUNWAY_API_KEY"):
import runway
self.providers[VideoProvider.RUNWAY] = runway.Client(
api_key=os.getenv("RUNWAY_API_KEY")
)
def select_provider(self, request: VideoRequest) -> VideoProvider:
"""
Select best provider based on request requirements.
"""
# Need audio? Veo 3 is the only native option
if request.audio and VideoProvider.VEO in self.providers:
return VideoProvider.VEO
# Long duration? Kling excels
if request.duration > 30 and VideoProvider.KLING in self.providers:
return VideoProvider.KLING
# Need editing/control? Runway
if request.reference_image and VideoProvider.RUNWAY in self.providers:
return VideoProvider.RUNWAY
# Default to Sora for quality
if VideoProvider.SORA in self.providers:
return VideoProvider.SORA
# Fall back to whatever is available
return list(self.providers.keys())[0]
async def generate(
self,
request: VideoRequest,
provider: Optional[VideoProvider] = None
) -> VideoResult:
"""Generate video using specified or auto-selected provider."""
if provider is None:
provider = self.select_provider(request)
if provider == VideoProvider.SORA:
return await self._generate_sora(request)
elif provider == VideoProvider.VEO:
return await self._generate_veo(request)
elif provider == VideoProvider.RUNWAY:
return await self._generate_runway(request)
elif provider == VideoProvider.KLING:
return await self._generate_kling(request)
async def _generate_sora(self, request: VideoRequest) -> VideoResult:
"""Generate with Sora."""
client = self.providers[VideoProvider.SORA]
response = await client.video.generate(
model="sora-2",
prompt=request.prompt,
            duration=min(request.duration, 20),  # current app-tier cap (Pro); see Sora pricing
resolution=request.resolution
)
return VideoResult(
url=response.url,
duration=response.duration,
resolution=request.resolution,
has_audio=False,
provider=VideoProvider.SORA,
cost=self._calculate_cost(VideoProvider.SORA, response.duration)
)
async def _generate_veo(self, request: VideoRequest) -> VideoResult:
"""Generate with Veo 3."""
genai = self.providers[VideoProvider.VEO]
response = genai.generate_video(
model="veo-3",
prompt=request.prompt,
config={
"duration_seconds": min(request.duration, 8),
"resolution": request.resolution,
"generate_audio": request.audio
}
)
return VideoResult(
url=response.url,
duration=response.duration,
resolution=request.resolution,
has_audio=request.audio,
provider=VideoProvider.VEO,
cost=self._calculate_cost(VideoProvider.VEO, response.duration)
)
async def _generate_runway(self, request: VideoRequest) -> VideoResult:
"""Generate with Runway Gen-4."""
client = self.providers[VideoProvider.RUNWAY]
if request.reference_image:
task = client.image_to_video.create(
image=request.reference_image,
prompt=request.prompt,
model="gen4",
duration=min(request.duration, 16)
)
else:
task = client.text_to_video.create(
prompt=request.prompt,
model="gen4",
duration=min(request.duration, 16)
)
result = task.wait()
return VideoResult(
url=result.url,
duration=result.duration,
resolution=request.resolution,
has_audio=False,
provider=VideoProvider.RUNWAY,
cost=self._calculate_cost(VideoProvider.RUNWAY, result.duration)
)
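    async def _generate_kling(self, request: VideoRequest) -> VideoResult:
        """Generate with Kling 2.0.
        Sketch only: no official Kling Python SDK is referenced in this guide,
        so this assumes a hypothetical async client registered under
        VideoProvider.KLING (e.g. a thin wrapper over Kling's HTTP API)."""
        client = self.providers[VideoProvider.KLING]
        response = await client.create_video(
            prompt=request.prompt,
            duration=min(request.duration, 300),  # Kling supports up to 5 minutes
            resolution=request.resolution
        )
        return VideoResult(
            url=response.url,
            duration=response.duration,
            resolution=request.resolution,
            has_audio=False,  # audio requires a separate Kling Audio pass
            provider=VideoProvider.KLING,
            cost=self._calculate_cost(VideoProvider.KLING, response.duration)
        )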
def _calculate_cost(self, provider: VideoProvider, duration: float) -> float:
"""Estimate cost for generation."""
rates = {
VideoProvider.SORA: 0.10, # ~$0.10 per second
VideoProvider.VEO: 0.50, # $0.50 per second (API)
VideoProvider.RUNWAY: 0.05, # ~$0.05 per second
VideoProvider.KLING: 0.03 # ~$0.03 per second
}
return rates.get(provider, 0.10) * duration
# Usage example
async def main():
generator = VideoGenerator()
# Auto-select provider
result = await generator.generate(VideoRequest(
prompt="A serene Japanese garden with koi fish swimming in a pond",
duration=10,
audio=True # Will select Veo 3
))
print(f"Generated with {result.provider.value}")
print(f"URL: {result.url}")
print(f"Cost: ${result.cost:.2f}")
asyncio.run(main())
Batch Processing
import asyncio
from typing import List
async def batch_generate(
prompts: List[str],
generator: VideoGenerator,
max_concurrent: int = 3
) -> List[VideoResult]:
"""
Generate multiple videos with concurrency control.
"""
semaphore = asyncio.Semaphore(max_concurrent)
async def generate_one(prompt: str) -> VideoResult:
async with semaphore:
return await generator.generate(VideoRequest(prompt=prompt))
tasks = [generate_one(prompt) for prompt in prompts]
return await asyncio.gather(*tasks)
# Example: Generate a video series
async def create_video_series():
generator = VideoGenerator()
prompts = [
"Episode 1: A mysterious letter arrives at an old mansion",
"Episode 2: The detective examines clues in the library",
"Episode 3: A secret passage is discovered behind the bookshelf",
"Episode 4: The truth is revealed in a dramatic confrontation"
]
results = await batch_generate(prompts, generator)
for i, result in enumerate(results):
print(f"Episode {i+1}: {result.url}")
Adding Audio Post-Generation
For providers without native audio:
from elevenlabs import ElevenLabs
import moviepy.editor as mpe
class AudioAdder:
"""Add audio to generated videos."""
def __init__(self):
self.elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))
def add_narration(
self,
video_path: str,
text: str,
voice: str = "narrator",
output_path: str = "output.mp4"
):
"""Add TTS narration to video."""
# Generate speech
audio = self.elevenlabs.generate(
text=text,
voice=voice,
model="eleven_turbo_v2"
)
# Save temp audio
with open("temp_audio.mp3", "wb") as f:
f.write(audio)
# Combine video and audio
video = mpe.VideoFileClip(video_path)
audio_clip = mpe.AudioFileClip("temp_audio.mp3")
# Match durations
if audio_clip.duration > video.duration:
audio_clip = audio_clip.subclip(0, video.duration)
final = video.set_audio(audio_clip)
final.write_videofile(output_path)
# Cleanup
os.remove("temp_audio.mp3")
return output_path
def add_music(
self,
video_path: str,
music_path: str,
volume: float = 0.3,
output_path: str = "output.mp4"
):
"""Add background music to video."""
video = mpe.VideoFileClip(video_path)
music = mpe.AudioFileClip(music_path)
# Loop music if needed
if music.duration < video.duration:
music = mpe.concatenate_audioclips([music] * int(video.duration / music.duration + 1))
# Trim and adjust volume
music = music.subclip(0, video.duration).volumex(volume)
# Mix with original audio if exists
if video.audio:
final_audio = mpe.CompositeAudioClip([video.audio, music])
else:
final_audio = music
final = video.set_audio(final_audio)
final.write_videofile(output_path)
return output_path
# Usage
audio_adder = AudioAdder()
# Add narration to Sora-generated video
audio_adder.add_narration(
video_path="sora_output.mp4",
text="In a world where dreams become reality, one person dared to imagine the impossible...",
voice="dramatic_narrator"
)
Best Practices
Prompt Engineering Tips
# DO: Be specific about visual details
good_prompt = """
A golden retriever puppy sits in a sun-dappled garden.
Soft afternoon light filters through oak leaves.
The puppy tilts its head curiously, ears perked up.
Shallow depth of field, background softly blurred.
Shot on 35mm film, warm color grading.
"""
# DON'T: Be vague or contradictory
bad_prompt = """
A dog in a nice place looking cute.
"""
# DO: Describe camera movement explicitly
good_camera = """
Camera slowly dollies forward while tilting up,
revealing the full height of the ancient redwood tree.
"""
# DO: Include temporal descriptions
good_temporal = """
The flower blooms in accelerated time-lapse,
petals unfurling one by one over 5 seconds.
"""
Common Issues and Solutions
| Issue | Solution |
|---|---|
| Inconsistent characters | Use image-to-video with reference |
| Unnatural motion | Reduce motion strength, be specific |
| Bad hands/faces | Use models optimized for humans (Kling) |
| Wrong aspect ratio | Specify explicitly in prompt |
| Artifacts | Try different seed, reduce duration |
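When a clip comes back with artifacts, the cheapest fixes are usually a straight regeneration (most providers randomize the seed on each request) and a shorter duration. A small retry helper around the `VideoGenerator` from the pipeline section, with a `looks_ok()` quality check you would supply yourself:
async def generate_with_retries(
    generator: VideoGenerator,
    request: VideoRequest,
    looks_ok,  # your own artifact check (manual review or an automated heuristic)
    max_attempts: int = 3
) -> VideoResult:
    """Regenerate on artifacts, shaving a couple of seconds off each retry."""
    duration = request.duration
    result = None
    for attempt in range(max_attempts):
        request.duration = duration
        result = await generator.generate(request)
        if looks_ok(result):
            return result
        duration = max(4, duration - 2)  # shorter clips tend to have fewer artifacts
    return result  # hand back the last attempt even if imperfect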
Conclusion
AI video generation in 2025 has reached production quality. The choice of tool depends on your specific needs:
- Sora 2: Best overall quality, storytelling
- Veo 3: Only option for native audio
- Runway: Best editing tools, creative control
- Kling: Best for long-form and faces
For most projects, you'll likely use multiple tools—Sora or Veo for generation, Runway for editing, and external tools for audio when needed.
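As a closing sketch, here is how the pieces in this guide could fit together in one pass: generate with the unified `VideoGenerator`, then add narration with the `AudioAdder` when the selected provider has no native audio (file paths and the voice name are illustrative):
import asyncio
import urllib.request
async def produce_clip():
    generator = VideoGenerator()
    adder = AudioAdder()
    result = await generator.generate(VideoRequest(
        prompt="A lighthouse on a cliff at dusk, waves crashing below",
        duration=10
    ))
    urllib.request.urlretrieve(result.url, "lighthouse.mp4")  # download the clip locally
    if not result.has_audio:  # e.g. Sora or Runway output
        adder.add_narration(
            video_path="lighthouse.mp4",
            text="At the edge of the world, the light still turns.",
            voice="narrator",
            output_path="lighthouse_final.mp4"
        )
asyncio.run(produce_clip())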