Overview
VideoGenerationTool supports two main workflows:
- Video Generation: Create new videos from text/images using Veo 3.1
- Video Editing: Edit existing videos using Kling AI (face swap, style transfer, motion control)
Video generation uses Google’s Veo 3.1 models via Vertex AI. Video editing uses Kling AI models. Videos are generated in high quality and saved to your workflow output.
Video Generation (generate_video)
Create new videos from text descriptions and reference images using Veo 3.1.
Models
Veo 3.1 (Quality)
- Aspect Ratios: 16:9 (landscape), 9:16 (portrait)
- Resolutions: 720p, 1080p
- Durations: 4, 6, 8, 15, 22, 29, 36, 43, 50, 57, 64 seconds
- Audio: Native audio generation supported
- Image-to-Video: Supported
- Extension: Auto-extension for videos >8s (up to 64s)
- Use Case: High-quality video generation
Veo 3.1 Fast
- Aspect Ratios: 16:9 (landscape), 9:16 (portrait)
- Resolutions: 720p, 1080p
- Durations: 4, 6, 8, 15, 22, 29, 36, 43, 50, 57, 64 seconds
- Audio: Native audio generation supported
- Image-to-Video: Supported
- Extension: Auto-extension for videos >8s (up to 64s)
- Use Case: Speed-optimized video generation
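The duration list follows a regular pattern: 4, 6, and 8 seconds are base clip lengths, and every longer value is 8 seconds plus a multiple of 7. The sketch below assumes that each auto-extension step adds 7 seconds to an 8-second base clip. That step size is inferred from the list, not stated by the tool, and the function name is hypothetical.

```python
BASE_DURATIONS = (4, 6, 8)   # clip lengths that need no extension
MAX_DURATION = 64            # documented upper bound with extension
EXTENSION_STEP = 7           # assumption: inferred from the duration list


def extension_steps(duration_seconds: int) -> int:
    """Return how many 7-second extension steps a duration would imply."""
    if duration_seconds in BASE_DURATIONS:
        return 0
    extra = duration_seconds - 8
    if 8 < duration_seconds <= MAX_DURATION and extra % EXTENSION_STEP == 0:
        return extra // EXTENSION_STEP
    raise ValueError(f"unsupported duration: {duration_seconds}")


# Rebuild the documented list from the pattern as a sanity check.
ALLOWED_DURATIONS = [
    d for d in range(1, MAX_DURATION + 1)
    if d in BASE_DURATIONS or (d > 8 and (d - 8) % EXTENSION_STEP == 0)
]
print(ALLOWED_DURATIONS)
```

Under this assumption, a 22-second video would be an 8-second clip extended twice.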
Configuration
Video generation uses video_generation_config with these fields:
Prompt Guide (HOW the video is made) - All Optional
- cinematography: Camera movement, shot types, angles
- lighting: Light sources, quality, direction
- sound_style: Ambient sounds, music description
- visual_style: Aesthetic, mood, style references
Technical Settings
- model: Required - “veo-3.1-generate-preview” or “veo-3.1-fast-generate-preview”
- aspect_ratio: Required - “16:9” or “9:16”
- resolution: Required - “720p” or “1080p” (1080p limited to 8s max)
- duration_seconds: Required - One of: 4, 6, 8, 15, 22, 29, 36, 43, 50, 57, 64
- enable_audio: Required - true/false for native audio generation
- negative_prompt: Optional - Elements to exclude from generation
- first_frame: Optional - Path to image for first frame (interpolation)
- last_frame: Optional - Path to image for last frame (requires first_frame)
- reference_images: Optional - List of reference images (max 3) with type designation
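The constraints on these fields can be checked before you spend credits on a request. The following is a minimal sketch of such a pre-flight check, assuming the config is available as a plain Python dict. The function name and error messages are hypothetical; the tool may validate differently or more strictly.

```python
from typing import List

VEO_MODELS = {"veo-3.1-generate-preview", "veo-3.1-fast-generate-preview"}
ASPECT_RATIOS = {"16:9", "9:16"}
RESOLUTIONS = {"720p", "1080p"}
DURATIONS = {4, 6, 8, 15, 22, 29, 36, 43, 50, 57, 64}
REQUIRED = ("model", "aspect_ratio", "resolution", "duration_seconds", "enable_audio")


def validate_generation_config(config: dict) -> List[str]:
    """Return a list of problems; an empty list means the config looks valid."""
    errors = [f"missing required field: {name}" for name in REQUIRED if name not in config]
    if "model" in config and config["model"] not in VEO_MODELS:
        errors.append("unknown model")
    if "aspect_ratio" in config and config["aspect_ratio"] not in ASPECT_RATIOS:
        errors.append("aspect_ratio must be 16:9 or 9:16")
    if "resolution" in config and config["resolution"] not in RESOLUTIONS:
        errors.append("resolution must be 720p or 1080p")
    duration = config.get("duration_seconds")
    if duration is not None and duration not in DURATIONS:
        errors.append("unsupported duration_seconds")
    # 1080p output is limited to 8 seconds.
    if config.get("resolution") == "1080p" and duration in DURATIONS and duration > 8:
        errors.append("1080p is limited to 8 seconds")
    # last_frame only makes sense together with first_frame.
    if config.get("last_frame") and not config.get("first_frame"):
        errors.append("last_frame requires first_frame")
    references = config.get("reference_images") or []
    if len(references) > 3:
        errors.append("at most 3 reference_images are allowed")
    # Portrait mode does not support reference images.
    if references and config.get("aspect_ratio") == "9:16":
        errors.append("portrait (9:16) does not support reference_images")
    return errors
```

Returning a list of messages instead of raising on the first problem lets you report every issue in one pass.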
Input Structure
Scenes are passed separately from the config via the input parameter:
- visual (required): Description of what is seen
- dialogue (optional): Spoken words for that scene
- sound_effects (optional): Per-scene sound effects
Reference Images
- path: Path to the image file
- reference_type: “asset” (object/character/product) or “style” (aesthetics)
- Portrait mode (9:16) does not support reference_images
Example: Text-to-Video
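A minimal sketch of a text-to-video request, written as Python dictionaries. The field names come from the Configuration and Input Structure sections; the prompt text, the chosen values, and the way the two objects are wrapped together (the `video_generation_config` and `input` keys) are illustrative assumptions.

```python
import json

# Config: prompt guide fields (all optional) plus the required technical settings.
video_generation_config = {
    "cinematography": "Slow dolly-in, eye-level wide shot",
    "lighting": "Soft golden-hour sunlight from the left",
    "sound_style": "Gentle waves, light acoustic guitar",
    "visual_style": "Warm, cinematic, shallow depth of field",
    "model": "veo-3.1-fast-generate-preview",
    "aspect_ratio": "16:9",
    "resolution": "720p",
    "duration_seconds": 8,
    "enable_audio": True,
    "negative_prompt": "text overlays, watermarks",
}

# Scenes are passed separately from the config; only "visual" is required.
scenes = [
    {
        "visual": "A surfer walks along an empty beach carrying a board",
        "dialogue": "Best part of the day.",
        "sound_effects": "Footsteps on wet sand",
    }
]

# Assumed wrapper shape, shown only to make the payload easy to inspect.
request = {"video_generation_config": video_generation_config, "input": scenes}
print(json.dumps(request, indent=2))
```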
Example: Image-to-Video with Reference
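A minimal sketch of a request that uses reference images, again as Python dictionaries. The field names follow the Configuration and Reference Images sections; the file paths, prompt text, and overall payload shape are hypothetical. Landscape (16:9) is used because portrait mode does not support reference_images.

```python
import json

video_generation_config = {
    "model": "veo-3.1-generate-preview",
    "aspect_ratio": "16:9",          # portrait (9:16) does not support reference_images
    "resolution": "720p",
    "duration_seconds": 8,
    "enable_audio": False,
    "visual_style": "Clean studio look, neutral background",
    # Up to 3 reference images, each with a type designation.
    "reference_images": [
        {"path": "inputs/water_bottle.png", "reference_type": "asset"},   # hypothetical path
        {"path": "inputs/brand_moodboard.png", "reference_type": "style"},  # hypothetical path
    ],
}

scenes = [
    {"visual": "The bottle rotates slowly on a white pedestal while light sweeps across it"}
]

request = {"video_generation_config": video_generation_config, "input": scenes}
print(json.dumps(request, indent=2))
```

The scene description stays generic ("the bottle") and leaves the product's exact appearance to the asset reference.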
Video Editing (edit_video)
Edit existing videos or generate with character/motion control using Kling AI.
Models
Kling VIDEO O1
- Workflow: Video editing (face swap, style transfer)
- Input Video: Required (3-10 seconds, max 32MB)
- Reference Images: Up to 4 images
- Aspect Ratios: 16:9, 1:1, 9:16
- Use Case: Edit existing videos with face swap or style transfer
Kling VIDEO 2.6 Pro
- Workflow: Motion-control (follow motion from reference video)
- Reference Video: Required (3-30s, max 100MB)
- Reference Images: Required (1-7 images) - character/subject for video
- Output Duration: 5 or 10 seconds
- Aspect Ratios: 16:9, 1:1, 9:16
- Voice Cloning: Supported (5-30s audio files)
- Use Case: Generate video following motion from reference video
Configuration
Video editing uses video_editing_config with these fields:
Common Settings
- model: Required - “kling-video-o1” or “kling-video-2.6-pro”
- aspect_ratio: Required - “16:9”, “1:1”, or “9:16”
- keep_original_sound: Optional - Preserve audio from input (default: true)
Video Inputs
- input_video: Required for O1 - Source video for editing (3-10s, max 32MB)
- reference_video: Required for 2.6 Pro - Motion reference (3-30s, max 100MB)
- reference_images: O1: max 4, 2.6 Pro: max 7 - Character/object reference
2.6 Pro-Specific Settings
- cfg_scale: Default 0.5 - Prompt adherence (0-1)
- duration_seconds: Default 5 - Output duration (5 or 10)
- sound: Default true - Enable audio generation
- negative_prompt: Optional - Elements to exclude (2-2500 chars)
- character_orientation: Optional - “image” or “video” - prioritize ref images vs ref video
- reference_voices: Optional - Voice cloning audio files
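The two Kling models have different required inputs, so a per-model check can catch mistakes early. This is a rough sketch assuming the config is a plain Python dict; the function name and messages are hypothetical, and file duration and size limits are not checked here.

```python
from typing import List

KLING_O1 = "kling-video-o1"
KLING_26_PRO = "kling-video-2.6-pro"


def validate_editing_config(config: dict) -> List[str]:
    """Return a list of problems; an empty list means the config looks valid."""
    errors = []
    model = config.get("model")
    if model not in (KLING_O1, KLING_26_PRO):
        errors.append("model must be kling-video-o1 or kling-video-2.6-pro")
    if config.get("aspect_ratio") not in ("16:9", "1:1", "9:16"):
        errors.append("aspect_ratio must be 16:9, 1:1, or 9:16")
    images = config.get("reference_images") or []
    if model == KLING_O1:
        if not config.get("input_video"):
            errors.append("O1 requires input_video")
        if len(images) > 4:
            errors.append("O1 accepts at most 4 reference images")
    elif model == KLING_26_PRO:
        if not config.get("reference_video"):
            errors.append("2.6 Pro requires reference_video")
        if not 1 <= len(images) <= 7:
            errors.append("2.6 Pro requires 1-7 reference images")
        # 2.6 Pro-specific settings, using the documented defaults when absent.
        if not 0 <= config.get("cfg_scale", 0.5) <= 1:
            errors.append("cfg_scale must be between 0 and 1")
        if config.get("duration_seconds", 5) not in (5, 10):
            errors.append("duration_seconds must be 5 or 10")
        negative = config.get("negative_prompt")
        if negative is not None and not 2 <= len(negative) <= 2500:
            errors.append("negative_prompt must be 2-2500 characters")
        if config.get("character_orientation") not in (None, "image", "video"):
            errors.append("character_orientation must be image or video")
    return errors
```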
Image Reference Syntax
Prompts can reference images using placeholders. The system automatically indexes the images you provide in reference_images, starting from 0, so each image can be referenced in a prompt by its position in the array.
Example: Face Swap (O1)
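A minimal sketch of a face-swap request for the O1 model, as Python dictionaries. The field names come from the Configuration section; the file paths, the prompt wording, and the `prompt` key used to carry it are assumptions. The prompt here describes the reference image in plain words and does not use placeholder syntax.

```python
import json

video_editing_config = {
    "model": "kling-video-o1",
    "aspect_ratio": "16:9",
    "keep_original_sound": True,              # default: keep the input video's audio
    "input_video": "inputs/interview.mp4",    # hypothetical path; 3-10s, max 32MB
    "reference_images": ["inputs/new_face.png"],  # hypothetical path; O1 allows up to 4
}

# Assumption: the edit instruction is passed alongside the config as a prompt.
prompt = "Replace the speaker's face with the face from the first reference image"

request = {"video_editing_config": video_editing_config, "prompt": prompt}
print(json.dumps(request, indent=2))
```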
Example: Motion Control (2.6 Pro)
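A minimal sketch of a motion-control request for the 2.6 Pro model, as Python dictionaries. The field names come from the Configuration section; the file paths, the prompt wording, and the `prompt` key are assumptions.

```python
import json

video_editing_config = {
    "model": "kling-video-2.6-pro",
    "aspect_ratio": "9:16",
    "reference_video": "inputs/dance_motion.mp4",   # hypothetical path; 3-30s, max 100MB
    "reference_images": ["inputs/mascot_front.png", "inputs/mascot_side.png"],  # 1-7 images
    "cfg_scale": 0.5,                  # prompt adherence, 0-1
    "duration_seconds": 10,            # 5 or 10
    "sound": True,
    "negative_prompt": "blurry, distorted hands",
    "character_orientation": "video",  # prioritize the reference video over the images
}

# Assumption: the description is passed alongside the config as a prompt.
prompt = "The mascot performs the dance from the reference video on a city rooftop"

request = {"video_editing_config": video_editing_config, "prompt": prompt}
print(json.dumps(request, indent=2))
```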
Credit Costs
Video generation and editing are credit-intensive operations:
Video Generation (Veo 3.1)
- Starting cost: 1000 credits per generation
- Cost may vary by duration and model in the tool configuration
Video Editing (Kling AI)
- Starting cost: 600 credits per edit
- Cost may vary by duration and model in the tool configuration
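Because the figures above are starting costs, they give a lower bound on what a workflow will consume. The sketch below totals that lower bound for a planned set of operations; the function name is hypothetical, and actual costs may be higher depending on duration and model.

```python
# Starting costs per operation, as listed above.
STARTING_COST = {"generate_video": 1000, "edit_video": 600}


def minimum_credits(planned_operations: dict) -> int:
    """Lower-bound credit estimate for a mapping of operation name -> count."""
    return sum(STARTING_COST[name] * count for name, count in planned_operations.items())


# Three generations and two edits cost at least 3 * 1000 + 2 * 600 credits.
print(minimum_credits({"generate_video": 3, "edit_video": 2}))  # prints 4200
```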
Performance
- Video Generation: 11 seconds to 6 minutes per segment
- Video Editing: 1-5 minutes per operation
- Videos >8s: Use auto-extension for generation
- 1080p resolution: Limited to 8s max for generation
Limitations
Video Generation
- Portrait mode (9:16) does not support reference_images
- 1080p resolution limited to 8s max
- Maximum duration: 64 seconds (with extension)
Video Editing
- O1 requires input_video (3-10s, max 32MB)
- 2.6 Pro requires reference_video (3-30s, max 100MB)
- Prompt must be 2-2500 characters
- Reference images required for 2.6 Pro (1-7 images)
Best Practices
Use Reference Images for Products
When generating videos with products, use reference_images with reference_type=“asset” to ensure the product appears correctly.
Keep Scene Descriptions Generic
Don’t hardcode specific values from previous blocks. Use generic descriptions and reference images for specific objects.
Choose Appropriate Duration
Start with shorter videos (4-8s) for faster generation and lower costs. Extend if needed.
Use Batch Operations
For multiple videos, consider generating them in parallel workflows to save time.
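If you drive generation from your own script instead of separate workflows, the same idea can be sketched with a thread pool. The `generate_video` function below is a hypothetical stand-in that only returns a file name; replace it with however you actually invoke the tool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def generate_video(job: dict) -> str:
    """Hypothetical stand-in for a real generation call; returns an output name."""
    return f"{job['name']}.mp4"


def generate_batch(jobs: List[dict], max_workers: int = 4) -> List[str]:
    """Run the jobs concurrently and return results in the same order as the jobs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(generate_video, jobs))


jobs = [{"name": "intro"}, {"name": "demo"}, {"name": "outro"}]
print(generate_batch(jobs))
```

Each job still costs credits on its own, so running in parallel saves time, not credits.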
Use Cases
Marketing Videos
Create product demos, promotional videos, and social media content.
Content Creation
Generate video content for blogs, tutorials, and presentations.
Video Editing
Face swap, style transfer, and motion replication for existing videos.
Character Consistency
Generate videos with consistent characters using reference images.
Related Features
- Image Generation - Generate reference images for videos
- Vision Analysis - Analyze generated videos
- Plans and Credits - Credit costs and plan requirements
Video generation is a powerful but credit-intensive feature. Plan your video workflows carefully to manage credit consumption effectively.