Creating Media with vofy-cli
This skill guides you through the complete media creation workflow with vofy-cli, the CLI for Vofy — from understanding the user's need to delivering a downloaded result.
If vofy is missing, stop and tell the user to install it with npm install -g vofy-cli@0.1.3 before continuing.
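The missing-CLI check above can be sketched as a small shell guard. This is a minimal sketch assuming a POSIX shell; the function name require_vofy is hypothetical, and only the install command comes from this document.

```sh
# Hypothetical helper: stop early when the vofy binary is not on PATH.
require_vofy() {
  if command -v vofy >/dev/null 2>&1; then
    return 0
  fi
  # Install command taken from the instruction above.
  echo "vofy not found. Ask the user to run: npm install -g vofy-cli@0.1.3" >&2
  return 1
}
```

A caller would run require_vofy || exit 1 before any other vofy command, so the agent stops instead of attempting the install itself.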
Workflow
- Determine what the user wants (image or video? what content?)
- Pick the right mode (text_to_image, image_to_video, etc.)
- Pick and confirm the model (vofy models <name> first; use imagine-models skill if unsure)
- Build the command with correct flags
- Execute and deliver the result
Image Creation
Mode Selection
| User wants... | Mode | Key flags |
|---------------|------|-----------|
| Generate from text description | text_to_image | --prompt |
| Edit/transform an existing image | image_to_image | --prompt --image <path> |
| Edit a specific region of an image | inpainting | --prompt --image <path> --mask <path> |
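The table above can be read as a decision on which inputs are available. The following is a rough sketch of that decision, not part of vofy-cli; the function name pick_image_mode is hypothetical, and rejecting a mask without a source image is an assumption drawn from the inpainting row requiring both flags.

```sh
# Hypothetical helper: map available inputs to an image mode from the table.
# $1 = source image path (may be empty), $2 = mask path (may be empty)
pick_image_mode() {
  image=${1:-}
  mask=${2:-}
  # Assumption: inpainting lists both --image and --mask, so a lone mask is invalid.
  if [ -n "$mask" ] && [ -z "$image" ]; then
    echo "a mask needs a source image" >&2
    return 1
  fi
  if [ -n "$mask" ]; then
    echo inpainting
  elif [ -n "$image" ]; then
    echo image_to_image
  else
    echo text_to_image
  fi
}
```

For example, pick_image_mode ./photo.jpg "" selects image_to_image, while calling it with two empty arguments selects text_to_image.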
Template
vofy image create \
--model <model> \
--prompt "<prompt>" \
--aspect-ratio <ratio> \
--resolution <resolution> \
--quality <value> \
--yes \
--download-to ./output
--quality is model-specific. Only include it when the model exposes a quality parameter (check vofy models <name>).
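One way to keep --quality optional is to assemble the command string conditionally. The sketch below is illustrative only: build_image_cmd is a hypothetical name, it prints the command for review rather than running it, and it assumes the prompt contains no double quotes.

```sh
# Hypothetical helper: print a vofy image create command based on the template.
# $1 = model, $2 = prompt, $3 = quality (optional; leave empty to omit the flag)
build_image_cmd() {
  model=$1
  prompt=$2
  quality=${3:-}
  cmd="vofy image create --model $model --prompt \"$prompt\""
  # Only add --quality when a value was supplied for a model that exposes it.
  if [ -n "$quality" ]; then
    cmd="$cmd --quality $quality"
  fi
  printf '%s\n' "$cmd --yes --download-to ./output"
}
```

Calling build_image_cmd seedream-4.5 "a cat" prints a command with no --quality flag. Pass a third argument only after vofy models <name> shows the model has a quality parameter.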
Common Patterns
Text to image (simplest)
vofy image create --model seedream-4.5 --prompt "a cat sitting on a windowsill" --yes --download-to ./output
Image to image (edit/transform)
vofy image create --model gpt-image-1.5 --prompt "make it look like a watercolor painting" --image ./photo.jpg --yes --download-to ./output
Transparent background (logos, icons)
vofy image create --model gpt-image-1.5 --prompt "a minimalist logo of a mountain" --background transparent --yes --download-to ./output
High resolution
vofy image create --model seedream-4.5 --prompt "landscape photo" --resolution 4K --aspect-ratio 16:9 --yes --download-to ./output
Video Creation
Mode Selection
| User wants... | Mode | Key flags |
|---------------|------|-----------|
| Generate video from text | text_to_video | --prompt |
| Animate a still image | image_to_video | --prompt --first-frame <path> |
| Generate video between two frames | interpolation | --first-frame <path> --last-frame <path> |
| Use reference images for style | reference_images | --prompt --reference-image <path> |
| Use mixed media references | multimodal_reference | --mode multimodal_reference --reference-image <path> --reference-video <path> |
| Control motion trajectories | motion_control | Model-specific |
| Transform existing video | video_to_video | --mode video_to_video --prompt --video <path> |
| Extend existing video | video_extension | --mode video_extension --prompt --video <path> |
Template
vofy video create \
--model <model> \
--prompt "<prompt>" \
--duration <seconds> \
--aspect-ratio <ratio> \
--yes \
--download-to ./output
Common Patterns
Text to video (simplest)
vofy video create --model veo-3.1 --prompt "a drone shot flying over a forest at sunrise" --duration 6 --yes --download-to ./output
Animate a still image
vofy video create --model kling-3.0 --prompt "the character slowly turns their head" --first-frame ./character.png --yes --download-to ./output
Interpolation (morph between two frames)
vofy video create --model seedance-2.0 --prompt "smooth transition" --first-frame ./start.png --last-frame ./end.png --duration 4 --yes --download-to ./output
Video with audio
vofy video create --model kling-3.0 --prompt "a person speaking at a podium" --audio --duration 8 --yes --download-to ./output
Key Rules for AI Agents
- Always use --yes: it skips the interactive route picker. Without it, the CLI prompts for user input, which blocks the agent.
- Use --download-to <path> when you need local files. Otherwise the default sync output already includes generated resource URLs.
- Local files auto-upload: pass local paths directly to --image, --first-frame, --video, etc. The CLI handles upload automatically.
- Default is synchronous: the command waits for the task to complete. Use --async if you want to submit and check later.
- Check credits first: run vofy status before creating media to verify the user has sufficient credits.
- Never run vofy login: it requires a browser. If auth fails, tell the user to run it manually.
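The first rule is easy to forget, so an agent might route every create call through a thin wrapper. This is a sketch under stated assumptions: vofy_create is a hypothetical function, and VOFY_BIN is an assumed override variable for dry runs, not a vofy-cli feature.

```sh
# Hypothetical wrapper: run "vofy <kind> create" and append --yes when missing,
# so the interactive route picker never blocks the agent.
# VOFY_BIN is an assumed override (not a vofy feature) used for dry runs.
vofy_create() {
  kind=$1
  shift
  has_yes=0
  for arg in "$@"; do
    case "$arg" in
      --yes) has_yes=1 ;;
    esac
  done
  if [ "$has_yes" -eq 0 ]; then
    set -- "$@" --yes
  fi
  "${VOFY_BIN:-vofy}" "$kind" create "$@"
}
```

Setting VOFY_BIN=echo before calling vofy_create image --model seedream-4.5 --prompt "a cat" prints the final argument list instead of submitting a task, which is useful for checking flags first.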
Handling Results
After a successful create command:
- With --download-to ./output: files are saved to the specified directory
- Without a download flag: the sync output includes generated resource URLs by default
- With --result-url: URLs are printed explicitly after completion
- For async tasks: use vofy task <id_or_prefix> --download-to ./output to download later
Validation Gotchas
These are the most common causes of failed generations. The model docs have full validation rules, but these are the ones that trip up agents most often:
kling-2.6:
- 720p does NOT support --audio; must use 1080p for audio
- image_to_video ignores --aspect-ratio (follows the input image)
- --last-frame requires resolution=1080p and forbids --audio
- motion_control forbids --last-frame, --duration, --aspect-ratio
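The kling-2.6 resolution, audio, and last-frame rules above can be checked before submitting a task. The following is a minimal sketch that encodes only those three rules; check_kling26 and its yes/no arguments are hypothetical, and the CLI's own validation remains authoritative.

```sh
# Hypothetical pre-flight check for kling-2.6, encoding the rules listed above.
# $1 = resolution (e.g. 720p or 1080p), $2 = audio (yes/no), $3 = last frame given (yes/no)
check_kling26() {
  resolution=$1
  audio=$2
  last_frame=$3
  if [ "$audio" = yes ] && [ "$resolution" != 1080p ]; then
    echo "kling-2.6: --audio needs 1080p" >&2
    return 1
  fi
  if [ "$last_frame" = yes ] && [ "$resolution" != 1080p ]; then
    echo "kling-2.6: --last-frame needs resolution=1080p" >&2
    return 1
  fi
  if [ "$last_frame" = yes ] && [ "$audio" = yes ]; then
    echo "kling-2.6: --last-frame forbids --audio" >&2
    return 1
  fi
  return 0
}
```

For instance, check_kling26 720p yes no fails with a message on stderr, while check_kling26 1080p no yes passes.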
kling-3.0:
- image_to_video ignores --aspect-ratio (follows the input image)
- --multi-shot requires --shot-type (either customize or intelligence)
- --shot-type customize requires --multi-prompt and forbids --prompt
- --shot-type intelligence requires --prompt and forbids --multi-prompt
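The kling-3.0 multi-shot rules pair each shot type with exactly one prompt flag. A rough sketch of that pairing follows; check_kling30_multishot and its yes/no arguments are hypothetical names introduced for illustration.

```sh
# Hypothetical check for kling-3.0 --multi-shot, encoding the rules listed above.
# $1 = shot type, $2 = --prompt given (yes/no), $3 = --multi-prompt given (yes/no)
check_kling30_multishot() {
  shot_type=$1
  has_prompt=$2
  has_multi_prompt=$3
  case "$shot_type" in
    customize)
      if [ "$has_multi_prompt" != yes ] || [ "$has_prompt" = yes ]; then
        echo "kling-3.0: customize requires --multi-prompt and forbids --prompt" >&2
        return 1
      fi
      ;;
    intelligence)
      if [ "$has_prompt" != yes ] || [ "$has_multi_prompt" = yes ]; then
        echo "kling-3.0: intelligence requires --prompt and forbids --multi-prompt" >&2
        return 1
      fi
      ;;
    *)
      echo "kling-3.0: --multi-shot requires --shot-type customize or intelligence" >&2
      return 1
      ;;
  esac
  return 0
}
```

Here check_kling30_multishot customize no yes passes, and check_kling30_multishot intelligence no no fails because the prompt is missing.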
seedance-2.0 / seedance-2.0-fast:
- multimodal_reference uses --reference-image / --reference-video / --reference-audio, NOT --first-frame or --video
- Mixed image+video/audio references require --mode multimodal_reference
- --reference-video and --reference-audio are ONLY available in multimodal_reference mode
grok-imagine-video:
- video_to_video and video_extension ignore --aspect-ratio and --resolution (follows source video)
Ambiguous mode inference:
- --video by itself is ambiguous; use --mode video_to_video or --mode video_extension
- Mixed --reference-image with --reference-video / --reference-audio needs --mode multimodal_reference
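Combining the video mode table with the ambiguity rules above, an agent can decide which mode it is targeting before building the command. This sketch shows agent-side reasoning only and is not the CLI's own inference logic. The function infer_video_mode, its yes/no arguments, and the order of the checks are assumptions, and motion_control is left out because it is model-specific.

```sh
# Hypothetical helper: infer a video mode from which inputs are present.
# Arguments are yes/no, in order:
#   first_frame last_frame video reference_image reference_video reference_audio
infer_video_mode() {
  first=$1
  last=$2
  video=$3
  ref_img=$4
  ref_vid=$5
  ref_aud=$6
  # --video alone does not say whether to transform or extend.
  if [ "$video" = yes ]; then
    echo "ambiguous: pass --mode video_to_video or --mode video_extension" >&2
    return 1
  fi
  if [ "$ref_vid" = yes ] || [ "$ref_aud" = yes ]; then
    echo multimodal_reference
  elif [ "$ref_img" = yes ]; then
    echo reference_images
  elif [ "$first" = yes ] && [ "$last" = yes ]; then
    echo interpolation
  elif [ "$first" = yes ]; then
    echo image_to_video
  else
    echo text_to_video
  fi
}
```

When the helper reports multimodal_reference, the command should carry --mode multimodal_reference explicitly, as the rules above require.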
General:
- Some models have hidden_controls: parameters that exist but are auto-managed (e.g., kling-3.0-motion-control hides duration, aspect_ratio, audio)
- Some parameters have derived values in certain modes (auto-set from input, e.g., aspect_ratio from source video)
- Route pricing multipliers affect cost: route_b/route_c are often 0.5x but may restrict available modes/resolutions
Error Handling
| Error | Action |
|-------|--------|
| "Not authenticated" | Tell user to run vofy login |
| "Insufficient credits" | Tell user to check vofy billing |
| "Model not found" | Run vofy models to list available models |
| "Invalid parameter" | Check model capabilities with vofy models <name> |
| Task fails/times out | Check vofy task <id_or_prefix> for error details |
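The table above can be turned into a simple lookup that an agent applies to captured error output. This is a sketch under assumptions: next_action is a hypothetical name, the quoted strings are taken from the table and may not match the CLI's exact wording, and treating every other failure as a task failure is a simplification.

```sh
# Hypothetical helper: map an error message to the action from the table.
# $1 = error text captured from a failed vofy command
next_action() {
  case "$1" in
    *"Not authenticated"*)    echo "Tell the user to run: vofy login" ;;
    *"Insufficient credits"*) echo "Tell the user to check: vofy billing" ;;
    *"Model not found"*)      echo "Run: vofy models" ;;
    *"Invalid parameter"*)    echo "Run: vofy models <name>" ;;
    # Assumption: anything else is handled like a failed or timed-out task.
    *)                        echo "Run: vofy task <id_or_prefix>" ;;
  esac
}
```

For example, next_action "Error: Model not found" prints the instruction to list available models.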
Detailed Examples
See examples.md for more real-world scenarios.