Stability AI · Schema

ImageToVideoRequest

3D Generation, AI, Generative AI, Image Generation, Image Editing, Machine Learning, Stable Diffusion, Text to Image, Video Generation

Properties

| Name | Type | Description |
|------|------|-------------|
| image | string | The source image to animate. The image should contain a clear subject with a stable composition suitable for animation. Supported resolutions include 1024x576, 576x1024, and 768x768. |
| seed | integer | A value to control randomness in the video generation. |
| cfg_scale | number | Controls how closely the video follows the conditioning from the input image. Higher values produce videos more faithful to the input but may reduce motion quality. |
| motion_bucket_id | integer | Controls the amount of motion in the generated video. Lower values produce less motion, higher values produce more dynamic motion. |
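
The image description lists specific resolutions, so checking a file's dimensions before upload may save a rejected request. The following is a minimal sketch, assuming the source image is a PNG; it reads the width and height from the PNG header using only the standard library. The names `LISTED_RESOLUTIONS`, `png_dimensions`, and `has_listed_resolution` are hypothetical helpers, not part of any Stability AI SDK. Since the description says the supported resolutions "include" these three, the list may not be exhaustive.

```python
import struct

# Resolutions named in the `image` description. The wording "include"
# suggests this list may not be exhaustive (assumption).
LISTED_RESOLUTIONS = {(1024, 576), (576, 1024), (768, 768)}

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type
    # ("IHDR"), then big-endian 4-byte width and 4-byte height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file, or the header is truncated")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def has_listed_resolution(data: bytes) -> bool:
    """True if the PNG's dimensions match one of the listed resolutions."""
    return png_dimensions(data) in LISTED_RESOLUTIONS
```

In practice the bytes would come from reading the image file in binary mode; only the first 24 bytes are needed for the check.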

JSON Schema

stability-ai-imagetovideorequest-schema.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "#/components/schemas/ImageToVideoRequest",
  "title": "ImageToVideoRequest",
  "type": "object",
  "required": [
    "image"
  ],
  "properties": {
    "image": {
      "type": "string",
      "format": "binary",
      "description": "The source image to animate. The image should contain a clear subject with a stable composition suitable for animation. Supported resolutions include 1024x576, 576x1024, and 768x768."
    },
    "seed": {
      "type": "integer",
      "minimum": 0,
      "maximum": 4294967294,
      "default": 0,
      "description": "A value to control randomness in the video generation."
    },
    "cfg_scale": {
      "type": "number",
      "minimum": 0,
      "maximum": 10,
      "default": 1.8,
      "description": "Controls how closely the video follows the conditioning from the input image. Higher values produce videos more faithful to the input but may reduce motion quality."
    },
    "motion_bucket_id": {
      "type": "integer",
      "minimum": 1,
      "maximum": 255,
      "default": 127,
      "description": "Controls the amount of motion in the generated video. Lower values produce less motion, higher values produce more dynamic motion."
    }
  }
}
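
The numeric bounds and defaults in the schema can be applied on the client side before a request is sent. The following is a minimal sketch, assuming the optional fields are collected into a plain dictionary and the binary `image` field is attached separately. `FIELD_RULES` and `build_form_fields` are hypothetical names introduced for illustration. The type checks are a simplification of JSON Schema semantics (for example, JSON Schema would accept `1.0` as an integer, which this sketch rejects).

```python
from typing import Any

# Bounds and defaults copied from the schema above:
# name -> (expected kind, minimum, maximum, default).
FIELD_RULES = {
    "seed": (int, 0, 4294967294, 0),
    "cfg_scale": (float, 0, 10, 1.8),
    "motion_bucket_id": (int, 1, 255, 127),
}


def build_form_fields(**overrides: Any) -> dict[str, Any]:
    """Return the optional fields with defaults applied and bounds checked."""
    unknown = set(overrides) - set(FIELD_RULES)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    fields: dict[str, Any] = {}
    for name, (kind, lo, hi, default) in FIELD_RULES.items():
        value = overrides.get(name, default)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool):
            raise TypeError(f"{name} must be numeric, not bool")
        if kind is int and not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")
        if kind is float and not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number")
        if not lo <= value <= hi:
            raise ValueError(f"{name} must be between {lo} and {hi}")
        fields[name] = value
    return fields
```

Calling `build_form_fields(motion_bucket_id=40)` returns the other two fields at their schema defaults, while an out-of-range value such as `cfg_scale=12` raises `ValueError` before any network call is made.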