Documentation

Uploading Videos

Upload, analyze, and search videos scene-by-scene with AI-powered vision models

One Resource for Every Media Type

Videos share the unified /api/v2/files/* surface with images and documents — same upload endpoints, same listing endpoint, same deletion. The SDK auto-selects S3 multipart for video-sized files; the server detects the media type from content and routes the upload through thumbnail generation, metadata extraction, and scene-level VLM analysis automatically.
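To make "multipart" concrete, the sketch below plans part boundaries for a large file the way a multipart uploader might. The 8 MiB part size is an assumed value for illustration, not a documented Scopix setting:

```python
# Sketch: how an SDK might split a large video into multipart chunks.
# PART_SIZE is illustrative, not Scopix's documented value.
PART_SIZE = 8 * 1024 * 1024  # 8 MiB per part (assumed)

def plan_parts(file_size: int, part_size: int = PART_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) for each part; the last part may be short."""
    parts = []
    offset = 0
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((offset, length))
        offset += length
    return parts
```

A 20 MiB file, for example, becomes two full 8 MiB parts plus a 4 MiB tail; each part can then be uploaded in parallel and the upload completed once all parts are acknowledged.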

Scene-Aware Video Intelligence

Each uploaded video flows through a four-stage pipeline: ingestion (S3 multipart upload) → thumbnails + ffprobe metadata (duration, codec, frame rate, bitrate) → VLM analysis (per-scene descriptions and tags via Gemini) → embeddings (vectors per scene for semantic search). This enables natural-language scene search and chat grounded in video content.
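Because the stages run strictly in order, the pipeline position of a file can be modeled as a tiny helper (an illustrative model only, not the service's actual schema; the stage names are shorthand for the four stages above):

```python
# The four pipeline stages, in the order the service runs them.
STAGES = ("ingestion", "metadata", "vlm_analysis", "embeddings")

def next_pending_stage(completed: set[str]) -> str:
    """Return the first stage not yet completed, or 'done' if all have run."""
    for stage in STAGES:
        if stage not in completed:
            return stage
    return "done"
```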

Supported Formats

Containers: .mp4, .mov, .m4v, .webm, and .avi. Codecs: H.264, H.265 (HEVC), VP8, VP9, and AV1.

Limits: up to 2 GB per file, and up to 10 minutes per video by default (configurable up to 2 hours). Larger or longer videos should be segmented before upload.
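A pre-flight check against these limits might look like the following sketch. The helper and its inputs are assumptions, not SDK features — you would supply the size from the filesystem and the duration from a local ffprobe run:

```python
import os

# Pre-flight validation before upload (sketch; mirrors the documented limits).
ALLOWED_EXTS = {".mp4", ".mov", ".m4v", ".webm", ".avi"}
MAX_BYTES = 2 * 1024**3       # 2 GB
MAX_DURATION_S = 10 * 60      # 10 minutes (default; configurable up to 2 hours)

def validate_video(path: str, size_bytes: int, duration_s: float) -> list[str]:
    """Return a list of limit violations; empty means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds 2 GB; segment before upload")
    if duration_s > MAX_DURATION_S:
        problems.append("video exceeds 10 minutes; segment or raise the limit")
    return problems
```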

Quick Start

Upload a Video

The SDK's unified upload path auto-selects S3 multipart for video-sized files and uploads parts in parallel. You get back the image_id as soon as the file is registered; ffprobe metadata, thumbnail generation, and VLM scene analysis continue asynchronously.

python
async with Scopix(api_key="scopix_...") as client:
    result = await client.files.upload("clip.mp4")
    print(f"File ID: {result.image_id}")
    print(f"Filename: {result.filename}")

Check Processing & Analysis Status

Thumbnails and ffprobe metadata generally land within seconds. VLM scene analysis can take minutes depending on video length and queue depth. Fetch the file to inspect both.

python
file = await client.files.get(result.image_id)

# Populated by the variant worker (usually within seconds)
print(file.video_metadata)  # dict: duration_seconds, width, height, frame_rate, video_codec, ...

# Populated by video_analysis_worker (minutes)
print(file.video_analysis_status)  # "pending" | "processing" | "completed" | "failed"
print(file.scene_count)  # int, once analysis completes

if file.video_analysis_job_id:
    job = await client.video_analysis.wait_for_completion(
        file.video_analysis_job_id,
        timeout=600,
    )
    print(f"Analysis finished: {job.status}")
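If you prefer to poll manually rather than call wait_for_completion, a minimal timeout loop could look like this sketch. Here fetch_status is any callable you supply that fetches the file and returns its video_analysis_status; the backoff interval is an arbitrary choice:

```python
import time

def poll_until_done(fetch_status, timeout: float = 600.0, interval: float = 2.0) -> str:
    """Poll fetch_status() until a terminal state or the timeout expires.

    fetch_status is a callable returning one of:
    "pending" | "processing" | "completed" | "failed".
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"analysis still {status!r} after {timeout}s")
        time.sleep(interval)
```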

Scene Search

Once analysis completes, each scene is indexed independently with its own description, tag set, and vector embedding. Scene search returns matching clips with start/end timestamps so you can deep-link into a specific moment.
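For illustration, a hypothetical helper like the one below renders scene boundaries as MM:SS ranges suitable for deep links. It is not part of the SDK (the SDK exposes a pre-formatted time_range_formatted field); it only shows what such formatting involves:

```python
def format_time_range(start_s: float, end_s: float) -> str:
    """Render scene boundaries as 'MM:SS-MM:SS' (hypothetical helper)."""
    def mmss(t: float) -> str:
        total = int(t)
        return f"{total // 60:02d}:{total % 60:02d}"
    return f"{mmss(start_s)}-{mmss(end_s)}"
```

For example, a scene spanning 75.2 s to 91.0 s renders as "01:15-01:31".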

python
# Natural-language scene search across all uploaded videos
result = await client.agent_search.videos(
    "closeup of a fire truck at night",
    limit=10,
)
for video in result.results:
    print(f"{video.video_filename} score={video.score:.3f}")
    for scene in video.matched_scenes or []:
        print(f"  scene {scene.scene_index} "
              f"{scene.time_range_formatted} score={scene.score:.3f}")
        print(f"    {scene.description}")
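Under the hood, per-scene ranking of this kind reduces to vector similarity between a query embedding and each scene's embedding. The toy sketch below uses cosine similarity with hand-written vectors; the real service uses VLM-derived embeddings and its own scoring, so this is illustrative only:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank_scenes(query_vec: list[float],
                scenes: list[tuple[int, list[float]]]) -> list[tuple[int, float]]:
    """scenes: (scene_index, embedding) pairs; returns (scene_index, score) by descending score."""
    scored = [(idx, cosine(query_vec, vec)) for idx, vec in scenes]
    return sorted(scored, key=lambda s: s[1], reverse=True)
```

Because every scene is indexed independently, the best match for a query can sit in the middle of a long video, which is exactly what the start/end timestamps let you deep-link to.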