Documentation
Uploading Videos
Upload, analyze, and search videos scene-by-scene with AI-powered vision models
One Resource for Every Media Type
Videos share the unified /api/v2/files/* surface with images and documents — same upload endpoints, same listing endpoint, same deletion. The SDK auto-selects S3 multipart for video-sized files; the server detects the media type from content and routes the upload through thumbnail generation, metadata extraction, and scene-level VLM analysis automatically.
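The server-side media-type detection described above works from file content rather than extension. As an illustrative sketch only (not the server's actual implementation), a few magic-byte checks are enough to route the common container and image formats:

```python
def sniff_media_type(header: bytes) -> str:
    """Guess a media type from a file's leading bytes (illustrative only)."""
    if header[4:8] == b"ftyp":              # MP4/MOV/M4V family (ISO BMFF)
        return "video"
    if header[:4] == b"\x1aE\xdf\xa3":      # Matroska/WebM container
        return "video"
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "video"
    if header[:3] == b"\xff\xd8\xff":       # JPEG
        return "image"
    if header[:8] == b"\x89PNG\r\n\x1a\n":  # PNG
        return "image"
    return "document"                       # fall back to document handling
```

Because the upload surface is unified, the client never needs to declare the media type; detection like this is what lets videos, images, and documents share one endpoint.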
Scene-Aware Video Intelligence
Each uploaded video flows through a four-stage pipeline: ingestion (S3 multipart upload) → thumbnails + ffprobe metadata (duration, codec, frame rate, bitrate) → VLM analysis (per-scene descriptions and tags via Gemini) → embeddings (vectors per scene for semantic search). This enables natural-language scene search and chat grounded in video content.
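The metadata stage can be pictured as reducing `ffprobe -print_format json -show_format -show_streams` output to the handful of fields the API stores. A rough sketch, with field names mirroring the `video_metadata` dict:

```python
from fractions import Fraction

def parse_ffprobe(probe: dict) -> dict:
    """Reduce ffprobe's JSON output to the metadata fields the pipeline stores."""
    video = next(s for s in probe["streams"] if s.get("codec_type") == "video")
    return {
        "duration_seconds": float(probe["format"]["duration"]),
        "width": video["width"],
        "height": video["height"],
        # ffprobe reports frame rate as a ratio string, e.g. "30000/1001"
        "frame_rate": float(Fraction(video["r_frame_rate"])),
        "video_codec": video["codec_name"],
    }
```

Note that ffprobe encodes NTSC-style frame rates as exact ratios ("30000/1001" rather than 29.97), which is why the sketch parses them as fractions.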
Supported Formats
Containers: .mp4, .mov, .m4v, .webm, and .avi. Codecs: H.264, H.265/HEVC, VP8, VP9, and AV1.
Limits: up to 2 GB per file, and up to 10 minutes per video by default (configurable up to 2 hours). Larger or longer videos should be segmented before upload.
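One common way to pre-segment an over-limit video is ffmpeg's segment muxer with stream copy, which splits without re-encoding. A sketch that builds the command (the output pattern is illustrative):

```python
def segment_command(src: str, seconds: int = 600,
                    pattern: str = "part_%03d.mp4") -> list[str]:
    """Build an ffmpeg command that splits a video into chunks of roughly
    `seconds` each without re-encoding (cuts land on keyframes)."""
    return [
        "ffmpeg", "-i", src,
        "-c", "copy",              # stream copy: fast, no quality loss
        "-map", "0",               # keep all streams (video + audio)
        "-f", "segment",
        "-segment_time", str(seconds),
        "-reset_timestamps", "1",  # each chunk starts at t=0
        pattern,
    ]
```

Because `-c copy` can only cut on keyframes, chunk boundaries fall on the nearest keyframe at or after the requested time; re-encode if exact boundaries matter.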
Quick Start
Upload a Video
The SDK's unified upload path uses S3 multipart for video-sized files and uploads parts in parallel. The call returns the file's image_id as soon as the file is registered; ffprobe metadata extraction, thumbnail generation, and VLM scene analysis continue asynchronously.
async with Scopix(api_key="scopix_...") as client:
    result = await client.files.upload("clip.mp4")

    print(f"File ID: {result.image_id}")
    print(f"Filename: {result.filename}")

Check Processing & Analysis Status
Thumbnails and ffprobe metadata generally land within seconds. VLM scene analysis can take minutes depending on video length and queue depth. Fetch the file to inspect both.
file = await client.files.get(result.image_id)
# Populated by the variant worker (usually within seconds)
print(file.video_metadata)  # dict: duration_seconds, width, height, frame_rate, video_codec, ...

# Populated by video_analysis_worker (minutes)
print(file.video_analysis_status)  # "pending" | "processing" | "completed" | "failed"
print(file.scene_count)  # int, once analysis completes
if file.video_analysis_job_id:
    job = await client.video_analysis.wait_for_completion(
        file.video_analysis_job_id,
        timeout=600,
    )
    print(f"Analysis finished: {job.status}")

Scene Search
Once analysis completes, each scene is indexed independently with its own description, tag set, and vector embedding. Scene search returns matching clips with start/end timestamps so you can deep-link into a specific moment.
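Given a scene's start and end offsets in seconds (attribute names here are assumed, not part of the response shown in this guide), you can format the range for display or build a W3C Media Fragments deep link (`#t=start,end`) for players that honor it:

```python
def format_range(start: float, end: float) -> str:
    """Render a scene's time range as M:SS-M:SS."""
    def mmss(t: float) -> str:
        m, s = divmod(int(t), 60)
        return f"{m}:{s:02d}"
    return f"{mmss(start)}-{mmss(end)}"

def deep_link(url: str, start: float, end: float) -> str:
    """Append a Media Fragments time range to a video URL."""
    return f"{url}#t={start:g},{end:g}"
```

Browsers apply `#t=` fragments natively to direct media URLs, so a matched scene can open mid-playback without any player-side code.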
# Natural-language scene search across all uploaded videos
result = await client.agent_search.videos(
    "closeup of a fire truck at night",
    limit=10,
)
for video in result.results:
    print(f"{video.video_filename} score={video.score:.3f}")
    for scene in video.matched_scenes or []:
        print(f"  scene {scene.scene_index} "
              f"{scene.time_range_formatted} score={scene.score:.3f}")
        print(f"    {scene.description}")
