Files API Reference

Unified file resource for images, documents, and videos — uploads, retrieval, search, digitization, and management

One Resource for Every Media Type

All file operations live under /api/v2/files/*. Images, documents, videos, and links share the same CRUD endpoints; media-specific sub-paths (variants, text, chunks, digitization, similar) return 400 if used on the wrong media type.

Automatic File Type Detection

The API detects file types from content using magic-byte signatures. The Content-Type of the file part in the multipart form data does not need to be correct: if it is omitted or mismatched, the server inspects the payload and routes the file to the right pipeline. Unrecognizable or unsafe files (executables, scripts) are rejected.
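
To make the idea of magic-byte detection concrete, here is a minimal client-side sketch in Python. The signature table and the `sniff_media_type` name are hypothetical; the server's actual table and rejection rules are not documented here, and the `ftyp` check is a simplification that also matches other ISO base media files.

```python
# Hypothetical signature table for illustration only; the server's real
# detection rules are not documented in this reference.
SIGNATURES = [
    (b"\xff\xd8\xff", "image"),          # JPEG
    (b"\x89PNG\r\n\x1a\n", "image"),     # PNG
    (b"GIF87a", "image"),                # GIF
    (b"GIF89a", "image"),                # GIF
    (b"%PDF-", "document"),              # PDF
    (b"MZ", "rejected"),                 # Windows executable
    (b"\x7fELF", "rejected"),            # ELF executable
    (b"#!", "rejected"),                 # script with a shebang line
]


def sniff_media_type(head: bytes) -> str:
    """Classify a payload by its leading bytes, ignoring any declared Content-Type."""
    for magic, media_type in SIGNATURES:
        if head.startswith(magic):
            return media_type
    # MP4-family files carry the marker "ftyp" at byte offset 4,
    # right after a 4-byte box size.
    if len(head) >= 8 and head[4:8] == b"ftyp":
        return "video"
    return "unknown"
```

A check like this can be run on the first few bytes of a file before uploading, to avoid sending something the server is likely to reject.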

Streaming Upload

Single-request multipart upload for files up to 100 MB. This is the recommended path for almost every upload: it needs no init/complete round trips and no client-side hashing.

POST /api/v2/files/upload

Upload a single file (multipart/form-data). Auto-routes by detected media type. Returns 201 Created.

Request

json
// multipart/form-data fields:
// Required:
// file: (binary) — file to upload (up to 100 MB)
// Optional:
// title: (string, max 255) — file title
// tags: (string) — comma-separated tags
// auto_describe: (boolean, default true) — run AI description pipeline
// skip_duplicates: (boolean, default false) — skip if hash already exists
// storage_target: (string, default "default") — "default" or "custom"
// folder_id: (string) — destination folder UUID
// project_id: (string) — project workspace UUID (used when no folder_id)
// content_category: (string, default "general") — content category for tailored AI
// Valid values: general, blueprint, ce_plan, technical_diagram,
// architectural_design, product_photo, real_estate, mining, robotics,
// artwork, screenshot, document, map, pid, pfd, construction,
// facility_assessment
// custom_schema_id: (string) — optional saved custom extraction schema UUID;
// triggers a second VLM pass with that schema
// compliance_type: (string) — "mls" or "marketplace"
// compliance_standard: (string) — required if compliance_type is set
// (e.g. "nar_baseline", "amazon")
// compliance_image_type: (string, default "main") — "main" or "secondary"
curl -X POST https://api.scopix.ai/api/v2/files/upload \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@photo.jpg" \
-F "title=Site Inspection" \
-F "tags=inspection,site"

Response

json
// 201 Created
{
"image_id": "550e8400-e29b-41d4-a716-446655440000",
"upload_method": "STREAMING",
"status": "completed", // "completed" | "processing" | "skipped"
"processing_time_ms": 1250.5,
"upload_completed": true,
"thumbnail_generation_started": true,
"analysis_started": true,
"skipped": false,
"skipped_existing_image_id": null,
"storage_target": "default",
"media_type": "image", // "image" | "document" | "video"
"document_type": null, // "pdf" | "docx" | "txt" | "md" (documents only)
"text_extraction_status": null // "pending" | "processing" | "completed" | "failed" (documents)
}
// 429 Too Many Requests — backpressure (Retry-After header set)
// 413 Payload Too Large — file exceeds streaming limit (use /files/uploads multipart)
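
A client can honor the 429 backpressure signal by reading Retry-After before retrying. The sketch below is one possible approach, not the official client behavior: `retry_delay` is a hypothetical helper, it only handles the delta-seconds form of Retry-After, and the exponential fallback and 60-second cap are assumptions.

```python
import random


def retry_delay(headers: dict, attempt: int, cap: float = 60.0) -> float:
    """Return the number of seconds to wait before retrying a 429 response.

    `headers` is a plain dict of response headers; `attempt` is the
    zero-based retry count. Both the cap and the fallback are assumptions.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    value = str(lowered.get("retry-after", "")).strip()
    if value.isdigit():
        # Server gave an explicit delay in seconds: respect it, up to the cap.
        return min(float(value), cap)
    # No usable header: exponential backoff with full jitter.
    return random.uniform(0.0, min(cap, 2.0 ** attempt))
```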
POST /api/v2/files/upload/batch

Multi-file batch upload. Per-tier file count: FREE 10, STARTER 50, PROFESSIONAL 100, ENTERPRISE 200. Each file is capped at 100 MB. Returns 201 Created.
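
When a job has more files than the tier allows per batch, the client has to split it. A minimal sketch, using the per-tier counts listed above; the `plan_batches` helper is hypothetical and does not check the 100 MB per-file cap.

```python
# Per-tier file counts as listed in this reference.
TIER_BATCH_LIMITS = {"FREE": 10, "STARTER": 50, "PROFESSIONAL": 100, "ENTERPRISE": 200}


def plan_batches(filenames: list, tier: str) -> list:
    """Split filenames into consecutive batches no larger than the tier limit."""
    limit = TIER_BATCH_LIMITS[tier.upper()]
    return [filenames[i:i + limit] for i in range(0, len(filenames), limit)]
```

Each returned batch would then be sent as one POST /api/v2/files/upload/batch request.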

Request

json
// multipart/form-data fields:
// Required:
// files: (binary[]) — multiple files (each up to 100 MB)
// Optional:
// tags: (string) — comma-separated tags applied to all files
// auto_describe: (boolean, default true) — run AI description pipeline
// skip_duplicates: (boolean, default false)
// storage_target: (string, default "default")
// folder_id: (string) — destination folder UUID
// project_id: (string) — project workspace UUID
// content_category: (string, default "general")
// custom_schema_id: (string) — optional saved custom extraction schema UUID
// applied to every file in the batch
// compliance_type: (string) — "mls" or "marketplace"
// compliance_standard: (string) — required if compliance_type is set
// compliance_image_type: (string, default "main") — "main" or "secondary"
curl -X POST https://api.scopix.ai/api/v2/files/upload/batch \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "files=@photo1.jpg" \
-F "files=@photo2.jpg" \
-F "files=@report.pdf"

Response

json
// 201 Created
{
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"total_files": 3,
"accepted_files": 3,
"rejected_files": 0,
"status": "completed", // "completed" | "partial" | "processing" | "rejected"
"immediate_results": [
{
"image_id": "660f9500-e29b-41d4-a716-446655440000",
"filename": "photo1.jpg",
"status": "completed", // "completed" | "failed" | "skipped"
"processing_time_ms": 850.2,
"skipped": false,
"skipped_existing_image_id": null,
"error": null,
"storage_target": "default",
"media_type": "image",
"document_type": null,
"text_extraction_status": null
}
],
"status_url": "/api/v2/files/sessions/{session_id}/status",
"websocket_channel": "batch.{session_id}",
"rejections": null
}
// For larger batches, poll status_url or subscribe to websocket_channel

Presigned & Multipart Upload

For files larger than 100 MB or when you want the bytes to bypass the API entirely, use the upload-intent flow: request → PUT directly to S3 → complete. Use upload_mode: "single_shot" for files up to 5 GB; "multipart" for anything larger (videos, large datasets).

POST /api/v2/files/uploads

Create an upload intent. Returns a presigned PUT URL (single-shot) or per-part presigned URLs (multipart). The client must compute SHA-256 of the file and pin it as claimed_file_hash; the server verifies on /complete.
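
Computing claimed_file_hash does not require loading the whole file into memory. The following sketch streams the file through SHA-256 with the Python standard library; `sha256_hex` and `build_upload_intent` are hypothetical helper names, and only the four required fields are filled in.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_hex(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash a binary stream in fixed-size chunks and return 64 hex characters."""
    digest = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        digest.update(chunk)
    return digest.hexdigest()


def build_upload_intent(filename: str, content_type: str, data: bytes) -> dict:
    """Build the minimal request body: the four required fields only."""
    return {
        "filename": filename,
        "content_type": content_type,
        "size_bytes": len(data),
        "claimed_file_hash": sha256_hex(io.BytesIO(data)),
    }
```

For a file on disk, pass `open(path, "rb")` to `sha256_hex` and take `size_bytes` from `os.path.getsize(path)` instead of holding the bytes in memory.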

Request

json
{
"filename": "inspection.mp4",
"content_type": "video/mp4",
"size_bytes": 524288000,
"claimed_file_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"upload_mode": "multipart", // optional — omit to let server pick by size.
// "single_shot" (<=5 GB) | "multipart" (>=5 MB)
"part_size_bytes": 8388608, // multipart only — min 5 MB per part
"idempotency_key": null, // optional, max 128 chars
"title": "Site Inspection", // optional
"tags": ["inspection", "site-a"], // optional, max 20 tags (1-50 chars each)
"folder_id": null, // optional folder UUID
"project_id": null, // optional project UUID
"skip_duplicates": false, // optional
"storage_target": "default", // optional (not currently honored server-side)
"auto_describe": true, // optional, default true
"content_category": "general", // optional
"custom_schema_id": null, // optional saved schema UUID
"compliance_type": null, // optional: "mls" | "marketplace"
"compliance_standard": null, // required if compliance_type is set
"compliance_image_type": "main" // optional: "main" | "secondary"
}
// Required: filename, content_type, size_bytes, claimed_file_hash
// upload_mode is OPTIONAL — the server auto-selects by size_bytes
// claimed_file_hash: 64-char SHA-256 hex (server verifies post-upload)

Response

json
// Single-shot response:
{
"upload_id": "550e8400-e29b-41d4-a716-446655440000",
"upload_mode": "single_shot",
"media_type": "video",
"method": "PUT",
"presigned_url": "https://s3.amazonaws.com/...",
"headers": {
"Content-Type": "video/mp4",
"x-amz-checksum-sha256": "<base64(sha256)>",
"x-amz-sdk-checksum-algorithm": "SHA256"
},
"object_key": "videos/<tenant>/<hash>.mp4",
"expires_at": "2026-04-15T10:40:00Z",
"max_size_bytes": 524288000,
"bucket_name": "scopix-uploads"
}
// Multipart response:
{
"upload_id": "550e8400-e29b-41d4-a716-446655440000",
"upload_mode": "multipart",
"media_type": "video",
"s3_upload_id": "abc...XYZ",
"object_key": "videos/<tenant>/<hash>.mp4",
"part_urls": [
{"part_number": 1, "url": "https://s3.amazonaws.com/...", "expires_at": "2026-04-15T10:40:00Z"},
{"part_number": 2, "url": "https://s3.amazonaws.com/...", "expires_at": "2026-04-15T10:40:00Z"}
],
"part_size_bytes": 8388608,
"total_parts": 63,
"expires_at": "2026-04-15T10:40:00Z",
"bucket_name": "scopix-uploads"
}
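
The total_parts value follows from size_bytes and part_size_bytes by ceiling division. As a sketch of how a client might derive the byte range for each part before issuing its PUTs (the `plan_parts` helper is hypothetical):

```python
def plan_parts(size_bytes: int, part_size_bytes: int) -> list:
    """Return (part_number, offset, length) tuples; part numbers start at 1."""
    # Integer ceiling division avoids float rounding on very large files.
    total_parts = (size_bytes + part_size_bytes - 1) // part_size_bytes
    parts = []
    for index in range(total_parts):
        offset = index * part_size_bytes
        length = min(part_size_bytes, size_bytes - offset)
        parts.append((index + 1, offset, length))
    return parts
```

With the example values (524288000 bytes, 8388608-byte parts) this gives 63 parts, the last one being 4194304 bytes.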
GET /api/v2/files/uploads/{upload_id}

Get the current state of an upload intent (PENDING, UPLOADED, COMPLETED, FAILED) and per-part progress for multipart.

Response

json
{
"upload_id": "550e8400-e29b-41d4-a716-446655440000",
"upload_mode": "multipart",
"status": "UPLOADED", // PENDING | UPLOADED | COMPLETED | FAILED
"media_type": "video",
"object_key": "videos/<tenant>/<hash>.mp4",
"filename": "inspection.mp4",
"size_bytes": 524288000,
"total_parts": 63, // null for single_shot
"parts_confirmed": 63, // null for single_shot
"progress_percent": 100.0, // null for single_shot
"created_at": "2026-04-15T10:30:00Z",
"expires_at": "2026-04-15T10:40:00Z",
"confirmed_at": "2026-04-15T10:38:00Z",
"error_message": null
}
POST /api/v2/files/uploads/{upload_id}/parts/confirm

Confirm a successfully uploaded multipart chunk. Call after each PUT to S3 with the returned ETag.

Request

json
{
"part_number": 1,
"etag": "\"abc123def456\"",
"size_bytes": 8388608
}
// part_number: 1-indexed
// etag: from S3 PUT response (quoted form is fine)

Response

json
{
"upload_id": "550e8400-e29b-41d4-a716-446655440000",
"part_number": 1,
"parts_confirmed": 1,
"total_parts": 63,
"progress_percent": 1.59
}
POST /api/v2/files/uploads/{upload_id}/parts/retry

Get a fresh presigned URL for re-uploading a failed multipart chunk.

Request

json
{
"part_number": 5
}

Response

json
{
"upload_id": "550e8400-e29b-41d4-a716-446655440000",
"part_number": 5,
"url": "https://s3.amazonaws.com/...",
"expires_at": "2026-04-15T10:50:00Z"
}
POST /api/v2/files/uploads/{upload_id}/complete

Finalize an upload (single-shot or multipart). The server completes the S3 multipart upload, verifies the SHA-256 against claimed_file_hash, creates the file record, and queues media-specific processing (variants/description for images, extraction for documents, ffprobe + analysis for videos). The request body is empty because the server is fully authoritative.

Request

json
{}
// Body must be empty by design. The server uses claimed_file_hash from the
// initiate request and the parts list it tracked from /parts/confirm calls.
// No client-supplied duration/analysis params — videos use server-side
// ffprobe and a 2-credit reservation that the worker reconciles.

Response

json
// 200 OK
{
"upload_id": "550e8400-e29b-41d4-a716-446655440000",
"file_id": "660f9500-e29b-41d4-a716-446655440000",
"media_type": "video", // "image" | "document" | "video"
"filename": "inspection.mp4",
"object_key": "videos/<tenant>/<hash>.mp4",
"size_bytes": 524288000,
"deduplicated": false, // true if an existing file had the same hash
"status": "processing" // "processing" | "completed"
}
// 409 Conflict — claimed_file_hash mismatch (SHA-256 didn't match S3 object)
// 422 Unprocessable Entity — required parts missing on multipart complete
DELETE /api/v2/files/uploads/{upload_id}

Abort an upload intent. For multipart, also aborts the underlying S3 multipart upload (refunds reserved credits if applicable).

Request

json
// Optional query parameter:
?reason=User%20cancelled // up to 255 chars

Response

json
{
"upload_id": "550e8400-e29b-41d4-a716-446655440000",
"aborted": true,
"reason": "User cancelled"
}

File Listing & Retrieval

GET /api/v2/files

List files with full-text search and filters. Results are heterogeneous across media types; use the media_types query parameter to scope them.

Request

json
// Query parameters:
?search=damage report // optional, full-text search
&search_mode=all // optional, default: all, options: all | metadata | visible_text
&tags=safety&tags=inspection // optional, multi-value filter by tags
&media_types=image&media_types=document // optional, multi-value: image | document | video | link
&folder_id=folder_abc123 // optional, filter by folder
&project_id=uuid // optional, filter by project workspace
&has_description=true // optional, filter by description status
&ids=uuid1&ids=uuid2 // optional, multi-value filter by file IDs
&compliance_status=passed // optional, filter by compliance status
&date_from=2026-01-01T00:00:00Z // optional
&date_to=2026-01-31T23:59:59Z // optional
&sort_by=content_created_at // optional, options: created_at | content_created_at | title | size_bytes
&sort_order=desc // optional, default: desc
&limit=20 // optional, default: 20, 1-100
&offset=0 // optional, default: 0
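
Multi-value filters such as tags, media_types, and ids are sent as repeated query keys. A minimal sketch of building such a query string with the standard library; `build_list_query` is a hypothetical helper, and lowercase `true`/`false` for booleans is assumed from the example above.

```python
from urllib.parse import urlencode


def _encode(value):
    """Render booleans as lowercase strings; leave other values unchanged."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return value


def build_list_query(**filters) -> str:
    """Build a query string, repeating the key for each item of a list value."""
    params = {}
    for key, value in filters.items():
        if value is None:
            continue  # omitted filters are simply left out
        if isinstance(value, (list, tuple)):
            params[key] = [_encode(item) for item in value]
        else:
            params[key] = _encode(value)
    # doseq=True expands list values into key=a&key=b
    return urlencode(params, doseq=True)
```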

Response

json
{
"items": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"title": "Site Photo A",
"filename": "site_photo.jpg",
"thumbnail_url": "https://cdn.scopix.ai/thumbs/...",
"upload_description": "Damaged concrete pillar with visible cracks...",
"visible_text": "WARNING: STRUCTURAL DAMAGE",
"tags": ["damage", "concrete"],
"size_bytes": 2048576,
"created_at": "2026-01-15T10:30:00Z",
"content_created_at": "2026-01-14T08:00:00Z",
"has_full_description": true,
"dimensions": {"width": 4000, "height": 3000},
"format": "jpeg",
"variant_status": "completed",
"variant_count": 5,
"medium_url": "https://cdn.scopix.ai/medium/...",
"full_url": "https://cdn.scopix.ai/large/...",
"blur_hash": "L6PZfSi_.AyE_3t7t7R**0o#DgR4",
"description_status": "completed",
"description_error": null,
"content_type": "image/jpeg",
"media_type": "image",
"content_category": "general",
"document_type": null,
"source_url": null
}
],
"total_count": 150,
"limit": 20,
"offset": 0,
"has_more": true
}
// Conditional fields by media_type:
// document: document_type, page_count, text_extraction_status, chunk_count, document_url
// video: duration_seconds, frame_rate, video_codec, resolution, analysis_status
// link: source_url, domain, og_metadata, favicon_url, crawl_status,
// extracted_images, extracted_images_count
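
Walking the full result set means advancing offset until has_more is false. The sketch below keeps the HTTP call out of the loop: `fetch_page` is a hypothetical callable that takes (offset, limit) and returns one parsed response body in the shape shown above.

```python
from typing import Callable, Iterator


def iter_files(fetch_page: Callable[[int, int], dict], limit: int = 20) -> Iterator[dict]:
    """Yield every item across pages, using offset/limit/has_more."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        items = page["items"]
        yield from items
        # Stop on the last page, or defensively if a page comes back empty.
        if not page["has_more"] or not items:
            break
        offset += limit
```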
GET /api/v2/files/{file_id}

Get detailed file information. Discriminated by media_type — variant-specific fields appear only on the matching variant. Accepts full UUID or 8-character prefix.

Request

json
// Optional query parameter:
?format=markdown // optional — when set to "markdown" on an image, the
// response includes a formatted_document rendering
// of CE plan / legend / schedule / description data

Response

json
// media_type: "image"
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"media_type": "image",
"title": "Site Photo A",
"tags": ["damage", "concrete"],
"size_bytes": 2048576,
"content_type": "image/jpeg",
"dimensions": {"width": 4000, "height": 3000},
"format": "jpeg",
"full_url": "https://cdn.scopix.ai/large/...",
"thumbnail_url": "https://cdn.scopix.ai/thumbs/...",
"medium_url": "https://cdn.scopix.ai/medium/...",
"original_url": "https://cdn.scopix.ai/originals/...",
"variant_status": "completed",
"variant_count": 5,
"upload_description": "Damaged concrete pillar...",
"visible_text": "WARNING: STRUCTURAL DAMAGE",
"text_regions": [
{"text": "WARNING: STRUCTURAL DAMAGE",
"bounding_box": {"x_min": 0.25, "y_min": 0.4, "x_max": 0.75, "y_max": 0.52}}
],
"description_generated_at": "2026-01-15T10:32:00Z",
"full_descriptions": [...],
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:35:00Z",
"blur_hash": "L6PZfSi_.AyE_3t7t7R**0o#DgR4",
"description_status": "completed",
"content_category": "general"
}
// media_type: "document"
{
"id": "...", "media_type": "document",
"filename": "safety_manual.pdf",
"document_type": "pdf",
"page_count": 45,
"chunk_count": 128,
"text_extraction_status": "completed",
"extracted_text": "SAFETY MANUAL\n\nChapter 1...",
...
}
// media_type: "video"
{
"id": "...", "media_type": "video",
"filename": "inspection.mp4",
"duration_seconds": 240.5,
"frame_rate": 30.0,
"video_codec": "h264",
"resolution": "1920x1080",
"analysis_status": "completed",
"thumbnail_url": "https://...",
...
}
GET /api/v2/files/{file_id}/download

Download the original file. Returns a 302 redirect to a temporary download URL with the Content-Disposition header set.

Response

json
// Returns 302 Redirect to presigned download URL
// URL expires in 5 minutes (300 seconds)
// Content-Disposition header set for download

File Updates & Deletion

PATCH /api/v2/files/{file_id}

Update file metadata (title, tags, user_description). Pass only the fields you want to change.

Request

json
{
"title": "Updated Photo Title",
"tags": ["updated", "reviewed"],
"user_description": "Quarterly inspection — minor surface cracks only"
}
// title: optional, max 255 characters
// tags: optional, max 40 tags, each max 50 characters
// user_description: optional, max 10000 chars; pass null to reset to AI-generated description

Response

json
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"title": "Updated Photo Title",
"tags": ["updated", "reviewed"],
"user_description": "Quarterly inspection — minor surface cracks only",
"upload_description": "A concrete pillar with visible damage...",
"updated_at": "2026-01-15T11:00:00Z"
}
DELETE /api/v2/files/{file_id}

Soft-delete a file. Recoverable within 30 days.

Response

json
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"deleted_at": "2026-01-15T11:00:00Z",
"message": "File deleted successfully"
}
// 409 Conflict — cannot delete while document text extraction or
// embedding is in progress
POST /api/v2/files/batch-delete

Delete up to 100 files in a single request. Each file is reported individually so partial failures don't block the batch.

Request

json
{
"file_ids": [
"550e8400-e29b-41d4-a716-446655440000",
"660f9500-f39c-52e5-b827-557766550111"
]
}
// 1-100 unique UUIDs

Response

json
{
"deleted": [
{"id": "550e8400-e29b-41d4-a716-446655440000", "status": "deleted",
"message": null, "deleted_at": "2026-01-15T11:00:00Z"}
],
"skipped": [],
"failed": [
{"id": "660f9500-f39c-52e5-b827-557766550111", "status": "failed",
"message": "File not found", "deleted_at": null}
],
"summary": {"total": 2, "deleted": 1, "skipped": 0, "failed": 1}
}

Image Operations

Image-only sub-paths. Calling these on a non-image file returns 400.

GET /api/v2/files/{file_id}/variant/{variant_type}

Get a specific image variant. Returns 302 redirect to the variant URL (1-hour expiry).

Request

json
// variant_type options:
// - original: Original uploaded image
// - tiny_64: 64px max dimension
// - small_256: 256px max dimension
// - medium_750: 750px max dimension
// - large_1024: 1024px max dimension

Response

json
// Returns 302 Redirect to variant URL
// 400 Bad Request if file media_type != "image"
POST /api/v2/files/{file_id}/trigger-variants

Manually re-queue variant generation. Useful for recovery if the original variant pipeline failed.

Response

json
{
"success": true,
"message": "Variant generation triggered",
"task_id": "task_550e8400",
"current_status": "processing",
"image_id": "550e8400-e29b-41d4-a716-446655440000"
}
// If already processing:
// {"success": true, "message": "Variant generation already in progress",
// "skipped_duplicate": true, ...}
GET /api/v2/files/{file_id}/similar

Find visually similar images using hybrid embedding + semantic similarity.

Request

json
// Query parameters:
?limit=20 // optional, 1-50, default: 20

Response

json
{
"reference_image_id": "550e8400-e29b-41d4-a716-446655440000",
"items": [
{
"image_id": "660f9500-e29b-41d4-a716-446655440000",
"title": "Similar beam photo",
"description": "Steel beam with surface corrosion...",
"relevance_score": 0.92,
"vector_similarity": 0.88,
"thumbnail_url": "https://cdn.scopix.ai/thumbs/...",
"medium_url": "https://cdn.scopix.ai/medium/...",
"full_url": "https://cdn.scopix.ai/large/...",
"folder_id": "770a0600-e29b-41d4-a716-446655440000",
"created_at": "2026-01-10T08:00:00Z"
}
],
"total_count": 1
}
// 400 Bad Request if file media_type != "image"
PATCH /api/v2/files/{file_id}/extractions/{domain_name}/review

Review AI extraction results — confirm, reject, or edit extracted items for a domain. Corrections layer on top of AI outputs (originals preserved). Multiple calls merge additively.

Request

json
{
"item_reviews": {
"furniture_items.0": "confirmed",
"furniture_items.1": "rejected",
"materials.2": "confirmed"
},
"field_edits": {
"furniture_items.0.name": "Barcelona Chair",
"furniture_items.0.material": "leather"
}
}
// At least one of item_reviews or field_edits is required.
//
// domain_name: one of:
// architectural_design, ce_plan, layout_region, legend,
// mining, real_estate, technical_diagram, pid, pfd,
// text_regions, mls_compliance, schedule
//
// item_reviews: keys are dot-path identifiers (e.g. "items.0"),
// values must be "confirmed" or "rejected"
// field_edits: keys are dot-path field identifiers (e.g. "items.0.name"),
// values are the corrected data

Response

json
{
"image_id": "550e8400-e29b-41d4-a716-446655440000",
"domain_name": "architectural_design",
"corrections": {
"item_reviews": {"furniture_items.0": "confirmed", "furniture_items.1": "rejected"},
"field_edits": {"furniture_items.0.name": "Barcelona Chair"}
},
"updated_at": "2026-04-13T10:30:00Z"
}
// 400 Bad Request if file media_type != "image" or invalid domain
// 404 Not Found if file or extraction does not exist

Document Operations

Document-only sub-paths. Calling these on a non-document file returns 400.

GET /api/v2/files/{file_id}/text

Get the full extracted plain text from a document.

Response

json
{
"document_id": "550e8400-e29b-41d4-a716-446655440000",
"filename": "safety_manual.pdf",
"text": "SAFETY MANUAL\n\nChapter 1: Introduction\n\nThis manual provides...",
"page_count": 45,
"metadata": {"language": "en"}
}
GET /api/v2/files/{file_id}/chunks

Get all chunks (for RAG / search) from a document. Optionally include the embedding vectors.

Request

json
// Query parameters:
?include_embeddings=false // optional, default: false

Response

json
{
"document_id": "550e8400-e29b-41d4-a716-446655440000",
"chunks": [
{
"chunk_id": "chunk_001",
"document_id": "550e8400-e29b-41d4-a716-446655440000",
"document_filename": "safety_manual.pdf",
"chunk_index": 0,
"content": "Safety inspections must be conducted quarterly...",
"page_numbers": [12, 13],
"heading_hierarchy": ["Chapter 3", "Inspections"],
"similarity_score": null,
"metadata": {
"token_count": 256,
"chunk_type": "paragraph",
"embedding_status": "completed"
}
}
],
"total_chunks": 128,
"status_counts": {"completed": 128, "pending": 0, "failed": 0}
}
// status_counts is only included when include_embeddings=true
// similarity_score is null for direct-fetch (only populated in search results)
GET /api/v2/files/{file_id}/digitization

Get the full structural digitization (per-page elements with bounding boxes) for a document.

Response

json
{
"document_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"total_pages": 3,
"completed_pages": 3,
"failed_pages": 0,
"pages": [
{
"page_number": 1,
"status": "completed",
"element_count": 5,
"elements": [
{
"type": "heading",
"content": "Safety Manual",
"bounding_box": {"x": 0.15, "y": 0.05, "w": 0.70, "h": 0.04},
"metadata": {"level": 1}
},
{
"type": "paragraph",
"content": "This manual provides comprehensive safety guidelines...",
"bounding_box": {"x": 0.10, "y": 0.12, "w": 0.80, "h": 0.15}
},
{
"type": "table",
"content": "| Category | Frequency |\n|---|---|\n| Fire | Quarterly |",
"bounding_box": {"x": 0.10, "y": 0.30, "w": 0.80, "h": 0.20}
}
],
"error_message": null
}
]
}
// status: pending | processing | completed | failed
// element types: heading, paragraph, table, key_value, list, figure
// bounding_box coordinates are normalized 0-1 relative to page dimensions
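
Because bounding_box values are normalized, drawing an overlay requires scaling them by the rendered page size. A minimal sketch; `to_pixel_box` is a hypothetical helper, and it assumes that x and y mark the top-left corner of the box, which this reference does not state explicitly.

```python
def to_pixel_box(bbox: dict, page_width: int, page_height: int) -> tuple:
    """Convert a normalized {x, y, w, h} box to (left, top, width, height) in pixels.

    Assumes (x, y) is the top-left corner; adjust if the API uses another origin.
    """
    left = round(bbox["x"] * page_width)
    top = round(bbox["y"] * page_height)
    width = round(bbox["w"] * page_width)
    height = round(bbox["h"] * page_height)
    return left, top, width, height
```

For the heading element in the example above, rendered on a 1000 x 2000 pixel page, this gives a box at (150, 100) that is 700 pixels wide and 80 pixels tall.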
GET /api/v2/files/{file_id}/digitization/pages/{page_number}

Get digitization elements for a single page (1-indexed).

Response

json
{
"page_number": 2,
"status": "completed",
"element_count": 3,
"elements": [
{
"type": "heading",
"content": "Chapter 2: Fire Safety",
"bounding_box": {"x": 0.10, "y": 0.05, "w": 0.60, "h": 0.04},
"metadata": {"level": 2}
}
],
"error_message": null
}
// 404 Not Found if no digitization exists for the requested page
GET /api/v2/files/{file_id}/digitization/status

Lightweight status check for digitization progress (no element data).

Response

json
{
"document_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "processing",
"total_pages": 5,
"page_statuses": {
"1": "completed",
"2": "completed",
"3": "processing",
"4": "pending",
"5": "pending"
}
}
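
The page_statuses map is enough to drive a progress indicator without fetching element data. A small sketch; `summarize_pages` is a hypothetical helper, and counting "failed" pages as finished is an assumption.

```python
# Statuses treated as finished; counting "failed" here is an assumption.
FINISHED = {"completed", "failed"}


def summarize_pages(page_statuses: dict) -> dict:
    """Summarize a page_statuses map into counts, a percentage, and waiting pages."""
    total = len(page_statuses)
    finished = sum(1 for status in page_statuses.values() if status in FINISHED)
    waiting = sorted(int(page) for page, status in page_statuses.items()
                     if status not in FINISHED)
    percent = round(100.0 * finished / total, 1) if total else 0.0
    return {"finished": finished, "total": total,
            "percent": percent, "waiting_pages": waiting}
```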
GET /api/v2/files/{file_id}/processing-status

Cross-media processing status (works for image, document, and video). Includes per-component subprocess statuses.

Response

json
{
"document_id": "550e8400-e29b-41d4-a716-446655440000",
"filename": "safety_manual.pdf",
"document_type": "pdf",
"text_extraction_status": "completed",
"page_count": 45,
"chunk_count": 128,
"created_at": "2026-01-15T10:30:00Z",
"processing_started_at": "2026-01-15T10:30:05Z",
"completed_at": "2026-01-15T10:32:00Z",
"error_message": null
}

Semantic Search & Analyze

AI-Powered Search

Search uses semantic similarity to find content by meaning, not just keywords — "damaged equipment" matches "broken machinery" even without exact words.
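
As a rough illustration of how a similarity score and threshold interact, the sketch below ranks toy vectors by cosine similarity. This is not the service's implementation: the embedding model and scoring function are not documented here, cosine similarity is assumed as a common choice, and the three-dimensional vectors are invented for the example.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def above_threshold(query_vec: list, candidates: dict, threshold: float) -> list:
    """Return candidate labels scoring at or above threshold, best first."""
    scored = [(cosine(query_vec, vec), label) for label, vec in candidates.items()]
    return [label for score, label in sorted(scored, reverse=True) if score >= threshold]


# Toy embeddings, invented for illustration.
QUERY = [0.9, 0.1, 0.0]                      # "damaged equipment"
CANDIDATES = {
    "broken machinery": [0.8, 0.2, 0.1],
    "lunch menu": [0.0, 0.1, 0.9],
}
```

With a threshold of 0.3, only "broken machinery" survives, even though it shares no words with the query.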

POST /api/v2/files/search

Semantic search over document chunks. Scope to specific documents via document_ids; omit to search all.

Request

json
{
"query": "safety inspection requirements",
"limit": 20,
"similarity_threshold": 0.3,
"document_ids": ["550e8400-e29b-41d4-a716-446655440000"]
}
// document_ids is optional — omit to search all documents

Response

json
{
"query": "safety inspection requirements",
"results": [
{
"chunk_id": "chunk_001",
"document_id": "550e8400-e29b-41d4-a716-446655440000",
"document_filename": "safety_manual.pdf",
"chunk_index": 5,
"content": "Safety inspections must be conducted quarterly...",
"page_numbers": [12, 13],
"heading_hierarchy": ["Chapter 3", "Inspections", "Schedule"],
"similarity_score": 0.87,
"metadata": {"chunk_type": "paragraph"}
}
],
"total_count": 15,
"search_time_ms": 45
}
POST /api/v2/files/analyze

Upload a document and receive extracted text in a single call. The request waits up to timeout seconds (default 60). If processing exceeds the timeout, the response has status "processing" and a job_id for polling. The maximum file size is 10 MB; use POST /files/upload (or /files/uploads for >100 MB) for larger files.

Request

json
curl -X POST https://api.scopix.ai/api/v2/files/analyze \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@report.pdf" \
-F "timeout=60"
// Form fields:
// file: required (PDF, DOCX, TXT, MD)
// timeout: optional, 5-120 (default: 60)
// skip_duplicates: optional (default: false)
// folder_id: optional
// project_id: optional

Response

json
// Discriminated union — check status first.
// status: "completed"
{
"document_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"filename": "report.pdf",
"size_bytes": 2048576,
"processing_time_ms": 4500.0,
"document_type": "pdf",
"text_extraction_status": "completed",
"page_count": 15,
"chunk_count": 42,
"extracted_text": "SAFETY MANUAL\n\nChapter 1: Introduction..."
}
// status: "processing" (timeout exceeded — poll GET /job/{job_id})
{
"document_id": "...",
"status": "processing",
"job_id": "...",
"poll_url": "/api/v2/job/...",
"document_type": "pdf",
"text_extraction_status": "pending"
}
// status: "failed" or "skipped" (content-hash duplicate)
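
Since the response is a discriminated union, client code should branch on status before reading any other field. A minimal sketch; `handle_analyze_response` and the action labels it returns are hypothetical.

```python
from typing import Optional, Tuple


def handle_analyze_response(body: dict) -> Tuple[str, Optional[str]]:
    """Map a parsed /files/analyze response to (action, payload).

    The action labels are illustrative, not part of the API.
    """
    status = body.get("status")
    if status == "completed":
        return "text", body["extracted_text"]
    if status == "processing":
        # Timeout exceeded: the caller should poll the job endpoint.
        return "poll", body["poll_url"]
    if status in ("failed", "skipped"):
        return status, None
    raise ValueError("unexpected status: %r" % status)
```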
POST /api/v2/files/analyze/async

Same input as POST /files/analyze but always returns 202 immediately with a job_id. Use for fire-and-forget or concurrent document processing.

Request

json
curl -X POST https://api.scopix.ai/api/v2/files/analyze/async \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@report.pdf"

Response

json
// 202 Accepted
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "processing",
"poll_url": "/api/v2/job/550e8400-e29b-41d4-a716-446655440000"
}

Export, Quota & Deduplication

GET /api/v2/files/export/columns

Get available columns grouped by category for building export requests.

Response

json
{
"groups": {
"basic": [
{"field_key": "id", "display_name": "ID", "group": "basic"},
{"field_key": "filename", "display_name": "Filename", "group": "basic"},
{"field_key": "title", "display_name": "Title", "group": "basic"},
{"field_key": "size_bytes", "display_name": "Size (bytes)", "group": "basic"},
{"field_key": "created_at", "display_name": "Created At", "group": "basic"}
],
"descriptions": [
{"field_key": "upload_description", "display_name": "AI Description", "group": "descriptions"},
{"field_key": "user_description", "display_name": "User Description", "group": "descriptions"},
{"field_key": "tags", "display_name": "Tags", "group": "descriptions"}
]
}
}
POST /api/v2/files/export

Export file metadata as CSV, XLSX, DOCX, or Google Sheets.

Request

json
{
"format": "csv",
"columns": [
{"field_key": "filename"},
{"field_key": "title"},
{"field_key": "upload_description", "display_name": "AI Description"},
{"field_key": "tags"},
{"field_key": "created_at"}
],
"folder_id": "550e8400-e29b-41d4-a716-446655440000",
"include_subfolders": true,
"flatten_tags": true,
"sheet_name": "Files"
}
// format: required — "csv", "xlsx", "docx", or "google_sheets"
// columns: required, at least 1 column
// field_key: required — from the /export/columns registry
// file_ids: optional UUIDs to scope export
// folder_id: optional folder scope
// include_subfolders: optional, default: false
// flatten_tags: optional, default: true
// google_sheets_title: optional (for google_sheets format)
// connection_id: optional UUID — Google Drive connection (required for google_sheets)

Response

json
{
"download_url": "https://storage.example.com/exports/files_2026-04-13.csv",
"spreadsheet_url": null,
"record_count": 42,
"format": "csv"
}
GET /api/v2/files/quota-check

Check upload quota before starting (prevents failed uploads from quota exhaustion).

Request

json
// Query parameters:
?file_count=10 // required

Response

json
{
"can_proceed": true,
"requested": 10,
"available": 990,
"monthly_limit": 1000,
"current_usage": 10,
"prepaid_credits": 0,
"max_batch_size": 50,
"max_concurrent_uploads": 10,
"message": null
}
// monthly_limit: -1 for unlimited tiers
// When quota exceeded, can_proceed=false and message describes the shortfall
POST /api/v2/files/check-duplicates

Check which file hashes already exist for this tenant before uploading.

Request

json
{
"hashes": [
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592"
]
}
// hashes: SHA-256 content hashes (1-250 items)

Response

json
{
"duplicates": [
"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
],
"unique": [
"d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592"
]
}
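
A typical pre-upload flow hashes local files, asks which hashes are new, and uploads only those. The sketch below covers the two pure steps around the HTTP call; both helper names are hypothetical, and the 250-item limit comes from the request notes above.

```python
def dedupe_and_batch(hashes: list, batch_size: int = 250) -> list:
    """Drop repeated hashes (keeping first-seen order) and split into request-sized lists."""
    ordered = list(dict.fromkeys(hashes))
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def paths_to_upload(local_hashes: dict, response: dict) -> list:
    """Given {path: sha256_hex} and a check-duplicates response, return paths still to upload."""
    unique = set(response["unique"])
    return [path for path, digest in local_hashes.items() if digest in unique]
```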

Upload Sessions & Status

Batch uploads create a session — a per-batch tracking record. Use these endpoints to poll progress, retrieve per-file results, cancel pending work, and look up the unified processing status of any individual file by ID.

GET /api/v2/files/uploads-status/{image_id}

Get the unified upload + processing status for any file ID.

Response

json
{
"image_id": "550e8400-e29b-41d4-a716-446655440000",
"batch_id": "batch_550e8400",
"unified_status": "processing", // uploading | confirming | queued | processing |
// completed | failed | partially_completed
"component_statuses": {
"variant_status": "completed",
"description_status": "processing",
"upload_status": "completed",
"processing_status": "processing"
},
"processing_ids": ["task_001", "task_002"],
"error_message": null,
"last_error_at": null,
"created_at": "2026-01-15T10:30:00Z",
"last_updated_at": "2026-01-15T10:30:45Z",
"completed_at": null,
"retry_count": 0,
"processing_duration_seconds": 45.2,
"is_stuck": false,
"is_terminal": false
}
GET /api/v2/files/sessions

List upload sessions for the authenticated tenant.

Request

json
// Query parameters:
?status=processing // optional, filter by status
&upload_method=streaming // optional, "streaming" or "presigned"
&offset=0 // pagination (default: 0)
&limit=20 // default: 20, 1-100

Response

json
{
"items": [
{
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"upload_method": "streaming",
"total_files": 20,
"completed_files": 18,
"failed_files": 1,
"skipped_files": 1,
"progress_percentage": 100.0,
"created_at": "2026-01-15T10:30:00Z",
"completed_at": "2026-01-15T10:32:00Z"
}
],
"total_count": 15,
"limit": 20,
"offset": 0,
"has_more": false
}
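
The offset, limit, and has_more fields support a simple paging loop. The Python sketch below walks every page. fetch_page is a hypothetical callable that performs the GET request with the given offset and limit and returns the parsed JSON body; it is injected so the loop stays independent of any HTTP library.

```python
from typing import Any, Callable, Iterator

Page = dict[str, Any]


def iter_sessions(
    fetch_page: Callable[[int, int], Page], limit: int = 20
) -> Iterator[dict[str, Any]]:
    """Yield every session across pages, following offset/limit/has_more."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page["items"]
        # Stop on the last page, or on an empty page as a safety net.
        if not page["has_more"] or not page["items"]:
            break
        offset += limit
```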
GET/api/v2/files/sessions/{session_id}/status

Get current progress and recent activity for an upload session.

Response

json
{
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "processing", // pending | uploading | processing |
// completed | failed | expired | cancelled
"total_files": 20,
"completed_files": 15,
"failed_files": 1,
"skipped_files": 2,
"pending_files": 2,
"progress_percentage": 90.0,
"created_at": "2026-01-15T10:30:00Z",
"started_at": "2026-01-15T10:30:00Z",
"completed_at": null,
"estimated_completion_time": null,
"recent_completions": [
{
"filename": "photo1.jpg",
"image_id": "660e8400-e29b-41d4-a716-446655440001",
"status": "completed",
"description": "A site inspection showing...",
"processing_time_ms": null
}
],
"recent_errors": [],
"results_url": "/api/v2/files/sessions/{session_id}/results",
"websocket_channel": "batch.{session_id}"
}
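
Where the WebSocket channel is not an option, a client can poll this endpoint until the session stops changing. The following Python sketch shows one way to do that. fetch_status is a hypothetical callable that performs the GET and returns the parsed body. Treating completed, failed, expired, and cancelled as terminal is an assumption based on the status names; the default interval and timeout are arbitrary.

```python
import time
from typing import Any, Callable

# Assumption: these four session statuses never change again.
TERMINAL_SESSION_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_session(
    fetch_status: Callable[[], dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the session reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL_SESSION_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"session still {status['status']} after {timeout_s}s")
        sleep(interval_s)
```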
GET/api/v2/files/sessions/{session_id}/results

Paginated per-file results from a session.

Request

json
// Query parameters:
?include_failed=true // include failed files (default: true)
&offset=0 // default: 0
&limit=100 // default: 100, 1-500

Response

json
{
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"results": [
{
"image_id": "660e8400-e29b-41d4-a716-446655440001",
"filename": "photo1.jpg",
"status": "completed",
"description": "Safety inspection showing...",
"visible_text": "EXIT sign visible...",
"tags": ["safety", "construction"],
"processing_time_ms": null,
"error_message": null,
"thumbnail_url": "https://...",
"created_at": "2026-01-15T10:30:05Z"
}
],
"total_count": 20,
"offset": 0,
"limit": 100,
"has_more": false,
"summary": {"total_files": 20, "completed": 17, "failed": 1, "skipped": 2}
}
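
Once the results are fetched, a client often wants a tally by status and a list of failures to retry or report. A small Python sketch follows. It assumes that per-file "status" values use the same names as the summary keys (completed, failed, skipped), which the sample above does not confirm for failed and skipped files.

```python
from collections import Counter
from typing import Any


def tally_by_status(results: list[dict[str, Any]]) -> Counter:
    """Count per-file results by their "status" field."""
    return Counter(item["status"] for item in results)


def failed_files(results: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Return (filename, error_message) pairs for results marked failed."""
    return [
        (item["filename"], item["error_message"] or "unknown error")
        for item in results
        if item["status"] == "failed"
    ]
```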
POST/api/v2/files/sessions/{session_id}/cancel

Cancel a pending or in-progress session. Already-processed files keep their results.

Response

json
{
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "cancelled",
"total_files": 20,
"completed_files": 10,
"failed_files": 0,
"skipped_files": 0,
"pending_files": 10,
"progress_percentage": 50.0,
"created_at": "2026-01-15T10:30:00Z",
"started_at": "2026-01-15T10:30:00Z",
"completed_at": "2026-01-15T10:31:00Z",
"estimated_completion_time": null,
"recent_completions": [],
"recent_errors": [],
"results_url": "/api/v2/files/sessions/{session_id}/results",
"websocket_channel": "batch.{session_id}"
}
// Returns 400 Bad Request if the session is already completed, cancelled, or expired
GET/api/v2/files/sessions/{session_id}/summary

Aggregated per-file status counts for every file in a session (uploading, confirming, queued, processing, completed, failed, partially_completed, stuck) plus description status counts. The response identifies the session as batch_id. Optimised for dashboards.

Response

json
{
"batch_id": "550e8400-e29b-41d4-a716-446655440000",
"overall_status": "processing",
"completion_percentage": 65.0,
"counts": {
"total": 20,
"uploading": 2,
"confirming": 0,
"queued": 1,
"processing": 4,
"completed": 13,
"failed": 0,
"partially_completed": 0,
"stuck": 0
},
"description_counts": {
"pending": 0,
"processing": 4,
"completed": 13,
"failed": 0,
"skipped": 0
},
"error_summary": {
"count": 0,
"messages": []
},
"created_at": "2026-01-15T10:30:00Z",
"last_activity_at": "2026-01-15T10:33:15Z"
}
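
A dashboard usually reduces these counts to a few headline numbers. The Python sketch below derives an in-flight count, a completion percentage, and an attention flag from the "counts" object. The output keys are hypothetical. Computing the percentage as completed / total is an assumption that happens to match the sample (13 of 20 is 65.0); the server's actual formula is not documented here.

```python
from typing import Any

IN_FLIGHT_KEYS = ("uploading", "confirming", "queued", "processing")


def dashboard_numbers(summary: dict[str, Any]) -> dict[str, Any]:
    """Reduce a session summary to a few dashboard-friendly values."""
    counts = summary["counts"]
    total = counts["total"]
    percent = round(100.0 * counts["completed"] / total, 1) if total else 0.0
    return {
        "in_flight": sum(counts[key] for key in IN_FLIGHT_KEYS),
        "percent_completed": percent,
        "needs_attention": counts["failed"] + counts["stuck"] > 0,
    }
```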
GET/api/v2/files/sessions/stuck

List uploads that have made no progress within the stuck_minutes threshold. Useful for client-side recovery flows. Each entry is a full UnifiedImageStatusResponse (same shape as /files/uploads-status/{image_id}).

Request

json
// Query parameters:
?stuck_minutes=30 // default: 30, threshold for "stuck" (min 1)
&limit=100 // default: 100, 1-500

Response

json
{
"stuck_count": 1,
"images": [
{
"image_id": "550e8400-e29b-41d4-a716-446655440000",
"batch_id": "770a0600-e29b-41d4-a716-446655440000",
"unified_status": "uploading",
"component_statuses": {
"variant_status": null,
"description_status": null,
"upload_status": "streaming",
"processing_status": null
},
"processing_ids": [],
"error_message": null,
"last_error_at": null,
"created_at": "2026-01-15T10:00:00Z",
"last_updated_at": "2026-01-15T10:00:15Z",
"completed_at": null,
"retry_count": 0,
"processing_duration_seconds": 1845.0,
"is_stuck": true,
"is_terminal": false
}
]
}
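
One possible client-side recovery flow is to sort stuck entries into those worth re-uploading and those that need a closer look. The Python sketch below illustrates the idea. The policy is entirely an assumption: the bucket names, the max_retries cutoff, and the rule that only entries still in "uploading" are re-uploaded are not defined by the API.

```python
from typing import Any


def triage_stuck(payload: dict[str, Any], max_retries: int = 3) -> dict[str, list[str]]:
    """Split stuck uploads into re-upload candidates and items to investigate."""
    plan: dict[str, list[str]] = {"reupload": [], "investigate": []}
    for image in payload["images"]:
        if image["is_terminal"]:
            continue  # nothing left to recover
        still_uploading = image["unified_status"] == "uploading"
        if still_uploading and image["retry_count"] < max_retries:
            plan["reupload"].append(image["image_id"])
        else:
            plan["investigate"].append(image["image_id"])
    return plan
```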

Telemetry

POST/api/v2/files/log-upload-event

Fire-and-forget client-side upload telemetry (e.g., browser-side errors, retries). Unauthenticated. Server logs the event for diagnostics; never blocks the upload.

Request

json
{
"event_type": "upload_retry", // required, max 100 chars
"message": "Chunk 5 failed with NetworkError, retrying", // required, max 2000 chars
"data": { // optional, serialized size <= 10 KB
"upload_id": "550e8400-e29b-41d4-a716-446655440000",
"part_number": 5,
"user_agent": "Mozilla/5.0 ..."
},
"timestamp": "2026-04-15T10:35:00Z", // optional client-side timestamp
"batch_id": "batch_550e8400", // optional, max 100 chars
"file_index": 5, // optional, 0-10000
"file_name": "huge.pdf" // optional, max 500 chars
}

Response

json
{
"status": "logged",
"timestamp": "2026-04-15T10:35:00.123456+00:00"
}
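
Because telemetry is fire-and-forget, it is reasonable for the client to trim an event to the documented limits instead of raising an error. A minimal Python sketch follows. build_upload_event is a hypothetical helper, and the way it measures the "data" size (UTF-8 bytes of json.dumps output, with 10 KB taken as 10,240 bytes) is an assumption about how the server counts.

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional

MAX_EVENT_TYPE_CHARS = 100
MAX_MESSAGE_CHARS = 2000
MAX_DATA_BYTES = 10 * 1024  # assumption: "10 KB" means 10,240 bytes of UTF-8 JSON


def build_upload_event(
    event_type: str,
    message: str,
    data: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a telemetry payload that stays within the documented field limits."""
    event: dict[str, Any] = {
        "event_type": event_type[:MAX_EVENT_TYPE_CHARS],
        "message": message[:MAX_MESSAGE_CHARS],
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Drop an oversized optional data object rather than risk a rejected request.
    if data is not None and len(json.dumps(data).encode("utf-8")) <= MAX_DATA_BYTES:
        event["data"] = data
    return event
```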

Limits & Constraints

  • Streaming upload max: 100 MB per file
  • Single-shot presigned max: 5 GB (S3 PUT cap)
  • Multipart max: 5 TB (per S3 limits)
  • Multipart part size: 5 MB minimum per part (S3 allows the final part to be smaller); S3 imposes a 5 GB per-part hard limit
  • Synchronous document analyze: 10 MB (`/files/analyze`); use `/files/upload` for larger files
  • Streaming batch size: maximum of 10–200 files per request, depending on tier; each file capped at 100 MB
  • Batch delete: max 100 files per request
  • Search query length: 1–1000 characters
  • Tags per file: max 40 tags, each max 50 characters
  • Title length: max 255 characters
  • User description: max 10000 characters
  • Hash dedup batch: 1–250 hashes per call
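
The size limits above suggest a simple rule for picking an upload path. The Python sketch below picks the simplest method whose cap fits the file and chooses a multipart part size. Several things here are assumptions: the units are treated as binary (1 MB = 1024² bytes), the "simplest method that fits" policy is illustrative, the 64 MB preferred part size is arbitrary, and the 10,000-part ceiling is S3's own multipart limit rather than something stated in the list.

```python
import math

MB, GB, TB = 1024 ** 2, 1024 ** 3, 1024 ** 4  # assumption: binary units

STREAMING_MAX = 100 * MB
PRESIGNED_MAX = 5 * GB
MULTIPART_MAX = 5 * TB
MIN_PART = 5 * MB
MAX_PART = 5 * GB
MAX_PARTS = 10_000  # S3 multipart limit; not listed above


def choose_upload_method(size_bytes: int) -> str:
    """Pick the simplest upload method whose size cap fits the file."""
    if size_bytes <= 0:
        raise ValueError("file is empty")
    if size_bytes <= STREAMING_MAX:
        return "streaming"
    if size_bytes <= PRESIGNED_MAX:
        return "presigned"
    if size_bytes <= MULTIPART_MAX:
        return "multipart"
    raise ValueError("file exceeds the 5 TB multipart limit")


def multipart_part_size(size_bytes: int, preferred: int = 64 * MB) -> int:
    """Return a part size that keeps the upload within MAX_PARTS parts."""
    part = max(preferred, MIN_PART, math.ceil(size_bytes / MAX_PARTS))
    if part > MAX_PART:
        raise ValueError("file too large for the per-part limit")
    return part
```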