Files API Reference
Unified file resource for images, documents, and videos — uploads, retrieval, search, digitization, and management
One Resource for Every Media Type
All file operations live under /api/v2/files/*. Images, documents, videos, and links share the same CRUD endpoints; media-specific sub-paths (variants, text, chunks, digitization, similar) return 400 if used on the wrong media type.
Automatic File Type Detection
The API auto-detects file types from content using magic byte signatures. You don't need to set the correct Content-Type header in multipart form data — if omitted or mismatched, the server inspects the payload and routes the file to the right pipeline. Unrecognizable or unsafe files (executables, scripts) are rejected.
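As an illustration of signature-based detection, here is a minimal client-side sniffer that could warn users before upload. It is a sketch and not the server's detector. The signature table is a small assumed subset of well-known public magic numbers, and `sniff_media_type` is a hypothetical helper name.

```python
# Hypothetical client-side pre-check; the server's real signature list is not documented here.
SIGNATURES = [
    (b"\xff\xd8\xff", "image"),           # JPEG
    (b"\x89PNG\r\n\x1a\n", "image"),      # PNG
    (b"GIF87a", "image"),                 # GIF (1987)
    (b"GIF89a", "image"),                 # GIF (1989)
    (b"%PDF-", "document"),               # PDF
    (b"PK\x03\x04", "document"),          # ZIP container (DOCX is a ZIP archive, but so are other formats)
]
REJECTED = [
    b"MZ",        # Windows PE executable
    b"\x7fELF",   # ELF executable
    b"#!",        # script with a shebang line
]

def sniff_media_type(head: bytes) -> str:
    """Classify a file's first bytes as image/document/video, 'rejected', or 'unknown'."""
    for magic in REJECTED:
        if head.startswith(magic):
            return "rejected"
    for magic, media_type in SIGNATURES:
        if head.startswith(magic):
            return media_type
    # ISO base media files (MP4, MOV) carry "ftyp" at byte offset 4.
    if len(head) >= 8 and head[4:8] == b"ftyp":
        return "video"
    return "unknown"
```

Plain-text formats such as TXT and MD have no magic bytes, so "unknown" here does not mean the server will reject the file.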
Streaming Upload
Single-request multipart upload for files up to 100 MB. The recommended path for almost every upload — no init/complete dance, no client-side hashing.
/api/v2/files/upload
Upload a single file (multipart/form-data). Auto-routes by detected media type. Returns 201 Created.
Request
// multipart/form-data fields:
// Required:
//   file: (binary) - file to upload (up to 100 MB)
// Optional:
//   title: (string, max 255) - file title
//   tags: (string) - comma-separated tags
//   auto_describe: (boolean, default true) - run AI description pipeline
//   skip_duplicates: (boolean, default false) - skip if hash already exists
//   storage_target: (string, default "default") - "default" or "custom"
//   folder_id: (string) - destination folder UUID
//   project_id: (string) - project workspace UUID (used when no folder_id)
//   content_category: (string, default "general") - content category used to tailor AI analysis
//     Valid values: general, blueprint, ce_plan, technical_diagram,
//     architectural_design, product_photo, real_estate, mining, robotics,
//     artwork, screenshot, document, map, pid, pfd, construction,
//     facility_assessment
//   custom_schema_id: (string) - optional saved custom extraction schema UUID;
//     triggers a second VLM pass with that schema
//   compliance_type: (string) - "mls" or "marketplace"
//   compliance_standard: (string) - required if compliance_type is set
//     (e.g. "nar_baseline", "amazon")
//   compliance_image_type: (string, default "main") - "main" or "secondary"
curl -X POST https://api.scopix.ai/api/v2/files/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@photo.jpg" \
  -F "title=Site Inspection" \
  -F "tags=inspection,site"
Response
// 201 Created
{
  "image_id": "550e8400-e29b-41d4-a716-446655440000",
  "upload_method": "STREAMING",
  "status": "completed",              // "completed" | "processing" | "skipped"
  "processing_time_ms": 1250.5,
  "upload_completed": true,
  "thumbnail_generation_started": true,
  "analysis_started": true,
  "skipped": false,
  "skipped_existing_image_id": null,
  "storage_target": "default",
  "media_type": "image",              // "image" | "document" | "video"
  "document_type": null,              // "pdf" | "docx" | "txt" | "md" (documents only)
  "text_extraction_status": null      // "pending" | "processing" | "completed" | "failed" (documents only)
}
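Under load, the streaming endpoint answers 429 Too Many Requests with a Retry-After header. Here is a minimal sketch of honoring that header. It assumes the value may use either of the two forms HTTP allows: delay-seconds or an HTTP-date. `retry_delay`, the 60-second cap, and the jittered fallback are hypothetical client choices, not documented API behavior.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def retry_delay(retry_after: Optional[str], attempt: int,
                now: Optional[datetime] = None, cap: float = 60.0) -> float:
    """Seconds to wait before retrying after a 429 response."""
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():
            # delay-seconds form, e.g. "Retry-After: 5"
            return min(float(value), cap)
        try:
            # HTTP-date form, e.g. "Retry-After: Wed, 15 Apr 2026 10:40:00 GMT"
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None  # unparseable header; fall through to backoff
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return min(max((when - now).total_seconds(), 0.0), cap)
    # No usable header: exponential backoff with full jitter (assumed policy).
    return random.uniform(0.0, min(cap, 2.0 ** attempt))
```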
// 429 Too Many Requests - backpressure (Retry-After header set)
// 413 Payload Too Large - file exceeds the streaming limit (use the /files/uploads upload-intent flow)
/api/v2/files/upload/batch
Multi-file batch upload. Maximum files per request by tier: FREE 10, STARTER 50, PROFESSIONAL 100, ENTERPRISE 200. Each file is capped at 100 MB. Returns 201 Created.
Request
// multipart/form-data fields:
// Required:
//   files: (binary[]) - multiple files (each up to 100 MB)
// Optional:
//   tags: (string) - comma-separated tags applied to all files
//   auto_describe: (boolean, default true) - run AI description pipeline
//   skip_duplicates: (boolean, default false)
//   storage_target: (string, default "default")
//   folder_id: (string) - destination folder UUID
//   project_id: (string) - project workspace UUID
//   content_category: (string, default "general")
//   custom_schema_id: (string) - optional saved custom extraction schema UUID,
//     applied to every file in the batch
//   compliance_type: (string) - "mls" or "marketplace"
//   compliance_standard: (string) - required if compliance_type is set
//   compliance_image_type: (string, default "main") - "main" or "secondary"
curl -X POST https://api.scopix.ai/api/v2/files/upload/batch \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "files=@photo1.jpg" \
  -F "files=@photo2.jpg" \
  -F "files=@report.pdf"
Response
// 201 Created
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "total_files": 3,
  "accepted_files": 3,
  "rejected_files": 0,
  "status": "completed",              // "completed" | "partial" | "processing" | "rejected"
  "immediate_results": [
    {
      "image_id": "660f9500-e29b-41d4-a716-446655440000",
      "filename": "photo1.jpg",
      "status": "completed",          // "completed" | "failed" | "skipped"
      "processing_time_ms": 850.2,
      "skipped": false,
      "skipped_existing_image_id": null,
      "error": null,
      "storage_target": "default",
      "media_type": "image",
      "document_type": null,
      "text_extraction_status": null
    }
  ],
  "status_url": "/api/v2/files/sessions/{session_id}/status",
  "websocket_channel": "batch.{session_id}",
  "rejections": null
}
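The batch endpoint caps files per request by tier and each file at 100 MB, so a client with a large folder has to split its files. Here is a minimal planning sketch that uses the tier limits quoted for this endpoint. It reads "100 MB" as 100 * 1024 * 1024 bytes, which is an assumption about the exact boundary. Oversized files are set aside for the /files/uploads flow. `plan_batches` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Per-tier file counts for /api/v2/files/upload/batch, as listed in the reference.
TIER_BATCH_LIMITS: Dict[str, int] = {
    "FREE": 10, "STARTER": 50, "PROFESSIONAL": 100, "ENTERPRISE": 200,
}
STREAMING_MAX_BYTES = 100 * 1024 * 1024  # assumption: "100 MB" read as MiB

def plan_batches(files: List[Tuple[str, int]], tier: str
                 ) -> Tuple[List[List[str]], List[str]]:
    """Split (name, size_bytes) pairs into batch-sized groups.

    Returns (batches, oversized); oversized files should go through the
    presigned /api/v2/files/uploads flow instead.
    """
    limit = TIER_BATCH_LIMITS[tier]
    small = [name for name, size in files if size <= STREAMING_MAX_BYTES]
    oversized = [name for name, size in files if size > STREAMING_MAX_BYTES]
    batches = [small[i:i + limit] for i in range(0, len(small), limit)]
    return batches, oversized
```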
// For larger batches, poll status_url or subscribe to websocket_channel
Presigned & Multipart Upload
For files larger than 100 MB, or when you want the bytes to bypass the API entirely, use the upload-intent flow: create an intent, PUT the bytes directly to S3, then call /complete. Use upload_mode "single_shot" for files up to 5 GB and "multipart" for anything larger (videos, large datasets); multipart also accepts any file of 5 MB or more.
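If you omit upload_mode, the server picks it from size_bytes. A client that opts into multipart still has to choose a part size and know how many parts to expect. Here is a sketch that reads 5 MB and 5 GB as 5 MiB and 5 GiB and applies the general Amazon S3 limit of 10,000 parts per multipart upload. Those readings are assumptions, and the server's own auto-selection threshold is not documented here. `choose_mode` and `part_plan` are hypothetical helpers.

```python
from typing import Tuple

MIB = 1024 * 1024
MIN_PART = 5 * MIB                 # minimum multipart part size (assumed MiB)
SINGLE_SHOT_MAX = 5 * 1024 * MIB   # single_shot ceiling (assumed GiB)
MAX_PARTS = 10_000                 # S3 multipart part-count limit

def choose_mode(size_bytes: int) -> str:
    """One possible policy: single_shot when allowed, multipart above 5 GB."""
    return "single_shot" if size_bytes <= SINGLE_SHOT_MAX else "multipart"

def part_plan(size_bytes: int, part_size: int = 8 * MIB) -> Tuple[int, int]:
    """Return (part_size_bytes, total_parts), growing parts to stay under MAX_PARTS."""
    if size_bytes < MIN_PART:
        raise ValueError("multipart needs at least 5 MB")
    part_size = max(part_size, MIN_PART)
    # Integer ceiling division: smallest part size that fits in MAX_PARTS parts.
    part_size = max(part_size, -(-size_bytes // MAX_PARTS))
    return part_size, -(-size_bytes // part_size)
```

For the 500 MiB example used in this section, 8 MiB parts give the 63 parts shown in the multipart response.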
/api/v2/files/uploads
Create an upload intent. Returns a presigned PUT URL (single-shot) or per-part presigned URLs (multipart). The client must compute the SHA-256 of the file and pin it as claimed_file_hash; the server verifies it on /complete.
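claimed_file_hash is the hex SHA-256 of the whole file, and the single-shot response asks for an x-amz-checksum-sha256 header carrying the base64 of the same digest. Here is a minimal sketch that computes both in one streaming pass with Python's hashlib. It assumes lowercase hex is accepted, as in the examples. `file_checksums` is a hypothetical helper name.

```python
import base64
import hashlib
from typing import BinaryIO, Tuple

def file_checksums(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> Tuple[str, str]:
    """Return (hex digest for claimed_file_hash, base64 digest for x-amz-checksum-sha256)."""
    digest = hashlib.sha256()
    # Read in chunks so multi-GB videos never have to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    raw = digest.digest()
    return raw.hex(), base64.b64encode(raw).decode("ascii")
```

Typical use: `with open("inspection.mp4", "rb") as f: claimed_hash, checksum_b64 = file_checksums(f)`.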
Request
{
  "filename": "inspection.mp4",
  "content_type": "video/mp4",
  "size_bytes": 524288000,
  "claimed_file_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "upload_mode": "multipart",         // optional; omit to let the server pick by size.
                                      // "single_shot" (<=5 GB) | "multipart" (>=5 MB)
  "part_size_bytes": 8388608,         // multipart only; min 5 MB per part
  "idempotency_key": null,            // optional, max 128 chars
  "title": "Site Inspection",         // optional
  "tags": ["inspection", "site-a"],   // optional, max 20 tags (1-50 chars each)
  "folder_id": null,                  // optional folder UUID
  "project_id": null,                 // optional project UUID
  "skip_duplicates": false,           // optional
  "storage_target": "default",        // optional (not currently honored server-side)
  "auto_describe": true,              // optional, default true
  "content_category": "general",      // optional
  "custom_schema_id": null,           // optional saved schema UUID
  "compliance_type": null,            // optional: "mls" | "marketplace"
  "compliance_standard": null,        // required if compliance_type is set
  "compliance_image_type": "main"     // optional: "main" | "secondary"
}
// Required: filename, content_type, size_bytes, claimed_file_hash
// upload_mode is OPTIONAL; the server auto-selects by size_bytes
// claimed_file_hash: 64-char SHA-256 hex (server verifies post-upload)
Response
// Single-shot response:
{
  "upload_id": "550e8400-e29b-41d4-a716-446655440000",
  "upload_mode": "single_shot",
  "media_type": "video",
  "method": "PUT",
  "presigned_url": "https://s3.amazonaws.com/...",
  "headers": {
    "Content-Type": "video/mp4",
    "x-amz-checksum-sha256": "<base64(sha256)>",
    "x-amz-sdk-checksum-algorithm": "SHA256"
  },
  "object_key": "videos/<tenant>/<hash>.mp4",
  "expires_at": "2026-04-15T10:40:00Z",
  "max_size_bytes": 524288000,
  "bucket_name": "scopix-uploads"
}
// Multipart response:
{
  "upload_id": "550e8400-e29b-41d4-a716-446655440000",
  "upload_mode": "multipart",
  "media_type": "video",
  "s3_upload_id": "abc...XYZ",
  "object_key": "videos/<tenant>/<hash>.mp4",
  "part_urls": [
    {"part_number": 1, "url": "https://s3.amazonaws.com/...", "expires_at": "2026-04-15T10:40:00Z"},
    {"part_number": 2, "url": "https://s3.amazonaws.com/...", "expires_at": "2026-04-15T10:40:00Z"}
  ],
  "part_size_bytes": 8388608,
  "total_parts": 63,
  "expires_at": "2026-04-15T10:40:00Z",
  "bucket_name": "scopix-uploads"
}
/api/v2/files/uploads/{upload_id}
Get the current state of an upload intent (PENDING, UPLOADED, COMPLETED, FAILED) and, for multipart uploads, per-part progress.
Response
{
  "upload_id": "550e8400-e29b-41d4-a716-446655440000",
  "upload_mode": "multipart",
  "status": "UPLOADED",               // PENDING | UPLOADED | COMPLETED | FAILED
  "media_type": "video",
  "object_key": "videos/<tenant>/<hash>.mp4",
  "filename": "inspection.mp4",
  "size_bytes": 524288000,
  "total_parts": 63,                  // null for single_shot
  "parts_confirmed": 63,              // null for single_shot
  "progress_percent": 100.0,          // null for single_shot
  "created_at": "2026-04-15T10:30:00Z",
  "expires_at": "2026-04-15T10:40:00Z",
  "confirmed_at": "2026-04-15T10:38:00Z",
  "error_message": null
}
/api/v2/files/uploads/{upload_id}/parts/confirm
Confirm a successfully uploaded multipart chunk. Call it after each PUT to S3, passing the ETag that S3 returned.
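Here is a sketch of the per-part loop this endpoint expects: slice the file by part_size_bytes, PUT each slice to its presigned URL, then confirm with the ETag that S3 returns. It assumes part N covers bytes (N-1)*part_size up to N*part_size, the usual S3 convention. `iter_parts`, `put_part`, and `confirm_body` are hypothetical helpers, and the HTTP call uses only the standard library.

```python
import urllib.request
from typing import BinaryIO, Iterator, Tuple

def iter_parts(stream: BinaryIO, part_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, chunk) pairs with 1-indexed part numbers."""
    part_number = 1
    while True:
        chunk = stream.read(part_size)
        if not chunk:
            return
        yield part_number, chunk
        part_number += 1

def put_part(url: str, chunk: bytes) -> str:
    """PUT one chunk to its presigned URL and return the ETag header S3 sends back."""
    request = urllib.request.Request(url, data=chunk, method="PUT")
    with urllib.request.urlopen(request) as response:
        return response.headers["ETag"]

def confirm_body(part_number: int, etag: str, size_bytes: int) -> dict:
    """Body for /parts/confirm; the quoted ETag can be passed through as-is."""
    return {"part_number": part_number, "etag": etag, "size_bytes": size_bytes}
```

If a part URL expires or a PUT fails, request a fresh URL for that part_number from /parts/retry and upload the same chunk again.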
Request
{
  "part_number": 1,
  "etag": "\"abc123def456\"",
  "size_bytes": 8388608
}
// part_number: 1-indexed
// etag: from the S3 PUT response (the quoted form is fine)
Response
{
  "upload_id": "550e8400-e29b-41d4-a716-446655440000",
  "part_number": 1,
  "parts_confirmed": 1,
  "total_parts": 63,
  "progress_percent": 1.59
}
/api/v2/files/uploads/{upload_id}/parts/retry
Get a fresh presigned URL for re-uploading a failed multipart chunk.
Request
{
  "part_number": 5
}
Response
{
  "upload_id": "550e8400-e29b-41d4-a716-446655440000",
  "part_number": 5,
  "url": "https://s3.amazonaws.com/...",
  "expires_at": "2026-04-15T10:50:00Z"
}
/api/v2/files/uploads/{upload_id}/complete
Finalize an upload (single-shot or multipart). The server completes the S3 multipart upload when applicable, verifies the SHA-256 against claimed_file_hash, creates the file record, and queues media-specific processing (variants and description for images, text extraction for documents, ffprobe plus analysis for videos). The request body is empty; the server is fully authoritative.
Request
{}
// The body must be empty by design. The server uses claimed_file_hash from the
// initiate request and the parts list it tracked from /parts/confirm calls.
// There are no client-supplied duration/analysis params: videos use server-side
// ffprobe and a 2-credit reservation that the worker reconciles.
Response
// 200 OK
{
  "upload_id": "550e8400-e29b-41d4-a716-446655440000",
  "file_id": "660f9500-e29b-41d4-a716-446655440000",
  "media_type": "video",              // "image" | "document" | "video"
  "filename": "inspection.mp4",
  "object_key": "videos/<tenant>/<hash>.mp4",
  "size_bytes": 524288000,
  "deduplicated": false,              // true if an existing file had the same hash
  "status": "processing"              // "processing" | "completed"
}
// 409 Conflict - claimed_file_hash mismatch (SHA-256 didn't match the S3 object)
// 422 Unprocessable Entity - required parts missing on multipart complete
/api/v2/files/uploads/{upload_id}
Abort an upload intent. For multipart uploads, this also aborts the underlying S3 multipart upload and refunds reserved credits where applicable.
Request
// Optional query parameter:
?reason=User%20cancelled    // up to 255 chars
Response
{
  "upload_id": "550e8400-e29b-41d4-a716-446655440000",
  "aborted": true,
  "reason": "User cancelled"
}
File Listing & Retrieval
/api/v2/files
List files with full-text search and filters. Results are heterogeneous across media types; use the media_types query parameter to scope them.
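Multi-value filters (tags, media_types, ids) are sent by repeating the key, and paging uses limit, offset, and has_more. Here is a sketch built on urllib.parse.urlencode with doseq=True, which expands list values into repeated keys, plus a generator that walks the pages. It takes an injected fetch callable so any HTTP client can be plugged in. `list_query`, `iter_files`, and the fetch signature are hypothetical.

```python
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode

def list_query(params: Dict[str, object]) -> str:
    """Encode filters; list values become repeated keys (tags=a&tags=b)."""
    return urlencode(params, doseq=True)

def iter_files(fetch: Callable[[str], dict], params: Dict[str, object],
               limit: int = 100) -> Iterator[dict]:
    """Yield every item from /api/v2/files, page by page."""
    offset = 0
    while True:
        query = list_query({**params, "limit": limit, "offset": offset})
        page = fetch("/api/v2/files?" + query)
        yield from page["items"]
        # Stop on the last page, or defensively if a page comes back empty.
        if not page["has_more"] or not page["items"]:
            return
        offset += len(page["items"])
```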
Request
// Query parameters:
?search=damage%20report                   // optional, full-text search
&search_mode=all                          // optional, default: all; options: all | metadata | visible_text
&tags=safety&tags=inspection              // optional, multi-value filter by tags
&media_types=image&media_types=document   // optional, multi-value: image | document | video | link
&folder_id=folder_abc123                  // optional, filter by folder
&project_id=uuid                          // optional, filter by project workspace
&has_description=true                     // optional, filter by description status
&ids=uuid1&ids=uuid2                      // optional, multi-value filter by file IDs
&compliance_status=passed                 // optional, filter by compliance status
&date_from=2026-01-01T00:00:00Z           // optional
&date_to=2026-01-31T23:59:59Z             // optional
&sort_by=content_created_at               // optional; options: created_at | content_created_at | title | size_bytes
&sort_order=desc                          // optional, default: desc
&limit=20                                 // optional, default: 20, range 1-100
&offset=0                                 // optional, default: 0
Response
{
  "items": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "title": "Site Photo A",
      "filename": "site_photo.jpg",
      "thumbnail_url": "https://cdn.scopix.ai/thumbs/...",
      "upload_description": "Damaged concrete pillar with visible cracks...",
      "visible_text": "WARNING: STRUCTURAL DAMAGE",
      "tags": ["damage", "concrete"],
      "size_bytes": 2048576,
      "created_at": "2026-01-15T10:30:00Z",
      "content_created_at": "2026-01-14T08:00:00Z",
      "has_full_description": true,
      "dimensions": {"width": 4000, "height": 3000},
      "format": "jpeg",
      "variant_status": "completed",
      "variant_count": 5,
      "medium_url": "https://cdn.scopix.ai/medium/...",
      "full_url": "https://cdn.scopix.ai/large/...",
      "blur_hash": "L6PZfSi_.AyE_3t7t7R**0o#DgR4",
      "description_status": "completed",
      "description_error": null,
      "content_type": "image/jpeg",
      "media_type": "image",
      "content_category": "general",
      "document_type": null,
      "source_url": null
    }
  ],
  "total_count": 150,
  "limit": 20,
  "offset": 0,
  "has_more": true
}
// Conditional fields by media_type:
//   document: document_type, page_count, text_extraction_status, chunk_count, document_url
//   video: duration_seconds, frame_rate, video_codec, resolution, analysis_status
//   link: source_url, domain, og_metadata, favicon_url, crawl_status,
//         extracted_images, extracted_images_count
/api/v2/files/{file_id}
Get detailed file information. The response is discriminated by media_type: type-specific fields appear only for the matching media type. Accepts a full UUID or an 8-character prefix.
Request
// Optional query parameter:
?format=markdown    // optional; when set to "markdown" on an image, the response
                    // includes a formatted_document rendering of CE plan /
                    // legend / schedule / description data
Response
// media_type: "image"
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "media_type": "image",
  "title": "Site Photo A",
  "tags": ["damage", "concrete"],
  "size_bytes": 2048576,
  "content_type": "image/jpeg",
  "dimensions": {"width": 4000, "height": 3000},
  "format": "jpeg",
  "full_url": "https://cdn.scopix.ai/large/...",
  "thumbnail_url": "https://cdn.scopix.ai/thumbs/...",
  "medium_url": "https://cdn.scopix.ai/medium/...",
  "original_url": "https://cdn.scopix.ai/originals/...",
  "variant_status": "completed",
  "variant_count": 5,
  "upload_description": "Damaged concrete pillar...",
  "visible_text": "WARNING: STRUCTURAL DAMAGE",
  "text_regions": [
    {"text": "WARNING: STRUCTURAL DAMAGE",
     "bounding_box": {"x_min": 0.25, "y_min": 0.4, "x_max": 0.75, "y_max": 0.52}}
  ],
  "description_generated_at": "2026-01-15T10:32:00Z",
  "full_descriptions": [...],
  "created_at": "2026-01-15T10:30:00Z",
  "updated_at": "2026-01-15T10:35:00Z",
  "blur_hash": "L6PZfSi_.AyE_3t7t7R**0o#DgR4",
  "description_status": "completed",
  "content_category": "general"
}
// media_type: "document"
{
  "id": "...",
  "media_type": "document",
  "filename": "safety_manual.pdf",
  "document_type": "pdf",
  "page_count": 45,
  "chunk_count": 128,
  "text_extraction_status": "completed",
  "extracted_text": "SAFETY MANUAL\n\nChapter 1...",
  ...
}
// media_type: "video"
{
  "id": "...",
  "media_type": "video",
  "filename": "inspection.mp4",
  "duration_seconds": 240.5,
  "frame_rate": 30.0,
  "video_codec": "h264",
  "resolution": "1920x1080",
  "analysis_status": "completed",
  "thumbnail_url": "https://...",
  ...
}
/api/v2/files/{file_id}/download
Download the original file. Returns a 302 redirect to a temporary download URL with a Content-Disposition header.
Response
// Returns 302 Redirect to a presigned download URL
// URL expires in 5 minutes (300 seconds)
// Content-Disposition header set for download
File Updates & Deletion
/api/v2/files/{file_id}
Update file metadata (title, tags, user_description). Pass only the fields you want to change.
Request
{
  "title": "Updated Photo Title",
  "tags": ["updated", "reviewed"],
  "user_description": "Quarterly inspection — minor surface cracks only"
}
// title: optional, max 255 characters
// tags: optional, max 40 tags, each max 50 characters
// user_description: optional, max 10000 characters; pass null to reset to the AI-generated description
Response
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Updated Photo Title",
  "tags": ["updated", "reviewed"],
  "user_description": "Quarterly inspection — minor surface cracks only",
  "upload_description": "A concrete pillar with visible damage...",
  "updated_at": "2026-01-15T11:00:00Z"
}
/api/v2/files/{file_id}
Soft-delete a file. Deleted files are recoverable within 30 days.
Response
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "deleted_at": "2026-01-15T11:00:00Z",
  "message": "File deleted successfully"
}
// 409 Conflict - cannot delete while document text extraction or
// embedding is in progress
/api/v2/files/batch-delete
Delete up to 100 files in a single request. Each file is reported individually, so partial failures don't block the rest of the batch.
Request
{
  "file_ids": [
    "550e8400-e29b-41d4-a716-446655440000",
    "660f9500-f39c-52e5-b827-557766550111"
  ]
}
// 1-100 unique UUIDs
Response
{
  "deleted": [
    {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "deleted", "message": null, "deleted_at": "2026-01-15T11:00:00Z"}
  ],
  "skipped": [],
  "failed": [
    {"id": "660f9500-f39c-52e5-b827-557766550111", "status": "failed", "message": "File not found", "deleted_at": null}
  ],
  "summary": {"total": 2, "deleted": 1, "skipped": 0, "failed": 1}
}
Image Operations
Image-only sub-paths. Calling these on a non-image file returns 400.
/api/v2/files/{file_id}/variant/{variant_type}
Get a specific image variant. Returns a 302 redirect to the variant URL, which expires after 1 hour.
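To pick a variant, take the smallest one whose max dimension covers the size you intend to render. Here is a sketch that uses the variant_type names listed for this endpoint. `pick_variant` and `variant_path` are hypothetical, and falling back to original for anything above 1024 px is an assumed policy.

```python
# Max dimension (px) of each resized variant, from the variant_type list.
VARIANT_SIZES = [("tiny_64", 64), ("small_256", 256),
                 ("medium_750", 750), ("large_1024", 1024)]

def pick_variant(target_px: int, device_pixel_ratio: float = 1.0) -> str:
    """Smallest variant that covers target_px on screen, else the original."""
    needed = target_px * device_pixel_ratio
    for name, max_dim in VARIANT_SIZES:
        if max_dim >= needed:
            return name
    return "original"

def variant_path(file_id: str, target_px: int) -> str:
    """Build the variant endpoint path for a given display size."""
    return f"/api/v2/files/{file_id}/variant/{pick_variant(target_px)}"
```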
Request
// variant_type options:
//   original:   original uploaded image
//   tiny_64:    64px max dimension
//   small_256:  256px max dimension
//   medium_750: 750px max dimension
//   large_1024: 1024px max dimension
Response
// Returns 302 Redirect to the variant URL
// 400 Bad Request if file media_type != "image"
/api/v2/files/{file_id}/trigger-variants
Manually re-queue variant generation. Useful for recovery if the original variant pipeline failed.
Response
{
  "success": true,
  "message": "Variant generation triggered",
  "task_id": "task_550e8400",
  "current_status": "processing",
  "image_id": "550e8400-e29b-41d4-a716-446655440000"
}
// If already processing:
// {"success": true, "message": "Variant generation already in progress",
//  "skipped_duplicate": true, ...}
/api/v2/files/{file_id}/similar
Find visually similar images using hybrid embedding and semantic similarity.
Request
// Query parameters:
?limit=20    // optional, 1-50, default: 20
Response
{
  "reference_image_id": "550e8400-e29b-41d4-a716-446655440000",
  "items": [
    {
      "image_id": "660f9500-e29b-41d4-a716-446655440000",
      "title": "Similar beam photo",
      "description": "Steel beam with surface corrosion...",
      "relevance_score": 0.92,
      "vector_similarity": 0.88,
      "thumbnail_url": "https://cdn.scopix.ai/thumbs/...",
      "medium_url": "https://cdn.scopix.ai/medium/...",
      "full_url": "https://cdn.scopix.ai/large/...",
      "folder_id": "770a0600-e29b-41d4-a716-446655440000",
      "created_at": "2026-01-10T08:00:00Z"
    }
  ],
  "total_count": 1
}
// 400 Bad Request if file media_type != "image"
/api/v2/files/{file_id}/extractions/{domain_name}/review
Review AI extraction results: confirm, reject, or edit extracted items for a domain. Corrections are layered on top of the AI outputs (originals are preserved), and multiple calls merge additively.
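Dot-path keys address list indices and object fields inside an extraction. For example, furniture_items.0.name is the name field of the first entry in furniture_items. To preview an edited extraction locally, a client can apply field_edits to a copy and leave the AI original untouched, in the spirit of the layered corrections described here. Both helpers (`apply_field_edits`, `review_body`) are hypothetical client-side code, not the server's merge logic.

```python
import copy
from typing import Any, Dict

def apply_field_edits(extraction: Dict[str, Any], field_edits: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of extraction with dot-path edits applied; the input is not modified."""
    result = copy.deepcopy(extraction)
    for path, value in field_edits.items():
        *parents, leaf = path.split(".")
        node: Any = result
        for key in parents:
            # Numeric segments index into lists; everything else is a dict key.
            node = node[int(key)] if isinstance(node, list) else node[key]
        if isinstance(node, list):
            node[int(leaf)] = value
        else:
            node[leaf] = value
    return result

def review_body(item_reviews: Dict[str, str], field_edits: Dict[str, Any]) -> dict:
    """Validate and build the review request body."""
    if not item_reviews and not field_edits:
        raise ValueError("at least one of item_reviews or field_edits is required")
    bad = {k: v for k, v in item_reviews.items() if v not in ("confirmed", "rejected")}
    if bad:
        raise ValueError(f"invalid review values: {bad}")
    body: dict = {}
    if item_reviews:
        body["item_reviews"] = item_reviews
    if field_edits:
        body["field_edits"] = field_edits
    return body
```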
Request
{
  "item_reviews": {
    "furniture_items.0": "confirmed",
    "furniture_items.1": "rejected",
    "materials.2": "confirmed"
  },
  "field_edits": {
    "furniture_items.0.name": "Barcelona Chair",
    "furniture_items.0.material": "leather"
  }
}
// At least one of item_reviews or field_edits is required.
//
// domain_name: one of:
//   architectural_design, ce_plan, layout_region, legend,
//   mining, real_estate, technical_diagram, pid, pfd,
//   text_regions, mls_compliance, schedule
//
// item_reviews: keys are dot-path identifiers (e.g. "items.0"),
//   values must be "confirmed" or "rejected"
// field_edits: keys are dot-path field identifiers (e.g. "items.0.name"),
//   values are the corrected data
Response
{
  "image_id": "550e8400-e29b-41d4-a716-446655440000",
  "domain_name": "architectural_design",
  "corrections": {
    "item_reviews": {"furniture_items.0": "confirmed", "furniture_items.1": "rejected"},
    "field_edits": {"furniture_items.0.name": "Barcelona Chair"}
  },
  "updated_at": "2026-04-13T10:30:00Z"
}
// 400 Bad Request if file media_type != "image" or the domain is invalid
// 404 Not Found if the file or extraction does not exist
Document Operations
Document-only sub-paths. Calling these on a non-document file returns 400.
/api/v2/files/{file_id}/text
Get the full extracted plain text from a document.
Response
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "filename": "safety_manual.pdf",
  "text": "SAFETY MANUAL\n\nChapter 1: Introduction\n\nThis manual provides...",
  "page_count": 45,
  "metadata": {"language": "en"}
}
/api/v2/files/{file_id}/chunks
Get all chunks (used for RAG and search) from a document, optionally including the embedding vectors.
Request
// Query parameters:
?include_embeddings=false    // optional, default: false
Response
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "chunks": [
    {
      "chunk_id": "chunk_001",
      "document_id": "550e8400-e29b-41d4-a716-446655440000",
      "document_filename": "safety_manual.pdf",
      "chunk_index": 0,
      "content": "Safety inspections must be conducted quarterly...",
      "page_numbers": [12, 13],
      "heading_hierarchy": ["Chapter 3", "Inspections"],
      "similarity_score": null,
      "metadata": {
        "token_count": 256,
        "chunk_type": "paragraph",
        "embedding_status": "completed"
      }
    }
  ],
  "total_chunks": 128,
  "status_counts": {"completed": 128, "pending": 0, "failed": 0}
}
// status_counts is only included when include_embeddings=true
// similarity_score is null for direct fetches (it is only populated in search results)
/api/v2/files/{file_id}/digitization
Get the full structural digitization (per-page elements with bounding boxes) for a document.
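Digitization bounding boxes are normalized to 0-1 relative to the page dimensions and given as x, y, w, h. To draw them over a rendered page, scale them by the page's pixel size. This sketch treats x, y as the top-left corner with y growing downward, which is the common image convention but an assumption the reference does not state. `to_pixels` is a hypothetical helper.

```python
from typing import Dict, Tuple

def to_pixels(box: Dict[str, float], page_width: int, page_height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized {x, y, w, h} box to integer pixel (left, top, width, height)."""
    return (round(box["x"] * page_width), round(box["y"] * page_height),
            round(box["w"] * page_width), round(box["h"] * page_height))
```

With the heading box from the sample response and a 1000 x 2000 px page render, this yields a 700 x 80 px rectangle starting at (150, 100).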
Response
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "total_pages": 3,
  "completed_pages": 3,
  "failed_pages": 0,
  "pages": [
    {
      "page_number": 1,
      "status": "completed",
      "element_count": 5,
      "elements": [
        {
          "type": "heading",
          "content": "Safety Manual",
          "bounding_box": {"x": 0.15, "y": 0.05, "w": 0.70, "h": 0.04},
          "metadata": {"level": 1}
        },
        {
          "type": "paragraph",
          "content": "This manual provides comprehensive safety guidelines...",
          "bounding_box": {"x": 0.10, "y": 0.12, "w": 0.80, "h": 0.15}
        },
        {
          "type": "table",
          "content": "| Category | Frequency |\n|---|---|\n| Fire | Quarterly |",
          "bounding_box": {"x": 0.10, "y": 0.30, "w": 0.80, "h": 0.20}
        }
      ],
      "error_message": null
    }
  ]
}
// status: pending | processing | completed | failed
// element types: heading, paragraph, table, key_value, list, figure
// bounding_box coordinates are normalized 0-1 relative to page dimensions
/api/v2/files/{file_id}/digitization/pages/{page_number}
Get digitization elements for a single page (1-indexed).
Response
{
  "page_number": 2,
  "status": "completed",
  "element_count": 3,
  "elements": [
    {
      "type": "heading",
      "content": "Chapter 2: Fire Safety",
      "bounding_box": {"x": 0.10, "y": 0.05, "w": 0.60, "h": 0.04},
      "metadata": {"level": 2}
    }
  ],
  "error_message": null
}
// 404 Not Found if no digitization exists for the requested page
/api/v2/files/{file_id}/digitization/status
Lightweight status check for digitization progress (no element data).
Response
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "total_pages": 5,
  "page_statuses": {
    "1": "completed",
    "2": "completed",
    "3": "processing",
    "4": "pending",
    "5": "pending"
  }
}
/api/v2/files/{file_id}/processing-status
Cross-media processing status (works for images, documents, and videos), including per-component subprocess statuses.
Response
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "filename": "safety_manual.pdf",
  "document_type": "pdf",
  "text_extraction_status": "completed",
  "page_count": 45,
  "chunk_count": 128,
  "created_at": "2026-01-15T10:30:00Z",
  "processing_started_at": "2026-01-15T10:30:05Z",
  "completed_at": "2026-01-15T10:32:00Z",
  "error_message": null
}
Semantic Search & Analyze
AI-Powered Search
Search uses semantic similarity to find content by meaning, not just keywords — "damaged equipment" matches "broken machinery" even without exact words.
/api/v2/files/search
Semantic search over document chunks. Scope it to specific documents via document_ids, or omit document_ids to search all documents.
Request
{
  "query": "safety inspection requirements",
  "limit": 20,
  "similarity_threshold": 0.3,
  "document_ids": ["550e8400-e29b-41d4-a716-446655440000"]
}
// document_ids is optional; omit it to search all documents
Response
{
  "query": "safety inspection requirements",
  "results": [
    {
      "chunk_id": "chunk_001",
      "document_id": "550e8400-e29b-41d4-a716-446655440000",
      "document_filename": "safety_manual.pdf",
      "chunk_index": 5,
      "content": "Safety inspections must be conducted quarterly...",
      "page_numbers": [12, 13],
      "heading_hierarchy": ["Chapter 3", "Inspections", "Schedule"],
      "similarity_score": 0.87,
      "metadata": {"chunk_type": "paragraph"}
    }
  ],
  "total_count": 15,
  "search_time_ms": 45
}
/api/v2/files/analyze
Upload a document and receive the extracted text in a single call. The server waits up to timeout seconds (default 60); if processing takes longer, the response has status: processing and a job_id for polling. Maximum file size is 10 MB; for larger files use POST /files/upload (or /files/uploads for files over 100 MB).
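The /files/analyze response is a discriminated union on status, so branch on that field before reading anything else. Here is a minimal sketch. `handle_analyze`, its (action, payload) return convention, and `check_timeout` are hypothetical client helpers.

```python
from typing import Optional, Tuple

def handle_analyze(body: dict) -> Tuple[str, Optional[str]]:
    """Return (action, payload) for a /files/analyze response body.

    completed        -> ("done", extracted_text)
    processing       -> ("poll", poll_url)   # timeout exceeded; poll the job
    failed / skipped -> (status, None)
    """
    status = body["status"]
    if status == "completed":
        return "done", body.get("extracted_text")
    if status == "processing":
        return "poll", body["poll_url"]
    if status in ("failed", "skipped"):
        return status, None
    raise ValueError(f"unexpected status: {status!r}")

def check_timeout(seconds: int) -> int:
    """The timeout form field accepts 5-120 seconds."""
    if not 5 <= seconds <= 120:
        raise ValueError("timeout must be between 5 and 120")
    return seconds
```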
Request
curl -X POST https://api.scopix.ai/api/v2/files/analyze \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@report.pdf" \
  -F "timeout=60"
// Form fields:
//   file: required (PDF, DOCX, TXT, MD)
//   timeout: optional, 5-120 seconds (default: 60)
//   skip_duplicates: optional (default: false)
//   folder_id: optional
//   project_id: optional
Response
// Discriminated union — check status first.
// status: "completed"
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "filename": "report.pdf",
  "size_bytes": 2048576,
  "processing_time_ms": 4500.0,
  "document_type": "pdf",
  "text_extraction_status": "completed",
  "page_count": 15,
  "chunk_count": 42,
  "extracted_text": "SAFETY MANUAL\n\nChapter 1: Introduction..."
}
// status: "processing" (timeout exceeded; poll GET /job/{job_id})
{
  "document_id": "...",
  "status": "processing",
  "job_id": "...",
  "poll_url": "/api/v2/job/...",
  "document_type": "pdf",
  "text_extraction_status": "pending"
}
// status: "failed" or "skipped" (content-hash duplicate)
/api/v2/files/analyze/async
Same input as POST /files/analyze, but always returns 202 immediately with a job_id. Use it for fire-and-forget or concurrent document processing.
Request
curl -X POST https://api.scopix.ai/api/v2/files/analyze/async \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@report.pdf"
Response
// 202 Accepted
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "poll_url": "/api/v2/job/550e8400-e29b-41d4-a716-446655440000"
}
Export, Quota & Deduplication
/api/v2/files/export/columns
Get the available export columns, grouped by category, for building export requests.
Response
{
  "groups": {
    "basic": [
      {"field_key": "id", "display_name": "ID", "group": "basic"},
      {"field_key": "filename", "display_name": "Filename", "group": "basic"},
      {"field_key": "title", "display_name": "Title", "group": "basic"},
      {"field_key": "size_bytes", "display_name": "Size (bytes)", "group": "basic"},
      {"field_key": "created_at", "display_name": "Created At", "group": "basic"}
    ],
    "descriptions": [
      {"field_key": "upload_description", "display_name": "AI Description", "group": "descriptions"},
      {"field_key": "user_description", "display_name": "User Description", "group": "descriptions"},
      {"field_key": "tags", "display_name": "Tags", "group": "descriptions"}
    ]
  }
}
/api/v2/files/export
Export file metadata as CSV, XLSX, DOCX, or Google Sheets.
Request
{
  "format": "csv",
  "columns": [
    {"field_key": "filename"},
    {"field_key": "title"},
    {"field_key": "upload_description", "display_name": "AI Description"},
    {"field_key": "tags"},
    {"field_key": "created_at"}
  ],
  "folder_id": "550e8400-e29b-41d4-a716-446655440000",
  "include_subfolders": true,
  "flatten_tags": true,
  "sheet_name": "Files"
}
// format: required; "csv", "xlsx", "docx", or "google_sheets"
// columns: required, at least 1 column
//   field_key: required; taken from the /export/columns registry
// file_ids: optional UUIDs to scope the export
// folder_id: optional folder scope
// include_subfolders: optional, default: false
// flatten_tags: optional, default: true
// google_sheets_title: optional (google_sheets format only)
// connection_id: UUID of a Google Drive connection; required when format is google_sheets
Response
{
  "download_url": "https://storage.example.com/exports/files_2026-04-13.csv",
  "spreadsheet_url": null,
  "record_count": 42,
  "format": "csv"
}
/api/v2/files/quota-check
Check upload quota before starting, so uploads don't fail partway through because the quota ran out.
Request
// Query parameters:
?file_count=10    // required
Response
{
  "can_proceed": true,
  "requested": 10,
  "available": 990,
  "monthly_limit": 1000,
  "current_usage": 10,
  "prepaid_credits": 0,
  "max_batch_size": 50,
  "max_concurrent_uploads": 10,
  "message": null
}
// monthly_limit: -1 for unlimited tiers
// When the quota is exceeded, can_proceed is false and message describes the shortfall
/api/v2/files/check-duplicates
Check which file hashes already exist for this tenant before uploading.
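A pre-upload dedup pass hashes the local files, asks the server which hashes it already has (at most 250 per request), and uploads only the rest. In this sketch the network call is injected as a `check` callable and files are passed as in-memory bytes for brevity. It assumes the content hash is the same hex SHA-256 used for claimed_file_hash. `find_new_files` is a hypothetical helper.

```python
import hashlib
from typing import Callable, Dict, List

MAX_HASHES_PER_CALL = 250  # documented limit for check-duplicates

def find_new_files(contents: Dict[str, bytes],
                   check: Callable[[List[str]], dict]) -> List[str]:
    """Return names whose SHA-256 the server does not already have."""
    by_hash: Dict[str, List[str]] = {}
    for name, data in contents.items():
        by_hash.setdefault(hashlib.sha256(data).hexdigest(), []).append(name)
    hashes = list(by_hash)
    duplicates = set()
    for i in range(0, len(hashes), MAX_HASHES_PER_CALL):
        response = check(hashes[i:i + MAX_HASHES_PER_CALL])
        duplicates.update(response["duplicates"])
    # Keep one local copy per unseen hash; identical local files are uploaded once.
    return [names[0] for digest, names in by_hash.items() if digest not in duplicates]
```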
Request
{
  "hashes": [
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592"
  ]
}
// hashes: SHA-256 content hashes (1-250 items)
Response
{
  "duplicates": [
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
  ],
  "unique": [
    "d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592"
  ]
}
Upload Sessions & Status
Batch uploads create a session — a per-batch tracking record. Use these endpoints to poll progress, retrieve per-file results, cancel pending work, and look up the unified processing status of any individual file by ID.
/api/v2/files/uploads-status/{image_id}
Get the unified upload and processing status for any file ID.
Response
{
  "image_id": "550e8400-e29b-41d4-a716-446655440000",
  "batch_id": "batch_550e8400",
  "unified_status": "processing",     // uploading | confirming | queued | processing |
                                      // completed | failed | partially_completed
  "component_statuses": {
    "variant_status": "completed",
    "description_status": "processing",
    "upload_status": "completed",
    "processing_status": "processing"
  },
  "processing_ids": ["task_001", "task_002"],
  "error_message": null,
  "last_error_at": null,
  "created_at": "2026-01-15T10:30:00Z",
  "last_updated_at": "2026-01-15T10:30:45Z",
  "completed_at": null,
  "retry_count": 0,
  "processing_duration_seconds": 45.2,
  "is_stuck": false,
  "is_terminal": false
}
/api/v2/files/sessions
List upload sessions for the authenticated tenant.
Request
// Query parameters:
?status=processing          // optional, filter by status
&upload_method=streaming    // optional, "streaming" or "presigned"
&offset=0                   // pagination (default: 0)
&limit=20                   // default: 20, range 1-100
Response
{
  "items": [
    {
      "session_id": "550e8400-e29b-41d4-a716-446655440000",
      "status": "completed",
      "upload_method": "streaming",
      "total_files": 20,
      "completed_files": 18,
      "failed_files": 1,
      "skipped_files": 1,
      "progress_percentage": 100.0,
      "created_at": "2026-01-15T10:30:00Z",
      "completed_at": "2026-01-15T10:32:00Z"
    }
  ],
  "total_count": 15,
  "limit": 20,
  "offset": 0,
  "has_more": false
}
/api/v2/files/sessions/{session_id}/status
Get current progress and recent activity for an upload session.
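Polling a batch session means re-reading its status until it reaches a terminal state. This sketch treats completed, failed, expired, and cancelled as terminal (pending, uploading, and processing are still in flight). Fetch and sleep are injected so the loop works with any HTTP client. `wait_for_session`, the 2-second interval, and the 10-minute timeout are hypothetical choices. Subscribing to the batch.{session_id} WebSocket channel is the push-based alternative.

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed", "expired", "cancelled"}

def wait_for_session(session_id: str, fetch: Callable[[str], dict],
                     interval: float = 2.0, timeout: float = 600.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll /api/v2/files/sessions/{session_id}/status until the session is terminal."""
    url = f"/api/v2/files/sessions/{session_id}/status"
    waited = 0.0
    while True:
        status = fetch(url)
        if status["status"] in TERMINAL:
            return status
        if waited >= timeout:
            raise TimeoutError(f"session {session_id} still {status['status']}")
        sleep(interval)
        waited += interval
```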
Response
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",             // pending | uploading | processing |
                                      // completed | failed | expired | cancelled
  "total_files": 20,
  "completed_files": 15,
  "failed_files": 1,
  "skipped_files": 2,
  "pending_files": 2,
  "progress_percentage": 90.0,
  "created_at": "2026-01-15T10:30:00Z",
  "started_at": "2026-01-15T10:30:00Z",
  "completed_at": null,
  "estimated_completion_time": null,
  "recent_completions": [
    {
      "filename": "photo1.jpg",
      "image_id": "660e8400-e29b-41d4-a716-446655440001",
      "status": "completed",
      "description": "A site inspection showing...",
      "processing_time_ms": null
    }
  ],
  "recent_errors": [],
  "results_url": "/api/v2/files/sessions/{session_id}/results",
  "websocket_channel": "batch.{session_id}"
}
/api/v2/files/sessions/{session_id}/results
Paginated per-file results from a session.
Request
// Query parameters:
?include_failed=true   // include failed files (default: true)
&offset=0              // default: 0
&limit=100             // 1-500, default: 100
Response
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "results": [
    {
      "image_id": "660e8400-e29b-41d4-a716-446655440001",
      "filename": "photo1.jpg",
      "status": "completed",
      "description": "Safety inspection showing...",
      "visible_text": "EXIT sign visible...",
      "tags": ["safety", "construction"],
      "processing_time_ms": null,
      "error_message": null,
      "thumbnail_url": "https://...",
      "created_at": "2026-01-15T10:30:05Z"
    }
  ],
  "total_count": 20,
  "offset": 0,
  "limit": 100,
  "has_more": false,
  "summary": {"total_files": 20, "completed": 17, "failed": 1, "skipped": 2}
}
/api/v2/files/sessions/{session_id}/cancel
Cancel a pending or in-progress session. Already-processed files keep their results.
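Because cancelling a session that has already finished returns 400 Bad Request, a client can check the status first. This sketch treats `pending`, `uploading`, and `processing` as the cancellable states; treating `failed` as non-cancellable is an assumption, since the documented 400 lists only completed, cancelled, and expired. `get_status` and `post_cancel` are hypothetical wrappers around the status and cancel endpoints. The session can still finish between the two calls, so the caller should keep handling a 400.

```python
from typing import Callable, Optional

# Non-terminal statuses from the session status enum (assumed cancellable).
ACTIVE_SESSION_STATUSES = {"pending", "uploading", "processing"}


def cancel_if_active(get_status: Callable[[], dict],
                     post_cancel: Callable[[], dict]) -> Optional[dict]:
    """Cancel only while the session is active; return the cancel response or None."""
    status = get_status()
    if status["status"] not in ACTIVE_SESSION_STATUSES:
        return None  # already completed, failed, expired, or cancelled
    return post_cancel()
```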
Response
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "cancelled",
  "total_files": 20,
  "completed_files": 10,
  "failed_files": 0,
  "skipped_files": 0,
  "pending_files": 10,
  "progress_percentage": 50.0,
  "created_at": "2026-01-15T10:30:00Z",
  "started_at": "2026-01-15T10:30:00Z",
  "completed_at": "2026-01-15T10:31:00Z",
  "estimated_completion_time": null,
  "recent_completions": [],
  "recent_errors": [],
  "results_url": "/api/v2/files/sessions/{session_id}/results",
  "websocket_channel": "batch.{session_id}"
}
// 400 Bad Request if session is already completed/cancelled/expired
/api/v2/files/sessions/{session_id}/summary
Aggregated per-file status counts for every file in a session (uploading / processing / completed / failed / stuck) plus description status counts. Optimised for dashboards.
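A sketch of turning the `counts` object into a one-line dashboard label. Grouping the buckets into "in flight" and "finished" is this sketch's own choice, as is the assumption that `stuck` overlaps the other buckets rather than adding to `total` (in the sample response, the other buckets already sum to 20).

```python
# Buckets of the summary "counts" object treated here as still in flight.
IN_FLIGHT = ("uploading", "confirming", "queued", "processing")
# Buckets treated here as finished, whatever the outcome.
FINISHED = ("completed", "failed", "partially_completed")


def dashboard_line(summary: dict) -> str:
    """Render a short status label from a session summary response."""
    c = summary["counts"]
    in_flight = sum(c[k] for k in IN_FLIGHT)
    finished = sum(c[k] for k in FINISHED)
    line = (f"{summary['overall_status']}: {finished}/{c['total']} finished, "
            f"{in_flight} in flight, {c['failed']} failed")
    # Assumption: "stuck" overlaps other buckets, so it is reported, not summed.
    if c["stuck"]:
        line += f", {c['stuck']} stuck"
    return line
```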
Response
{
  "batch_id": "550e8400-e29b-41d4-a716-446655440000",
  "overall_status": "processing",
  "completion_percentage": 65.0,
  "counts": {
    "total": 20,
    "uploading": 2,
    "confirming": 0,
    "queued": 1,
    "processing": 4,
    "completed": 13,
    "failed": 0,
    "partially_completed": 0,
    "stuck": 0
  },
  "description_counts": {
    "pending": 0,
    "processing": 4,
    "completed": 13,
    "failed": 0,
    "skipped": 0
  },
  "error_summary": {
    "count": 0,
    "messages": []
  },
  "created_at": "2026-01-15T10:30:00Z",
  "last_activity_at": "2026-01-15T10:33:15Z"
}
/api/v2/files/sessions/stuck
List uploads that have not made progress in the threshold window. Useful for client-side recovery flows. Each entry is a full UnifiedImageStatusResponse (same shape as /files/uploads-status/{image_id}).
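A sketch of a client-side recovery pass over this endpoint's response: it keeps entries that are stuck but not terminal and orders them by how long they have been idle since `last_updated_at`. What to do with them afterwards (re-upload, alert, give up) is left to the client; no retry endpoint is implied.

```python
from datetime import datetime, timezone
from typing import List, Optional


def _parse_ts(ts: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def minutes_idle(image: dict, now: datetime) -> float:
    """Minutes since the image's status last changed."""
    return (now - _parse_ts(image["last_updated_at"])).total_seconds() / 60


def recovery_candidates(stuck_response: dict, now: Optional[datetime] = None) -> List[str]:
    """IDs of stuck, non-terminal uploads, longest-idle first."""
    if now is None:
        now = datetime.now(timezone.utc)
    images = [i for i in stuck_response["images"] if i["is_stuck"] and not i["is_terminal"]]
    images.sort(key=lambda i: minutes_idle(i, now), reverse=True)
    return [i["image_id"] for i in images]
```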
Request
// Query parameters:
?stuck_minutes=30   // default: 30, threshold for "stuck" (min 1)
&limit=100          // default: 100, 1-500
Response
{
  "stuck_count": 1,
  "images": [
    {
      "image_id": "550e8400-e29b-41d4-a716-446655440000",
      "batch_id": "770a0600-e29b-41d4-a716-446655440000",
      "unified_status": "uploading",
      "component_statuses": {
        "variant_status": null,
        "description_status": null,
        "upload_status": "streaming",
        "processing_status": null
      },
      "processing_ids": [],
      "error_message": null,
      "last_error_at": null,
      "created_at": "2026-01-15T10:00:00Z",
      "last_updated_at": "2026-01-15T10:00:15Z",
      "completed_at": null,
      "retry_count": 0,
      "processing_duration_seconds": 1845.0,
      "is_stuck": true,
      "is_terminal": false
    }
  ]
}
Telemetry
/api/v2/files/log-upload-event
Fire-and-forget client-side upload telemetry (e.g., browser-side errors, retries). Unauthenticated. Server logs the event for diagnostics; never blocks the upload.
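A sketch of a client that trims each field to the documented limits before sending and never lets telemetry block the caller. It assumes the event is sent as a JSON `POST` (the method is not shown in this section) and reads "10 KB" conservatively as 10,000 bytes of `json.dumps` output; the server may measure serialized size differently. Dropping oversized `data` or an out-of-range `file_index` instead of failing is this sketch's choice. No auth header is sent because the endpoint is documented as unauthenticated.

```python
import json
import threading
import urllib.request
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Assumption: "serialized size <= 10 KB" read conservatively as 10,000 bytes.
MAX_DATA_BYTES = 10_000


def build_upload_event(event_type: str, message: str,
                       data: Optional[Dict[str, Any]] = None,
                       batch_id: Optional[str] = None,
                       file_index: Optional[int] = None,
                       file_name: Optional[str] = None) -> Dict[str, Any]:
    """Build a log-upload-event body, trimming fields to the documented limits."""
    event: Dict[str, Any] = {
        "event_type": event_type[:100],
        "message": message[:2000],
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if data is not None and len(json.dumps(data).encode("utf-8")) <= MAX_DATA_BYTES:
        event["data"] = data  # oversized data is dropped, not sent
    if batch_id is not None:
        event["batch_id"] = batch_id[:100]
    if file_index is not None and 0 <= file_index <= 10_000:
        event["file_index"] = file_index
    if file_name is not None:
        event["file_name"] = file_name[:500]
    return event


def send_upload_event(base_url: str, event: Dict[str, Any]) -> None:
    """Fire-and-forget POST on a daemon thread; all errors are swallowed."""
    def _post() -> None:
        req = urllib.request.Request(
            base_url.rstrip("/") + "/api/v2/files/log-upload-event",
            data=json.dumps(event).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",  # assumption: method not shown in the reference
        )
        try:
            urllib.request.urlopen(req, timeout=5).close()
        except Exception:
            pass  # telemetry must never interfere with the upload itself

    threading.Thread(target=_post, daemon=True).start()
```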
Request
{
  "event_type": "upload_retry",                              // required, max 100 chars
  "message": "Chunk 5 failed with NetworkError, retrying",   // required, max 2000 chars
  "data": {                                                  // optional, serialized size <= 10 KB
    "upload_id": "550e8400-e29b-41d4-a716-446655440000",
    "part_number": 5,
    "user_agent": "Mozilla/5.0 ..."
  },
  "timestamp": "2026-04-15T10:35:00Z",   // optional client-side timestamp
  "batch_id": "batch_550e8400",          // optional, max 100 chars
  "file_index": 5,                       // optional, 0-10000
  "file_name": "huge.pdf"                // optional, max 500 chars
}
Response
{
  "status": "logged",
  "timestamp": "2026-04-15T10:35:00.123456+00:00"
}
Limits & Constraints
- Streaming upload max: 100 MB per file
- Single-shot presigned max: 5 GB (S3 PUT cap)
- Multipart max: 5 TB (per S3 limits)
- Multipart part size: 5 MB minimum per part (the final part may be smaller); S3 imposes a 5 GB per-part hard limit
- Synchronous document analyze: 10 MB max (`/files/analyze`); use `/files/upload` for larger files
- Streaming batch size: per-request maximum of 10–200 files, depending on tier; each file capped at 100 MB
- Batch delete: max 100 files per request
- Search query length: 1–1000 characters
- Tags per file: max 40 tags, each max 50 characters
- Title length: max 255 characters
- User description: max 10000 characters
- Hash dedup batch: 1–250 hashes per call
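A client-side pre-flight sketch for some of these limits: picking an upload path by file size, validating the comma-separated `tags` field of the upload form, and splitting batch deletes (100) or hash lookups (250) into allowed chunk sizes. The path labels `streaming`, `presigned`, and `multipart` are illustrative, not API values, and binary units (1 MB = 1024² bytes) are an assumption because the table does not say which it means.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Assumption: binary units; the limits table does not specify MB vs MiB.
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def upload_path_for(size_bytes: int) -> str:
    """Pick an upload path the size limits allow (labels are illustrative)."""
    if size_bytes <= 100 * MB:
        return "streaming"   # single multipart/form-data request
    if size_bytes <= 5 * GB:
        return "presigned"   # single-shot presigned PUT
    if size_bytes <= 5 * TB:
        return "multipart"   # presigned multipart upload
    raise ValueError("file exceeds the 5 TB multipart limit")


def parse_tags(raw: str) -> List[str]:
    """Split the comma-separated tags field and enforce 40 tags x 50 chars."""
    tags = [t.strip() for t in raw.split(",") if t.strip()]
    if len(tags) > 40:
        raise ValueError(f"{len(tags)} tags given; max is 40")
    too_long = [t for t in tags if len(t) > 50]
    if too_long:
        raise ValueError(f"tags longer than 50 characters: {too_long}")
    return tags


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield slices of at most `size` items (e.g. 100 for batch delete, 250 for hash dedup)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```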

