DocuExtract

API reference

Every dashboard feature is available over a documented REST API. Generate keys in Settings, send Bearer tokens, get structured JSON back.

Base URL

https://docuextract-engine.fly.dev

Authentication

Generate keys at /dashboard/settings. Each key is shown once at issue time; copy it somewhere safe.

curl https://docuextract-engine.fly.dev/v1/usage \
  -H "Authorization: Bearer dx_live_<your-token>"

Keys are scoped per-user. Revoke any time from the dashboard. Revoked keys return 401 on subsequent requests.
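
The same request can be built from Python. This is a minimal sketch using only the standard library; `build_request` is a hypothetical helper, not part of any DocuExtract SDK, and the token is the placeholder from the curl example above.

```python
import urllib.request

BASE_URL = "https://docuextract-engine.fly.dev"


def build_request(path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the Bearer token."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Placeholder token; substitute a real key from /dashboard/settings.
req = build_request("/v1/usage", "dx_live_<your-token>")
# To send it: urllib.request.urlopen(req), then decode the JSON body.
```

Building the request separately from sending it keeps the header logic easy to check without touching the network.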

Quotas

Every plan has a monthly extraction limit. The quota check fires before any OCR/LLM work, so over-quota requests fail fast with a 402 response.

Plan         Credits / month
Free         500
Pro          5,000
Business     25,000
Enterprise   200,000+

Each extraction debits credits based on the scan mode that ran (1–80 credits/page). See /pricing for the full credit-per-mode breakdown.

Check the current balance with GET /v1/credits/balance and recent ledger events with GET /v1/credits/history.
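
Because cost per page ranges from 1 to 80 credits, the number of pages a plan covers varies widely. The arithmetic below is a rough sketch using only the figures quoted above; `page_range` is an illustrative helper, and Enterprise is left out because its limit is open-ended.

```python
PLAN_CREDITS = {"Free": 500, "Pro": 5_000, "Business": 25_000}
MIN_CREDITS_PER_PAGE = 1   # cheapest scan mode, per the range quoted above
MAX_CREDITS_PER_PAGE = 80  # most expensive scan mode


def page_range(plan: str) -> tuple[int, int]:
    """Return (worst-case pages, best-case pages) per month for a plan."""
    credits = PLAN_CREDITS[plan]
    return credits // MAX_CREDITS_PER_PAGE, credits // MIN_CREDITS_PER_PAGE


# page_range("Free")     -> (6, 500)
# page_range("Pro")      -> (62, 5000)
# page_range("Business") -> (312, 25000)
```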

Errors

Code   Meaning
400    Malformed request; see detail
401    Missing, invalid, or revoked API key
402    Monthly quota exceeded; upgrade plan
404    Resource not found, or cross-user access attempt
409    Conflicting state (e.g. resolving an already-resolved review item)
413    File too large (>50 MB)
415    Unsupported document type
5xx    Transient; retry with exponential backoff
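
One way to apply the retry advice for 5xx responses is sketched below. The attempt count and delays are assumptions, not values published by DocuExtract. `send` is any caller-supplied function that performs one request and returns a `(status, body)` pair.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, bytes]  # (HTTP status code, raw body)


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,      # assumed; must be at least 1
    base_delay: float = 0.5,    # assumed; seconds before the first retry
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry `send` on 5xx, doubling the delay each time; return anything else."""
    for attempt in range(max_attempts):
        status, body = send()
        if status < 500:
            return status, body  # success, or a 4xx that retrying will not fix
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    return status, body  # still 5xx after the final attempt
```

4xx codes such as 401 or 402 are returned immediately, since repeating the same request cannot change the outcome. Adding random jitter to the delay is a common refinement when many clients retry at once.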

Templates

Create

curl -X POST https://docuextract-engine.fly.dev/v1/templates \
  -H "Authorization: Bearer dx_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "US Invoice",
    "fields": [
      {"name": "invoice_number", "type": "text", "anchor": "Invoice No."},
      {"name": "invoice_date", "type": "date", "anchor": "Date"},
      {"name": "vendor", "type": "text", "anchor": "Vendor"},
      {"name": "total", "type": "currency", "anchor": "Total"}
    ],
    "confidence_threshold": 0.75
  }'

List / get / update / delete

GET    /v1/templates              # list user's active templates
GET    /v1/templates/{id}         # single template
PUT    /v1/templates/{id}         # update (bumps version)
DELETE /v1/templates/{id}         # soft-delete (audit-preserving)

Field types

text, number, date, currency, enum, table.

Field options

Key           Required   Meaning
name          yes        Output key in extraction results
type          yes        One of the field types above
anchor        no         Label preceding the value (e.g. "Invoice No.")
description   no         Hint passed to the LLM extractor
bbox          no         [x, y, w, h] from the visual picker
page          no         Page number (1-indexed)
enum_values   no         For type: enum; array of allowed values
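
A field definition can be checked client-side before it is sent. The sketch below encodes only the rules listed in the table above; `validate_field` is a hypothetical helper, and the server may apply further checks that are not documented here.

```python
FIELD_TYPES = {"text", "number", "date", "currency", "enum", "table"}


def validate_field(field: dict) -> list[str]:
    """Return a list of problems found in one template field (empty if none)."""
    problems = []
    for key in ("name", "type"):  # the two required keys
        if not field.get(key):
            problems.append(f"missing required key: {key}")
    if field.get("type") and field["type"] not in FIELD_TYPES:
        problems.append(f"unknown type: {field['type']}")
    if "bbox" in field and len(field["bbox"]) != 4:
        problems.append("bbox must be [x, y, w, h]")
    if "page" in field and field["page"] < 1:
        problems.append("page is 1-indexed")
    return problems


# A field from the create example above passes with no problems:
ok = validate_field({"name": "total", "type": "currency", "anchor": "Total"})
```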

Documents

Upload

curl -X POST https://docuextract-engine.fly.dev/v1/documents \
  -H "Authorization: Bearer dx_live_..." \
  -F "file=@invoice.pdf"

Returns {id, filename, content_type, size_bytes, page_count, created_at}. Max size: 50 MB. Supported: PDF, PNG, JPEG, TIFF, WebP.
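
A local pre-check can catch oversized or unsupported files before any bytes are uploaded. This is a sketch under two assumptions: "50 MB" is read conservatively as 50,000,000 bytes, and the file type is inferred from the filename suffix. The server may decide differently, for example by inspecting file contents.

```python
from pathlib import Path
from typing import Optional

MAX_BYTES = 50_000_000  # assumed reading of "50 MB"
# Assumed suffixes for PDF, PNG, JPEG, TIFF, WebP.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".webp"}


def precheck_upload(filename: str, size_bytes: int) -> Optional[int]:
    """Return the HTTP status the upload would likely hit, or None if it looks fine."""
    if size_bytes > MAX_BYTES:
        return 413  # file too large
    if Path(filename).suffix.lower() not in SUPPORTED_SUFFIXES:
        return 415  # unsupported document type
    return None
```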

Page preview

GET /v1/documents/{id}/preview?page=1

Returns the page as a PNG. Used by the visual template picker.

Extract — sync (single document)

curl -X POST https://docuextract-engine.fly.dev/v1/extract \
  -H "Authorization: Bearer dx_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "...",
    "template_id": "...",
    "use_llm": true
  }'

Setting use_llm to false skips Tier 3 OCR and LLM extraction, leaving only regex and anchor matching. This is cheaper and faster, and corresponds to the Standard tier ($0.10/doc).

Response:

{
  "document_id": "...",
  "template_id": "...",
  "overall_confidence": 0.94,
  "method_counts": {"regex": 2, "anchor": 1, "llm": 1, "none": 0},
  "needs_review_count": 0,
  "ocr_tier_used": 0,
  "detected_language": "en",
  "detected_script": "Latin",
  "fields": [
    {
      "name": "invoice_number",
      "value_raw": "INV-2026-001",
      "value_coerced": "INV-2026-001",
      "confidence": 0.85,
      "method": "anchor",
      "source_text": "INV-2026-001",
      "source_page": 1,
      "source_bbox": [120, 80, 80, 12],
      "needs_review": false
    }
  ]
}

Every value has a source_text and a source_bbox: DocuExtract never returns values that can't be traced to the source document.
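
Once parsed, the response is straightforward to consume. The sketch below uses a trimmed copy of the sample response above; the two helper functions are illustrative names, not part of the API.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE = """
{
  "overall_confidence": 0.94,
  "needs_review_count": 0,
  "fields": [
    {
      "name": "invoice_number",
      "value_raw": "INV-2026-001",
      "value_coerced": "INV-2026-001",
      "confidence": 0.85,
      "method": "anchor",
      "source_page": 1,
      "source_bbox": [120, 80, 80, 12],
      "needs_review": false
    }
  ]
}
"""


def coerced_values(result: dict) -> dict:
    """Map each field name to its coerced value."""
    return {f["name"]: f["value_coerced"] for f in result["fields"]}


def fields_to_review(result: dict) -> list[str]:
    """Names of fields flagged with needs_review."""
    return [f["name"] for f in result["fields"] if f["needs_review"]]


result = json.loads(SAMPLE)
values = coerced_values(result)     # {"invoice_number": "INV-2026-001"}
pending = fields_to_review(result)  # []
```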

Batches — async (multi-document)

curl -X POST https://docuextract-engine.fly.dev/v1/batches \
  -H "Authorization: Bearer dx_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "template_id": "...",
    "document_ids": ["doc-1", "doc-2", "doc-3"],
    "name": "May invoices",
    "use_llm": true
  }'

Returns immediately with a batch id. Workers process documents in parallel.

Poll status

GET /v1/batches/{id}

Returns batch state + per-extraction status. Poll every 3–5 seconds while status is running.
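
A polling loop along these lines is one option. It assumes the batch JSON exposes a top-level `status` key whose value is `running` while work is in progress. The exact key and the terminal status names are not specified above, so the sketch treats any other value as finished. `fetch` is a caller-supplied function that performs GET /v1/batches/{id} and returns the parsed JSON.

```python
import time
from typing import Callable


def wait_for_batch(
    fetch: Callable[[], dict],
    interval: float = 4.0,   # within the suggested 3-5 second range
    max_polls: int = 300,    # assumed cap so the loop cannot run forever
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the batch is no longer running, then return its last state."""
    for _ in range(max_polls):
        batch = fetch()
        if batch.get("status") != "running":  # assumed key name
            return batch
        sleep(interval)
    raise TimeoutError(f"batch still running after {max_polls} polls")
```

Passing `fetch` and `sleep` in as arguments keeps the loop independent of any HTTP library and easy to exercise without a live batch.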

Export results

GET /v1/batches/{id}/export?format=csv    # wide CSV, one row per document
GET /v1/batches/{id}/export?format=json   # nested JSON, full per-field detail

Review queue

Fields below the template's confidence threshold land here.

curl https://docuextract-engine.fly.dev/v1/review \
  -H "Authorization: Bearer dx_live_..."

# Approve as-is
curl -X POST https://docuextract-engine.fly.dev/v1/review/{field_id} \
  -H "Authorization: Bearer dx_live_..." \
  -H "Content-Type: application/json" \
  -d '{"action": "approve"}'

# Correct
curl -X POST https://docuextract-engine.fly.dev/v1/review/{field_id} \
  -H "Authorization: Bearer dx_live_..." \
  -H "Content-Type: application/json" \
  -d '{"action": "correct", "corrected_value": "INV-2026-001", "notes": "OCR misread"}'
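
The two request bodies above can be produced by one small helper. This is a sketch: `review_payload` is a hypothetical name, and the rule that a correction must carry `corrected_value` is an assumption drawn from the example, not a documented constraint.

```python
from typing import Optional


def review_payload(
    action: str,
    corrected_value: Optional[str] = None,
    notes: Optional[str] = None,
) -> dict:
    """Build the JSON body for POST /v1/review/{field_id}."""
    if action not in ("approve", "correct"):  # the two actions shown above
        raise ValueError(f"unsupported action: {action}")
    payload = {"action": action}
    if action == "correct":
        if corrected_value is None:
            raise ValueError("a correction needs corrected_value")  # assumed rule
        payload["corrected_value"] = corrected_value
        if notes is not None:
            payload["notes"] = notes
    return payload
```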

API keys

Manage from the dashboard or programmatically:

GET    /v1/keys                  # list (no secret material)
POST   /v1/keys  {"name": "..."} # mint — raw token returned ONCE
DELETE /v1/keys/{id}             # revoke

Usage

curl https://docuextract-engine.fly.dev/v1/usage \
  -H "Authorization: Bearer dx_live_..."

Response:

{
  "plan_name": "solo",
  "docs_per_month": 500,
  "used_this_month": 73,
  "remaining": 427
}

OpenAPI schema

The engine ships an auto-generated OpenAPI 3 schema:

https://docuextract-engine.fly.dev/openapi.json
https://docuextract-engine.fly.dev/docs           # Swagger UI

Both are public — no API key required to inspect the schema.