DocuExtract

Batches

Run many documents through one template in parallel. Track progress live, then download the results as CSV or JSON.

Why batches

Single-document extraction (previous page) is synchronous: you wait for the result. That works for one document but doesn't scale to hundreds. Batches let you fire and poll: you upload N documents, the engine queues them, workers extract in parallel, and you check back when the results are ready.

From the dashboard

Go to /dashboard/batches/new:

  1. Pick a template and, optionally, name the batch (e.g. "May invoices")
  2. Multi-select files (Ctrl/Cmd-click or Shift-click)
  3. Toggle Use LLM on (Premium tier) or off (Standard tier)
  4. Click Start batch. Files upload sequentially, then the batch is created and you're redirected to the detail view

Watching progress

The batch detail page polls every 3 seconds while documents are still processing. You'll see:

  • Progress bar + percent complete
  • Per-status count grid (pending / running / completed / needs review / failed)
  • Per-extraction table with timestamps and any error messages

Polling stops once the batch reaches a terminal state, that is, when every document is either completed or failed.
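
If you want the same poll-until-terminal behavior in a script, it can be sketched roughly as below. This is a minimal sketch, not the dashboard's actual code: the fetch_counts callable and the shape of the counts dictionary are hypothetical, and the set of terminal statuses (completed, failed) is taken from the description above. Whether "needs review" should also count as terminal is an assumption you would need to verify, so it is left as a parameter.

```python
import time
from typing import Callable, Dict, Iterable


def is_terminal(counts: Dict[str, int],
                terminal: Iterable[str] = ("completed", "failed")) -> bool:
    """Return True when every document is in one of the terminal statuses.

    `counts` maps a status name to the number of documents in that status.
    This shape is hypothetical; adapt it to the real status response.
    """
    terminal_set = set(terminal)
    total = sum(counts.values())
    done = sum(n for status, n in counts.items() if status in terminal_set)
    return total > 0 and done == total


def poll_until_terminal(fetch_counts: Callable[[], Dict[str, int]],
                        interval: float = 3.0,
                        max_polls: int = 200,
                        terminal: Iterable[str] = ("completed", "failed"),
                        sleep: Callable[[float], None] = time.sleep) -> Dict[str, int]:
    """Call fetch_counts() every `interval` seconds until the batch is terminal.

    Raises TimeoutError after `max_polls` attempts so a stuck batch
    cannot keep the loop running forever.
    """
    terminal = tuple(terminal)
    for _ in range(max_polls):
        counts = fetch_counts()
        if is_terminal(counts, terminal):
            return counts
        sleep(interval)
    raise TimeoutError(f"batch not terminal after {max_polls} polls")
```

The sleep function is injectable so the loop can be exercised in tests without waiting. The 3-second default mirrors the dashboard interval; a script may prefer a longer one.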

Exporting results

Once at least one extraction has completed, two download buttons appear in the progress card:

  • CSV: wide format. One row per document; the columns are document metadata followed by one column per field name, plus companion columns carrying the __confidence and __needs_review suffixes. Best for Excel, Google Sheets, and data warehouse imports.
  • JSON: nested. An array of {document_id, filename, status, fields: [...]}. Best for programmatic consumers and downstream audit trails.

Both downloads include source provenance per field — every value is traceable back to a specific bounding region on the source document.
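
As an illustration of working with the wide CSV, the sketch below lists every field flagged for review. The sample data is invented for the example: the field name total, the exact metadata column names, and the true/false encoding of the __needs_review column are assumptions, so check a real export before relying on them.

```python
import csv
import io

# Invented sample in the wide format described above (hypothetical columns).
SAMPLE_CSV = """document_id,filename,status,total,total__confidence,total__needs_review
d1,inv-001.pdf,completed,120.50,0.98,false
d2,inv-002.pdf,needs_review,89.00,0.41,true
"""

SUFFIX = "__needs_review"


def fields_needing_review(csv_text):
    """Return (filename, field, value, confidence) for each flagged field."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            if not column.endswith(SUFFIX):
                continue
            # Assumption: the flag is exported as a true/false style string.
            if (value or "").strip().lower() not in ("true", "1", "yes"):
                continue
            field = column[: -len(SUFFIX)]
            confidence = float(row[field + "__confidence"])
            flagged.append((row["filename"], field, row[field], confidence))
    return flagged


for item in fields_needing_review(SAMPLE_CSV):
    print(item)
```

Because the suffix columns follow a naming pattern, the function does not need to know the template's field names in advance.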

Quotas

Each document in a batch counts against your monthly quota. The check happens upfront: a batch that would exceed your limit is rejected with a 402 response before any work starts, rather than being half-processed.
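
The all-or-nothing rule can be modeled in a few lines. This is only an illustrative model of the behavior described above, not the service's implementation; the function and exception names are hypothetical.

```python
class QuotaExceeded(Exception):
    """Stands in for the 402 response described above (hypothetical name)."""
    status_code = 402


def admit_batch(used: int, monthly_limit: int, batch_size: int) -> int:
    """Admit the whole batch or none of it.

    Returns the quota remaining after the batch is admitted. Raises
    QuotaExceeded if even one document would not fit, so no document
    is processed from a batch that cannot be completed in full.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if used + batch_size > monthly_limit:
        raise QuotaExceeded(
            f"batch of {batch_size} exceeds remaining quota "
            f"of {monthly_limit - used}"
        )
    return monthly_limit - (used + batch_size)
```

For example, with 90 of 100 documents used, a batch of 10 is admitted and leaves nothing remaining, while a batch of 11 is rejected outright.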

From the API

See the API reference for batch endpoints. Three endpoints cover the full lifecycle:

  • POST /v1/batches — create + enqueue
  • GET /v1/batches/{id} — status (poll this)
  • GET /v1/batches/{id}/export?format=csv|json — download
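
A rough sketch of building requests for these three endpoints with Python's standard library follows. Only the paths and the format query parameter come from the list above. The base URL, the bearer-token header, and the request body fields (template_id, document_ids, use_llm) are assumptions made for illustration; the API reference is the authority on the real contract.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host


def _headers(api_key):
    # Assumption: bearer-token auth; check the API reference.
    return {"Authorization": f"Bearer {api_key}"}


def create_batch_request(api_key, template_id, document_ids, use_llm=False):
    """Build POST /v1/batches. The body field names are hypothetical."""
    body = json.dumps({
        "template_id": template_id,
        "document_ids": list(document_ids),
        "use_llm": use_llm,
    }).encode("utf-8")
    headers = {**_headers(api_key), "Content-Type": "application/json"}
    return urllib.request.Request(
        f"{BASE_URL}/v1/batches", data=body, headers=headers, method="POST"
    )


def status_request(api_key, batch_id):
    """Build GET /v1/batches/{id}, the endpoint to poll."""
    batch = urllib.parse.quote(batch_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/v1/batches/{batch}", headers=_headers(api_key), method="GET"
    )


def export_request(api_key, batch_id, fmt="csv"):
    """Build GET /v1/batches/{id}/export?format=csv|json."""
    if fmt not in ("csv", "json"):
        raise ValueError("fmt must be 'csv' or 'json'")
    batch = urllib.parse.quote(batch_id, safe="")
    query = urllib.parse.urlencode({"format": fmt})
    return urllib.request.Request(
        f"{BASE_URL}/v1/batches/{batch}/export?{query}",
        headers=_headers(api_key),
        method="GET",
    )
```

Each function returns an unsent urllib.request.Request; passing it to urllib.request.urlopen performs the call. Keeping construction separate from sending makes the requests easy to inspect and test.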