DocuExtract


Templates

A template is a reusable schema describing what fields to extract from a class of documents. Build one, run many similar documents through it.

What a template is

A template is a list of fields, each with a name, a type, and optional hints. When you extract a document with a template, the engine produces one value per field (or null when a field can't be found).
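As a rough illustration of that shape, here is a minimal Python sketch. The class names `Field` and `Template` and the helper `merge_found` are hypothetical and are not part of DocuExtract; the sketch only shows "one value per field, null when missing", with Python's `None` standing in for null.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical model of one template field."""
    name: str
    type: str                          # e.g. "text", "date", "currency"
    anchor: Optional[str] = None       # optional hint
    description: Optional[str] = None  # optional hint


@dataclass(frozen=True)
class Template:
    """Hypothetical model of a template: a named list of fields."""
    name: str
    fields: tuple[Field, ...]


def merge_found(template: Template, found: dict[str, Any]) -> dict[str, Any]:
    """Return one entry per template field; fields that were not found map to None."""
    result: dict[str, Any] = {f.name: None for f in template.fields}
    for name, value in found.items():
        if name in result:  # ignore keys the template does not define
            result[name] = value
    return result


invoice = Template(
    name="invoice",
    fields=(
        Field("invoice_number", "text", anchor="Invoice No."),
        Field("total", "currency", anchor="Total"),
    ),
)
print(merge_found(invoice, {"invoice_number": "INV-1"}))
```

Every field in the template appears in the result, so downstream code can rely on a stable set of keys.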

Templates are versioned. Editing a template bumps its version; past batches keep referring to the version they ran against, so corrections to a template don't invalidate historical extraction results.
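One way to picture version pinning is the sketch below. It is an assumption about how such a scheme could work, not a description of DocuExtract's storage; `VersionedTemplate` and `Batch` are hypothetical names. Each edit appends an immutable snapshot, and a batch records the version number it ran against.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTemplate:
    """Hypothetical store: versions[i] holds the field list for version i + 1."""
    name: str
    versions: list = field(default_factory=list)

    def edit(self, fields: list) -> int:
        """Save a new snapshot and return its version number."""
        self.versions.append(tuple(fields))  # tuple: the snapshot cannot be mutated later
        return len(self.versions)

    def fields_at(self, version: int) -> tuple:
        """Look up the field list exactly as it was at the given version."""
        return self.versions[version - 1]


@dataclass(frozen=True)
class Batch:
    """Hypothetical batch record that pins the template version it used."""
    template_name: str
    template_version: int


template = VersionedTemplate("invoice")
v1 = template.edit(["invoice_number", "total"])
batch = Batch("invoice", v1)                           # batch runs against version 1
v2 = template.edit(["invoice_number", "total", "tax"])  # later edit bumps the version

# The old batch still resolves to the fields it was extracted with.
print(template.fields_at(batch.template_version))
```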

Two ways to create one

Starter invoice template

On /dashboard/templates, click Create starter invoice template. You get 6 fields:

  • invoice_number (text, anchor: "Invoice No.")
  • invoice_date (date, anchor: "Date")
  • vendor (text, anchor: "Vendor")
  • total (currency, anchor: "Total")
  • subtotal (currency, anchor: "Subtotal")
  • tax (currency, anchor: "Tax")

Best when your documents look like a standard US/Western invoice. Edit any field afterward to fit your exact format.

Visual builder

Click Build template. Upload a sample document; the page renders, and you click-and-drag to draw bounding boxes around fields you want extracted. Name each one, pick its type, save.

The bounding boxes are stored as authoring metadata. Extraction uses anchor-based lookup by default because it is more robust to layout drift across documents, but the boxes give you a visual record of where each field lives on a typical document.

Field types

Type      Use for                                          Coercion
text      Free-form strings                                Returns the verbatim string
number    Counts, quantities, IDs that are numeric         Coerced to Decimal
date      Dates in any format (ISO, US, EU, long-form)     Coerced to ISO date (YYYY-MM-DD)
currency  Money amounts                                    Coerced to Decimal in canonical form
enum      Fixed set of allowed values                      Constrained to enum_values
table     Repeating row data (line items)                  Returns array of objects
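To make the coercion column concrete, the sketch below shows what date and currency coercion could look like with Python's standard library. It is an illustration under stated assumptions, not the engine's actual logic: the function names and the `DATE_FORMATS` list are hypothetical, only a few date layouts are tried, "/" is assumed to mean US month-first order and "." EU day-first order, and currency input is assumed to use "," for thousands and "." for decimals.

```python
from datetime import datetime
from decimal import Decimal
from typing import Optional

# Assumed layouts: ISO, US (month first), EU (day first), long-form English.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y", "%B %d, %Y")


def coerce_date(raw: str) -> Optional[str]:
    """Try each known layout and return YYYY-MM-DD, or None if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def coerce_currency(raw: str) -> Decimal:
    """Strip a leading currency symbol and thousands separators, then parse as Decimal."""
    cleaned = raw.strip().lstrip("$€£").replace(",", "")
    return Decimal(cleaned)


def coerce_enum(raw: str, enum_values: list) -> Optional[str]:
    """Return the value only if it is one of the allowed enum_values."""
    value = raw.strip()
    return value if value in enum_values else None


print(coerce_date("03/15/2024"))
print(coerce_currency("$1,234.50"))
print(coerce_enum("paid", ["paid", "unpaid"]))
```

Decimal is used rather than float so that money amounts keep their exact decimal digits.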

Anchors

An anchor is a label that appears next to (or above) the value you want. Examples: "Invoice No.", "Total:", "Vendor:". The engine searches for the anchor text in the document, then grabs the adjacent value.

When to use an anchor: whenever a label exists. It's the most reliable extraction method: almost free (no LLM) and very accurate when the label is consistent across documents.

When to skip the anchor: if the value isn't preceded by a label (e.g. a vendor name at the top of the document with no "Vendor:" prefix). The extraction layer falls back to the LLM, using your description field as a hint.
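The anchor-then-fallback flow can be sketched as follows. This is a simplified illustration, not the engine's implementation: `find_by_anchor` only looks for a value on the same line as the label (the real lookup also handles values below the label), and `llm_fallback` is a hypothetical callable standing in for the LLM extractor.

```python
import re
from typing import Callable, Optional, Tuple


def find_by_anchor(text: str, anchor: str) -> Optional[str]:
    """Return the text that follows `anchor` on the same line, or None if absent."""
    # (?<!\w) keeps "Total" from matching inside a longer word such as "Subtotal".
    pattern = r"(?<!\w)" + re.escape(anchor) + r"[ \t]*:?[ \t]*(\S[^\n]*)"
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None


def extract_field(
    text: str,
    anchor: Optional[str],
    llm_fallback: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], str]:
    """Use the anchor when one is configured; otherwise defer to the fallback.

    Returns (value, method) so callers can see which path produced the value.
    """
    if anchor:
        return find_by_anchor(text, anchor), "anchor"
    return llm_fallback(text), "llm"


sample = "Acme Corp\nInvoice No. 1042\nTotal: $99.00\n"
print(extract_field(sample, "Invoice No.", lambda t: None))
print(extract_field(sample, None, lambda t: t.splitlines()[0]))
```

The anchor path is a plain text search, which is why it costs almost nothing compared with an LLM call.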

Descriptions

A description is a natural-language hint sent to the LLM extractor. Example: description: "Issuing company at the top of the invoice". The LLM uses this to disambiguate when multiple candidate values exist.

Confidence threshold

Each template has a confidence_threshold (default 0.75). Fields extracted with a confidence score below this threshold are flagged needs_review = true and appear in the Review queue.

Set higher (e.g. 0.90) for high-stakes workflows where a wrong value costs more than an extra review action. Set lower (e.g. 0.60) for low-stakes bulk processing where occasional errors are acceptable.
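The flagging rule amounts to a single comparison, sketched below. The function names and the input shape (field name mapped to a value and a confidence score) are hypothetical; the sketch also assumes that a score exactly equal to the threshold is not flagged, since the rule says "below".

```python
def flag_for_review(extracted: dict, confidence_threshold: float = 0.75) -> dict:
    """Attach needs_review to each field whose confidence is below the threshold.

    `extracted` maps field name -> (value, confidence).
    """
    return {
        name: {
            "value": value,
            "confidence": confidence,
            "needs_review": confidence < confidence_threshold,
        }
        for name, (value, confidence) in extracted.items()
    }


def review_queue(flagged: dict) -> list:
    """List the names of the fields that need a human look."""
    return [name for name, info in flagged.items() if info["needs_review"]]


fields = {"total": ("99.00", 0.92), "vendor": ("Acme Corp", 0.61)}
print(review_queue(flag_for_review(fields)))                              # default 0.75
print(review_queue(flag_for_review(fields, confidence_threshold=0.95)))   # stricter
```

Raising the threshold sends more fields to review; lowering it lets more values through unreviewed.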