DocuExtract

Getting started

Five minutes start-to-finish. Sign up, drop in a sample document, see structured data come out the other side.

1. Sign up

Head to docuextract.ai/signup and create an account with Google, GitHub, or email. New accounts start on the Free plan, which includes 500 credits per month (roughly 100 standard-mode pages), 3 reusable templates, and every feature except the Multilingual, Handwritten, and Precision modes (those unlock on Pro+).

2. Create a template

A template is a reusable schema that describes what fields to extract from a class of documents. You build one per document type (invoices, receipts, intake forms, etc.) and run hundreds of similar documents through it.

Two ways to create one:

  • One-click starter: on /dashboard/templates, click Create starter invoice template. You get 6 fields (invoice number, date, vendor, total, subtotal, tax) wired up immediately. Best when your documents look like a standard US invoice.
  • Visual builder: click Build template. Upload a sample document; the page renders, and you click-and-drag to draw boxes around fields you want extracted. Name each one, pick its type (text / number / date / currency / enum), save.
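
To make the idea of a template concrete, here is a minimal sketch of one as a plain data structure. The class names (`Template`, `TemplateField`), the `enum_values` attribute, and the types assigned to the six starter fields are assumptions made for illustration; this page does not describe the schema DocuExtract uses internally or over the API. Only the five field types and the six starter field names come from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The five field types offered by the visual builder.
FIELD_TYPES = {"text", "number", "date", "currency", "enum"}


@dataclass
class TemplateField:
    """One field to extract (hypothetical shape, not the product's schema)."""
    name: str
    type: str
    enum_values: Optional[List[str]] = None  # assumed: allowed values for enum fields

    def __post_init__(self) -> None:
        if self.type not in FIELD_TYPES:
            raise ValueError(f"unknown field type: {self.type!r}")
        if self.type == "enum" and not self.enum_values:
            raise ValueError(f"enum field {self.name!r} needs allowed values")


@dataclass
class Template:
    """A reusable set of fields for one class of documents."""
    name: str
    fields: List[TemplateField] = field(default_factory=list)


def starter_invoice_template() -> Template:
    # Field names follow the one-click starter; the types are assumed.
    return Template(
        name="Starter invoice",
        fields=[
            TemplateField("invoice_number", "text"),
            TemplateField("date", "date"),
            TemplateField("vendor", "text"),
            TemplateField("total", "currency"),
            TemplateField("subtotal", "currency"),
            TemplateField("tax", "currency"),
        ],
    )
```

Validating the type when the field is created mirrors what the builder does through its type picker: a field cannot exist without one of the five supported types.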

3. Upload a document and extract

Go to /dashboard/documents. Pick the template you just created, drop in a PDF or image (PDF / PNG / JPEG / TIFF / WebP, up to 50 MB), and click Upload & extract.
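
If you prepare files with a script before uploading, a local pre-check along these lines can catch rejections early. This is a sketch under assumptions: the function name `check_upload` is hypothetical, the extension list is inferred from the formats named above, and "50 MB" is read here as 50 × 1024 × 1024 bytes, which the page does not specify.

```python
from pathlib import Path
from typing import List

# Extensions inferred from the accepted formats: PDF, PNG, JPEG, TIFF, WebP.
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".webp"}

# Assumption: the 50 MB limit is interpreted as 50 MiB.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")
    return problems
```

The check looks only at the file name and size, so a mislabeled file can still be rejected by the service itself.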

For born-digital PDFs (created directly from a word processor or similar tool, not scanned), extraction completes in under a second. Scanned PDFs take a few seconds while Tesseract reads the image. Documents with handwriting or unusual layouts escalate to the vision-LLM tier and take 5–15 seconds.
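
The three speed tiers can be pictured as a simple routing decision. The sketch below is illustrative only: the `DocumentTraits` flags, the tier names, and the order of the checks are assumptions, and the page does not describe how the engine actually detects handwriting or unusual layouts. The latency strings are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass
class DocumentTraits:
    """Hypothetical flags describing an uploaded document."""
    has_text_layer: bool           # True for born-digital PDFs
    has_handwriting: bool = False
    unusual_layout: bool = False


# Expected latency per tier, as stated in the text above.
TIER_LATENCY = {
    "text-layer": "under a second",
    "ocr": "a few seconds",
    "vision-llm": "5-15 seconds",
}


def choose_tier(doc: DocumentTraits) -> str:
    """Pick an extraction tier (assumed ordering: escalation wins)."""
    if doc.has_handwriting or doc.unusual_layout:
        return "vision-llm"
    if doc.has_text_layer:
        return "text-layer"
    return "ocr"  # scanned page: text must be read from the image first
```

For example, `choose_tier(DocumentTraits(has_text_layer=False))` selects the OCR tier, which corresponds to the scanned-PDF case.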

4. Understand the result

Each extracted field shows you:

  • Value — what the engine extracted (e.g. INV-2026-001)
  • Source (verbatim) — the exact text from the document that this value came from. If the engine can't trace a value to a source, it doesn't return that value. This is the verbatim-grounding invariant: no hallucinated values, ever.
  • Confidence — a 0–100% score combining OCR confidence, extraction confidence, and grounding confidence. Color-coded: green (≥75%), amber (50% to <75%), red (<50%).
  • Method — how this value was found: regex (cheap, for dates/currencies/numbers), anchor (label-based lookup), llm (LLM extraction).
  • Needs review — true when the field's confidence is below the template's threshold. These fields show up in the Review queue for human approval.
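
The rules in this list can be sketched in a few lines. Everything named here (`FieldResult`, `confidence_color`, `build_field`, the `review_threshold` parameter) is hypothetical, and the grounding check is deliberately simplified to a substring test; the page does not describe how the engine traces values or how the three confidence components are combined, so the score is taken as a given input.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldResult:
    value: str
    source: str         # verbatim text the value came from
    confidence: float   # 0-100
    method: str         # "regex", "anchor", or "llm"
    needs_review: bool


def confidence_color(confidence: float) -> str:
    """Map a 0-100 score to the dashboard colors."""
    if confidence >= 75:
        return "green"
    if confidence >= 50:
        return "amber"
    return "red"


def build_field(value: str, source: str, confidence: float, method: str,
                document_text: str, review_threshold: float) -> Optional[FieldResult]:
    """Return a result only when the source appears verbatim in the document."""
    # Simplified grounding check: no traceable source means no value at all.
    if not source or source not in document_text:
        return None
    return FieldResult(
        value=value,
        source=source,
        confidence=confidence,
        method=method,
        needs_review=confidence < review_threshold,
    )
```

With a document text of `"Invoice No: INV-2026-001"` and an assumed threshold of 70, a field whose source is `"INV-2026-001"` at confidence 92 comes back green and not flagged for review, while a field whose source does not occur in the text is dropped rather than returned.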

5. Where to go next

  • Templates guide — how the field-picker works in detail; when to use anchors vs. descriptions; type-specific behavior.
  • Batches — run hundreds of documents at once; download results as CSV/JSON.
  • Review queue — how to handle low-confidence fields; the audit trail.
  • API reference — integrate DocuExtract into your own application via REST. Every dashboard feature is also available over API.