DocuExtract

Five minutes start-to-finish. Sign up, drop in a sample document, see structured data come out the other side.

1. Sign up

Head to docuextract.ai/signup and create an account with Google, GitHub, or email. New accounts start on the Free plan: 500 credits per month (~100 standard-mode pages), 3 reusable templates, and full feature access except the Multilingual, Handwritten, and Precision modes (those unlock on Pro+).
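
For rough planning, the Free-plan figures above imply about 5 credits per standard-mode page (500 credits for roughly 100 pages). The Python sketch below just does that arithmetic. The per-page cost is an approximation derived from those two numbers, not a published rate, and other modes may well cost more; the function names are made up for illustration.

```python
# Rough estimate only, derived from "500 credits per month (~100 standard-mode pages)".
FREE_PLAN_CREDITS = 500
APPROX_CREDITS_PER_STANDARD_PAGE = 5  # assumption: 500 credits / ~100 pages


def pages_affordable(credits: int, credits_per_page: int = APPROX_CREDITS_PER_STANDARD_PAGE) -> int:
    """Return how many whole pages a credit balance covers at the given rate."""
    if credits_per_page <= 0:
        raise ValueError("credits_per_page must be positive")
    return credits // credits_per_page


def credits_needed(pages: int, credits_per_page: int = APPROX_CREDITS_PER_STANDARD_PAGE) -> int:
    """Return the credits an estimated page count would consume at the given rate."""
    return pages * credits_per_page


print(pages_affordable(FREE_PLAN_CREDITS))  # 100
print(credits_needed(40))                   # 200
```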

2. Create a template

A template is a reusable schema that describes what fields to extract from a class of documents. You build one per document type (invoices, receipts, intake forms, etc.) and run hundreds of similar documents through it.

Two ways to create one:

One-click starter: on /dashboard/templates, click Create starter invoice template. You get 6 fields (invoice number, date, vendor, total, subtotal, tax) wired up immediately. Best when your documents look like a standard US invoice.
Visual builder: click Build template. Upload a sample document; the page renders, and you click-and-drag to draw boxes around the fields you want extracted. Name each one, pick its type (text / number / date / currency / enum), and save.
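
To make the idea of a template as a reusable schema concrete, here is a minimal Python sketch of the six starter-invoice fields. The class names, field identifiers, and the type assigned to each field are assumptions made for illustration; only the field list and the five type names come from the description above, and this is not DocuExtract's actual template format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The five field types named in the visual builder.
FIELD_TYPES = {"text", "number", "date", "currency", "enum"}


@dataclass
class TemplateField:
    name: str
    type: str
    enum_values: Optional[List[str]] = None  # only meaningful for enum fields

    def __post_init__(self) -> None:
        # Reject types the builder does not offer.
        if self.type not in FIELD_TYPES:
            raise ValueError(f"unknown field type: {self.type!r}")
        if self.type == "enum" and not self.enum_values:
            raise ValueError("enum fields need at least one allowed value")


@dataclass
class Template:
    name: str
    fields: List[TemplateField] = field(default_factory=list)


# Hypothetical rendering of the one-click starter template.
# The type chosen for each field is a guess, not documented behavior.
starter_invoice = Template(
    name="Starter invoice",
    fields=[
        TemplateField("invoice_number", "text"),
        TemplateField("date", "date"),
        TemplateField("vendor", "text"),
        TemplateField("total", "currency"),
        TemplateField("subtotal", "currency"),
        TemplateField("tax", "currency"),
    ],
)

print([f.name for f in starter_invoice.fields])
```

One template like this would then be reused for every document of the same type, which is why it pays to get the field names and types right on the first sample.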

3. Upload a document and extract

Go to /dashboard/documents. Pick the template you just created, drop in a PDF or image (PDF / PNG / JPEG / TIFF / WebP, up to 50 MB), and click Upload & extract.
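
If you prepare files in bulk, a local pre-check against the limits above can save a failed upload. The following Python sketch is one way to do that. It assumes "50 MB" means 50 MiB and that format is judged by file extension; both are assumptions, and the function name is hypothetical.

```python
from pathlib import Path
from typing import List

# Extensions for the listed formats (PDF / PNG / JPEG / TIFF / WebP).
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".webp"}
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" read as 50 MiB


def check_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")
    return problems


print(check_upload("invoice.PDF", 120_000))
print(check_upload("notes.docx", 60 * 1024 * 1024))
```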

For born-digital PDFs (exported from a word processor or similar tool, not scanned), extraction completes in under a second. Scanned PDFs take a few seconds while Tesseract reads the image. Documents with handwriting or unusual layouts escalate to the vision-LLM tier and take 5–15 seconds.
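
The three speed tiers can be pictured as a simple routing decision. The Python sketch below is an illustration under stated assumptions, not the engine's real logic: the tier names and the `DocumentTraits` flags are invented here, and the real engine is described as escalating to the vision-LLM tier rather than deciding everything up front.

```python
from dataclasses import dataclass


@dataclass
class DocumentTraits:
    has_text_layer: bool           # born-digital PDF with embedded text
    has_handwriting: bool = False
    unusual_layout: bool = False


def pick_tier(doc: DocumentTraits) -> str:
    """Return the cheapest tier expected to handle the document (hypothetical names)."""
    if doc.has_handwriting or doc.unusual_layout:
        return "vision-llm"   # described as taking 5-15 seconds
    if doc.has_text_layer:
        return "text-layer"   # described as completing in under a second
    return "ocr"              # scanned input: Tesseract, a few seconds


print(pick_tier(DocumentTraits(has_text_layer=True)))
print(pick_tier(DocumentTraits(has_text_layer=False)))
print(pick_tier(DocumentTraits(has_text_layer=False, has_handwriting=True)))
```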

4. Understand the result

Each extracted field shows you:

Value — what the engine extracted (e.g. INV-2026-001)
Source (verbatim) — the exact text from the document this value came from. If the engine can't trace a value to a source, it doesn't return that value. That's the verbatim-grounding invariant: no hallucinated values, ever.
Confidence — a 0–100% score combining OCR confidence, extraction confidence, and grounding confidence. Color-coded: green (≥75%), amber (50% to just under 75%), red (<50%).
Method — how this value was found: regex (cheap, for dates/currencies/numbers), anchor (label-based lookup), llm (LLM extraction).
Needs review — true when the field's confidence is below the template's threshold. These fields show up in the Review queue for human approval.
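
The five properties above map naturally onto a small record type. The Python sketch below models one, together with the color bands, the review rule, and a literal check of the verbatim-grounding idea. The class and function names, the sample document text, and the threshold of 70 are all assumptions for illustration; the actual response shape and scoring are defined by DocuExtract, not by this code.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    source: str        # verbatim text the value was traced to
    confidence: float  # 0-100
    method: str        # "regex", "anchor", or "llm"


def confidence_color(confidence: float) -> str:
    """Map a score to the color bands: green >= 75, amber 50 to <75, red < 50."""
    if confidence >= 75:
        return "green"
    if confidence >= 50:
        return "amber"
    return "red"


def needs_review(f: ExtractedField, threshold: float) -> bool:
    """True when confidence falls below the template's threshold."""
    return f.confidence < threshold


def is_grounded(f: ExtractedField, document_text: str) -> bool:
    """A value counts as grounded only if its source appears verbatim in the document."""
    return bool(f.source) and f.source in document_text


# Made-up sample data; the threshold of 70 is hypothetical.
doc_text = "Invoice No: INV-2026-001\nTotal due: $1,250.00"
fields = [
    ExtractedField("invoice_number", "INV-2026-001", "INV-2026-001", 96.0, "regex"),
    ExtractedField("total", "1250.00", "$1,250.00", 62.5, "anchor"),
]
review_queue = [f.name for f in fields if needs_review(f, threshold=70.0)]

for f in fields:
    print(f.name, confidence_color(f.confidence), is_grounded(f, doc_text))
print(review_queue)
```

Note that the value and its source can differ in form (here 1250.00 versus $1,250.00); what the grounding check cares about is that the source string itself is present in the document.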

5. Where to go next

Templates guide — how the field-picker works in detail; when to use anchors vs. descriptions; type-specific behavior.
Batches — run hundreds of documents at once; download results as CSV/JSON.
Review queue — how to handle low-confidence fields; the audit trail.
API reference — integrate DocuExtract into your own application via REST. Every dashboard feature is also available over API.
