Getting started
Five minutes start-to-finish. Sign up, drop in a sample document, see structured data come out the other side.
1. Sign up
Head to docuextract.ai/signup and create an account with Google, GitHub, or email. New accounts start on the Free plan: 500 credits per month (~100 standard-mode pages), 3 reusable templates, and full feature access except the Multilingual, Handwritten, and Precision modes, which unlock on Pro+.
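If you want to budget credits before uploading, the plan figures above imply a rough per-page cost. The sketch below only divides the two numbers quoted for the Free plan. The "~100" is approximate, so treat the result as an estimate rather than a published price; the helper name `pages_within_budget` is made up for illustration.

```python
# Figures quoted for the Free plan.
FREE_PLAN_CREDITS = 500
APPROX_STANDARD_PAGES = 100  # the "~100 standard-mode pages" figure, an approximation

# Implied cost of one standard-mode page (an estimate, not an official rate).
credits_per_standard_page = FREE_PLAN_CREDITS / APPROX_STANDARD_PAGES


def pages_within_budget(credits: int, cost_per_page: float = credits_per_standard_page) -> int:
    """Return how many whole standard-mode pages fit in a credit budget."""
    return int(credits // cost_per_page)


print(credits_per_standard_page)
print(pages_within_budget(FREE_PLAN_CREDITS))
```

Other modes may cost more per page; the text above only gives the standard-mode figure.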
2. Create a template
A template is a reusable schema that describes what fields to extract from a class of documents. You build one per document type (invoices, receipts, intake forms, etc.) and run hundreds of similar documents through it.
Two ways to create one:
- One-click starter: on /dashboard/templates, click Create starter invoice template. You get 6 fields (invoice number, date, vendor, total, subtotal, tax) wired up immediately. Best when your documents look like a standard US invoice.
- Visual builder: click Build template. Upload a sample document; the page renders, and you click-and-drag to draw boxes around fields you want extracted. Name each one, pick its type (text / number / date / currency / enum), save.
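To make the idea of a template concrete, here is a minimal sketch of one as a data structure: a name plus a list of typed fields. The class names, the snake_case field names, and the type chosen for each starter field are assumptions for illustration; only the six field labels and the five field types come from the description above.

```python
from dataclasses import dataclass, field
from typing import Literal

# The five field types offered by the visual builder.
FieldType = Literal["text", "number", "date", "currency", "enum"]


@dataclass
class TemplateField:
    name: str
    type: FieldType


@dataclass
class Template:
    name: str
    fields: list[TemplateField] = field(default_factory=list)


# The six starter-invoice fields; the type assigned to each is a guess.
starter_invoice = Template(
    name="Starter invoice",
    fields=[
        TemplateField("invoice_number", "text"),
        TemplateField("date", "date"),
        TemplateField("vendor", "text"),
        TemplateField("total", "currency"),
        TemplateField("subtotal", "currency"),
        TemplateField("tax", "currency"),
    ],
)

print([f.name for f in starter_invoice.fields])
```

One template per document type keeps each schema small: an intake form would get its own `Template` instead of extra fields on the invoice one.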
3. Upload a document and extract
Go to /dashboard/documents. Pick the template you just created, drop in a PDF or image (PDF / PNG / JPEG / TIFF / WebP, up to 50 MB), and click Upload & extract.
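If you are preparing files in bulk, a quick local check against the limits above can save a failed upload. This is a sketch under assumptions: the extension list is inferred from the format names, "50 MB" is read as 50,000,000 bytes (the stricter interpretation), and `upload_problems` is a hypothetical helper, not part of DocuExtract.

```python
from pathlib import Path

# Extensions inferred from the accepted formats (PDF / PNG / JPEG / TIFF / WebP).
ALLOWED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".webp"}

# Assumption: "50 MB" means 50 * 1000 * 1000 bytes.
MAX_BYTES = 50 * 1000 * 1000


def upload_problems(filename: str, size_bytes: int) -> list[str]:
    """Return human-readable reasons a file would likely be rejected."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is larger than {MAX_BYTES} bytes")
    return problems


print(upload_problems("invoice.PDF", 1_200_000))
print(upload_problems("notes.docx", 60_000_000))
```

An empty list means the file passes both checks. The service may still reject it for reasons this sketch cannot see, such as a corrupt PDF.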
For born-digital PDFs (anything created from a word processor, not a scan), extraction completes in under a second. Scanned PDFs take a few seconds while Tesseract reads the image. Documents with handwriting or unusual layouts escalate to the vision-LLM tier and take 5–15 seconds.
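The three speed tiers can be read as a simple decision order. The sketch below shows that reading; it is not the engine's actual routing code. How DocuExtract detects handwriting or an unusual layout is not described here, so those appear as plain boolean inputs, and the tier labels are made-up names.

```python
from dataclasses import dataclass


@dataclass
class DocumentTraits:
    has_text_layer: bool          # born-digital PDFs carry embedded text
    has_handwriting: bool = False
    unusual_layout: bool = False


def expected_tier(doc: DocumentTraits) -> str:
    """Guess which tier a document lands in, following the order described above."""
    if doc.has_handwriting or doc.unusual_layout:
        return "vision-llm"   # roughly 5–15 seconds
    if doc.has_text_layer:
        return "text-layer"   # under a second
    return "ocr"              # a few seconds while Tesseract reads the image


print(expected_tier(DocumentTraits(has_text_layer=True)))
print(expected_tier(DocumentTraits(has_text_layer=False)))
print(expected_tier(DocumentTraits(has_text_layer=False, has_handwriting=True)))
```

Checking the escalation condition first reflects the wording "escalate": a handwritten page goes to the slow tier whether or not it also has a text layer. That ordering is an interpretation.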
4. Understand the result
Each extracted field shows you:
- Value: what the engine extracted (e.g. INV-2026-001).
- Source (verbatim): the exact text from the document this value came from. If the engine can't trace a value to a source, it doesn't return that value. That's the verbatim-grounding invariant: no hallucinated values, ever.
- Confidence: a 0–100% score combining OCR confidence, extraction confidence, and grounding confidence. Color-coded: green (≥75%), amber (50% to <75%), red (<50%).
- Method: how this value was found. One of regex (cheap, for dates/currencies/numbers), anchor (label-based lookup), or llm (LLM extraction).
- Needs review: true when the field's confidence is below the template's threshold. These show up in the Review queue for human approval.
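The result fields above map naturally onto a small record plus three checks: the color band, the review flag, and the grounding invariant. This is a sketch, not the engine's implementation. The `ExtractedField` class and function names are hypothetical, the review threshold is passed in because its default is not stated here, and the grounding check is simplified to a substring test. How the three confidence components combine into one score is not specified, so the sketch takes the final score as given.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    value: str         # what the engine extracted
    source: str        # verbatim text the value was traced to
    confidence: float  # combined score, 0-100
    method: str        # "regex", "anchor", or "llm"


def confidence_color(confidence: float) -> str:
    """Map a 0-100 score to the dashboard's color bands."""
    if confidence >= 75:
        return "green"
    if confidence >= 50:
        return "amber"
    return "red"


def needs_review(f: ExtractedField, threshold: float) -> bool:
    """True when confidence falls below the template's threshold."""
    return f.confidence < threshold


def is_grounded(f: ExtractedField, document_text: str) -> bool:
    """Simplified grounding check: the source must appear verbatim in the document."""
    return bool(f.source) and f.source in document_text


doc_text = "Invoice No: INV-2026-001  Total: $1,250.00"
inv = ExtractedField("INV-2026-001", "INV-2026-001", 82.0, "regex")
print(confidence_color(inv.confidence), needs_review(inv, 90.0), is_grounded(inv, doc_text))
```

A field can be green and still need review if the template's threshold is set above 75, which is why the two checks are kept separate.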
5. Where to go next
- Templates guide — how the field-picker works in detail; when to use anchors vs. descriptions; type-specific behavior.
- Batches — run hundreds of documents at once; download results as CSV/JSON.
- Review queue — how to handle low-confidence fields; the audit trail.
- API reference — integrate DocuExtract into your own application via REST. Every dashboard feature is also available over API.