Now with GPT-4o & on-premise local LLM support

Turn procurement PDFs into ERP-ready data, instantly

AIDoc extracts structured fields from any purchase order or order confirmation and converts them directly to WAW-compatible XML — no manual data entry.

openAPI_extraction.py — terminal
$ uv run python openAPI_extraction.py --doc-type customer_orders --folder "input/orders"

Output folder: output/orders_20260430_143012
Loading schemas for doc type 'customer_orders' …
Processing 5 file(s) with model 'gpt-4o' …

[Step 1] Extracting 'order_DEU10033688.pdf' (model: gpt-4o, 3p) …
         Saved → output/.../order_DEU10033688_step1_default.json
[Step 2] Mapping to WAW customer_orders schema …
         Saved → output/.../order_DEU10033688_step2_customer_orders.json
[Step 3] Converting to XML …
         Saved → output/.../order_DEU10033688_step3_customer_orders.xml

── Summary ─────────────────────────────────────────────
  OK   : 5/5
  Output: output/orders_20260430_143012
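
The batch run above can be pictured as a worker pool that handles each PDF independently and then tallies the results. This is a minimal sketch under stated assumptions, not AIDoc's actual code: `process_document` is a hypothetical stand-in for the three-step pipeline, and `max_workers` is an assumed name for the worker-pool setting.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def process_document(pdf_path: Path) -> Path:
    """Hypothetical stand-in for extract -> map -> export; returns the XML path."""
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"not a PDF: {pdf_path.name}")
    return pdf_path.with_name(f"{pdf_path.stem}_step3_customer_orders.xml")


def run_batch(pdf_paths, max_workers=4):
    """Process documents in parallel and return (succeeded, failed) lists."""
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_document, p): p for p in pdf_paths}
        for future in as_completed(futures):
            source = futures[future]
            try:
                succeeded.append((source, future.result()))
            except Exception as exc:  # one bad file should not stop the batch
                failed.append((source, str(exc)))
    return succeeded, failed


def summary_line(succeeded, failed):
    """Format the OK count in the style of the summary shown above."""
    total = len(succeeded) + len(failed)
    return f"OK   : {len(succeeded)}/{total}"
```

Collecting failures instead of raising means a single unreadable file is reported in the summary without interrupting the rest of the folder.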

<5s per document
3 output formats
100% local or cloud
0 manual entry

Everything you need to automate procurement intake

From raw PDF to ERP-ready XML in one command. No templates. No training data. No manual field mapping.

AI-Powered Extraction
GPT-4o reads PDFs directly — tables, images, scanned text, multi-column layouts. No OCR preprocessing required.
Two Document Types
Process customer purchase orders and supplier order confirmations with dedicated schemas and prompts for each type.
Schema Validation
Every extraction is validated: article counts, critical field presence, and total cross-checks against the source document.
ERP-Ready XML
Outputs conform to your WAW import schema. New ERP fields added to the schema propagate to the output automatically, with no code changes needed.
Telemetry Dashboard
Real-time Streamlit dashboard backed by a FastAPI server and SQLite. Track volumes, models, and success rates over time.
Parallel Batch Processing
Process entire folders in parallel. Configurable worker pools for both document-level and extraction-level concurrency.
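
To make the schema validation checks listed above concrete, here is a rough sketch of what such checks could look like. The field names (`order_number`, `line_items`, `line_total`, `total`), the `expected_item_count` argument, and the rounding tolerance are all assumptions for illustration, not AIDoc's real schema or code.

```python
from decimal import Decimal

# Hypothetical critical fields; a real schema would define its own list.
CRITICAL_FIELDS = ("order_number", "line_items", "total")


def validate_extraction(doc: dict, expected_item_count: int,
                        tolerance: Decimal = Decimal("0.01")) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []

    # 1. Critical field presence
    for field in CRITICAL_FIELDS:
        if doc.get(field) in (None, "", []):
            problems.append(f"missing critical field: {field}")

    # 2. Article count against the count read from the source document
    items = doc.get("line_items") or []
    if len(items) != expected_item_count:
        problems.append(
            f"expected {expected_item_count} articles, got {len(items)}"
        )

    # 3. Cross-check: line totals should add up to the stated document total
    total = doc.get("total")
    if total not in (None, ""):
        line_sum = sum(
            (Decimal(str(item.get("line_total", 0))) for item in items),
            Decimal("0"),
        )
        if abs(line_sum - Decimal(str(total))) > tolerance:
            problems.append(
                f"line totals {line_sum} do not match total {total}"
            )

    return problems
```

Returning every problem at once, rather than stopping at the first, keeps the result useful for review when a document fails more than one check.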

Three steps, zero manual work

An auditable pipeline with deterministic mapping and export, where every intermediate artifact is saved to disk so you can inspect, replay, or override any step.

1. Extract
GPT-4o (or a local Ollama model) reads your PDF and extracts every field into a generic JSON — order numbers, addresses, line items, payment terms, notes.
_step1_default.json
2. Map
A deterministic Python mapper transforms the generic JSON into your exact WAW ERP import schema. No LLM ambiguity in the mapping step.
_step2_customer_orders.json
3. Export
The mapped JSON is serialised to clean, indented XML. Drop it directly into your ERP import folder — no transformation needed.
_step3_customer_orders.xml
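
Steps 2 and 3 can be sketched with nothing but the Python standard library. This is an illustration under stated assumptions, not the shipped mapper: the field names on both sides of `FIELD_MAP`, the `Items`/`Item` structure, and the `CustomerOrder` root element are hypothetical, since the real WAW schema is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: generic step-1 key -> WAW element name.
FIELD_MAP = {
    "order_number": "OrderNo",
    "customer_name": "Customer",
    "payment_terms": "PaymentTerms",
}


def map_to_schema(generic: dict) -> dict:
    """Step 2: rename known keys. No LLM is involved, so the same
    input always produces the same output."""
    mapped = {
        target: generic[source]
        for source, target in FIELD_MAP.items()
        if source in generic
    }
    mapped["Items"] = [
        {
            "ArticleNo": item.get("article_number", ""),
            "Quantity": item.get("quantity", ""),
        }
        for item in generic.get("line_items", [])
    ]
    return mapped


def to_xml(mapped: dict, root_tag: str = "CustomerOrder") -> str:
    """Step 3: serialise the mapped dict to indented XML."""
    root = ET.Element(root_tag)
    for key, value in mapped.items():
        if key == "Items":
            items_el = ET.SubElement(root, "Items")
            for item in value:
                item_el = ET.SubElement(items_el, "Item")
                for name, text in item.items():
                    ET.SubElement(item_el, name).text = str(text)
        else:
            ET.SubElement(root, key).text = str(value)
    ET.indent(root)  # requires Python 3.9 or later
    return ET.tostring(root, encoding="unicode")
```

Keeping the mapping in a plain lookup table is one way a new schema field could be supported by editing data rather than code; keys that the table does not list are simply ignored.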

Cloud accuracy or on-premise privacy: your choice

Cloud
OpenAI Pipeline
Highest accuracy. GPT-4o reads PDFs natively — tables, images, and scanned pages all work out of the box.
  • GPT-4o native PDF understanding
  • Easiest setup — just an API key
  • Auto-selects larger model for long docs
  • Structured JSON output mode
  • Requires internet + OpenAI subscription
Get started →
Local
Local LLM Pipeline
Full on-premise. Uses Ollama and LangExtract — your documents never leave your network.
  • 100% on-premise, GDPR-friendly
  • Free to run (hardware only)
  • GPU acceleration via Ollama
  • Configurable model (Qwen, LLaMA, etc.)
  • Lower accuracy on complex layouts
View guide →

Simple, transparent pricing

Start free. Scale as your procurement volume grows. Enterprise plans include on-premise deployment and white-glove onboarding.

Starter
Free
For individuals and small teams evaluating the platform.
  • 20 documents / month
  • Customer orders + order confirmations
  • JSON + XML output
  • Local LLM pipeline
  • Community support
Get started free
Enterprise
Custom
For organisations with high volumes, compliance requirements, or existing ERP integrations.
  • Unlimited documents
  • On-premise deployment
  • Custom document types
  • WAW / SAP / Dynamics integration
  • SSO + audit logging
  • Dedicated support & SLA
Contact sales

Ready to eliminate manual data entry?

Set up in minutes and run your first extraction in under five.