How to Set Up Email-Triggered Document Processing
Automatically process documents received via email.
What This Integration Does
Vendors, partners, and customers send important documents (invoices, purchase orders, contracts, statements) by email. This workflow watches a shared mailbox, pulls each attachment, extracts its contents, runs them through an AI Agent to pull out structured fields, and writes clean records to your system of record. No more "did anyone process that PO?" on the team channel.
The workflow runs every time a new email arrives matching your filter. Attachments are processed individually, and each one produces both a structured record (for your DB) and an audit log (for compliance). Failed extractions route to a human review queue rather than silently dropping documents.
Prerequisites
- An email source - the Email trigger sub-type configured against a shared inbox.
- An AI Agent enabled in Spojit.
- The pdf and csv connectors for attachment parsing.
- A destination connection where structured records will be written (e.g. mongodb, netsuite, mysql).
Step 1: Email Trigger
Drop a Trigger node and set its type to Email. Filter by sender domain, subject pattern, or label so you only run on real document emails - not newsletters or replies. The trigger exposes the message body, sender, subject, and an array of attachments.
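The filter itself is configured on the Trigger node, not written as code, but the logic it should express can be sketched in Python. This is a minimal sketch: ALLOWED_DOMAINS, SUBJECT_PATTERN, and should_process are hypothetical illustrative names, not Spojit APIs. It skips replies (subjects starting with "Re:") but deliberately not forwards, so forwarded test emails still pass.

```python
import re

# Hypothetical filter values; replace with your own vendor domains and subject keywords.
ALLOWED_DOMAINS = {"acme-supplies.com", "globex.example"}
SUBJECT_PATTERN = re.compile(r"\b(invoice|purchase order|PO|statement)\b", re.IGNORECASE)
REPLY_PATTERN = re.compile(r"^\s*re:", re.IGNORECASE)


def should_process(sender: str, subject: str, attachments: list) -> bool:
    """Return True only for emails that look like real document emails."""
    # Pull the domain out of the address, e.g. "Billing <ap@acme-supplies.com>".
    match = re.search(r"@([\w.-]+)", sender)
    if not match or match.group(1).lower() not in ALLOWED_DOMAINS:
        return False
    # Skip replies, which usually quote a thread that was already processed.
    if REPLY_PATTERN.match(subject):
        return False
    # Require a document-like subject and at least one attachment.
    return bool(SUBJECT_PATTERN.search(subject)) and len(attachments) > 0
```

Checking the sender domain first keeps the cheap, security-relevant test ahead of everything else.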
Step 2: Loop Over Attachments
Add a Loop node iterating over {{ email.attachments }}. For each attachment, branch on file type with a Condition node:
- .pdf -> PDF extraction path
- .csv / .xlsx -> spreadsheet path
- .xml -> XML path
- anything else -> route to human review
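The Condition node's branching amounts to a lookup on the lowercased file extension, with a fallback to human review. A minimal Python sketch, where route_attachment and the branch names are hypothetical labels for the paths listed above:

```python
from pathlib import PurePosixPath

# Lowercase extension -> branch name used in this guide (hypothetical labels).
ROUTES = {
    ".pdf": "pdf",
    ".csv": "spreadsheet",
    ".xlsx": "spreadsheet",
    ".xml": "xml",
}


def route_attachment(filename: str) -> str:
    """Pick the processing branch for an attachment; unknown types go to human review."""
    # .suffix is the last extension only ("a.tar.gz" -> ".gz"); lowercasing handles "INV.PDF".
    suffix = PurePosixPath(filename).suffix.lower()
    return ROUTES.get(suffix, "human_review")
```

Extensions can be wrong or missing, so if the trigger also exposes a MIME type per attachment, checking it as a second signal is a reasonable extra guard.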
Step 3: Extract the Content
For PDFs, add a Connector node pointing at the pdf connector with the extract-text tool. For very long PDFs, use extract-pages to grab just the pages that contain the data (often the first page of an invoice). For CSVs, use the csv connector's parse tool followed by to-json to get structured rows. For XML, use the xml connector's to-json tool.
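To show what "structured rows" means for the CSV path, here is a local Python sketch of parse followed by to-json using the standard csv module. It is an illustration of the idea, not the csv connector's implementation; the exact output shape of the connector's tools is an assumption here.

```python
import csv
import io
import json


def csv_to_json_rows(text: str) -> str:
    """Parse CSV text with a header row and return the rows as a JSON array of objects."""
    # Spreadsheet exports often start with a UTF-8 byte-order mark; drop it so the
    # first header name isn't "\ufeffsku".
    reader = csv.DictReader(io.StringIO(text.lstrip("\ufeff")))
    # Every value stays a string; convert numeric columns before validating totals.
    return json.dumps([dict(row) for row in reader])
```

For example, csv_to_json_rows("sku,qty\nA-1,3\n") yields one object per data row, keyed by the header names.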
Step 4: Extract Structured Fields with the AI Agent
Add a Connector node in Agent Mode with Structured Output. The schema depends on what you're processing - here's one for invoices:
{
  "vendor": { "type": "string" },
  "invoiceNumber": { "type": "string" },
  "invoiceDate": { "type": "string", "description": "ISO 8601 date" },
  "dueDate": { "type": "string", "description": "ISO 8601 date" },
  "currency": { "type": "string", "description": "ISO 4217 code" },
  "subtotal": { "type": "number" },
  "tax": { "type": "number" },
  "total": { "type": "number" },
  "lineItems": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "description": { "type": "string" },
        "qty": { "type": "number" },
        "unitPrice": { "type": "number" },
        "amount": { "type": "number" }
      }
    }
  }
}
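Structured Output should already enforce the schema, but a cheap type check before writing makes failures explicit (for example, a model returning "108.00" as a string). The Python below is a minimal sketch that understands only the four types used in the invoice schema; it is not a full JSON Schema validator. It also treats every listed field as required, which is an assumption, since the schema has no "required" list.

```python
# Type checks for the subset of JSON Schema used in the invoice schema.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}

STR, NUM = {"type": "string"}, {"type": "number"}
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": STR, "invoiceNumber": STR, "invoiceDate": STR, "dueDate": STR,
        "currency": STR, "subtotal": NUM, "tax": NUM, "total": NUM,
        "lineItems": {"type": "array", "items": {"type": "object", "properties": {
            "description": STR, "qty": NUM, "unitPrice": NUM, "amount": NUM}}},
    },
}


def check(value, schema, path="$"):
    """Return a list of type errors for value; raises KeyError for unsupported types."""
    expected = schema.get("type")
    if expected and not TYPE_CHECKS[expected](value):
        return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors = []
    if expected == "array":
        for i, item in enumerate(value):
            errors += check(item, schema.get("items", {}), f"{path}[{i}]")
    if expected == "object":
        for key, sub in schema.get("properties", {}).items():
            if key not in value:
                errors.append(f"{path}.{key}: missing")
            else:
                errors += check(value[key], sub, f"{path}.{key}")
    return errors
```

An empty list means the record matches; anything else can be attached to the human review item so the reviewer sees exactly which field failed.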
Step 5: Validate Before Writing
Add a Connector node that calls the math connector's sum tool over the line-item amounts, then a Condition node that compares the sum to total - tax (the expected subtotal). If they don't match, route to a Human review node. This catches OCR errors, missed line items, and hallucinated numbers before they reach your books.
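The same reconciliation can be sketched locally in Python to reason about edge cases. Two assumptions are built in: it compares with decimal.Decimal and a one-cent tolerance (a hypothetical default), because binary floats and per-line rounding can make an exact equality test fail on a correct invoice; and it assumes total - tax equals the line-item sum, which won't hold for invoices that add shipping or discounts as separate amounts.

```python
from decimal import Decimal


def reconcile(invoice: dict, tolerance: Decimal = Decimal("0.01")) -> bool:
    """True if the line-item amounts add up to total - tax within the tolerance."""
    # Convert through str() so float noise (0.1 + 0.2 != 0.3) doesn't leak into the comparison.
    line_sum = sum(
        (Decimal(str(item["amount"])) for item in invoice["lineItems"]), Decimal("0")
    )
    expected = Decimal(str(invoice["total"])) - Decimal(str(invoice["tax"]))
    return abs(line_sum - expected) <= tolerance
```

A False result maps to the Human review branch; a True result continues to Step 6.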
Step 6: Persist and Notify
Run a Parallel node:
- Store the structured record via mongodb insert-documents, netsuite create-record, or mysql insert-rows.
- Upload the original attachment with ftp upload-file (or any storage destination) so the source document is preserved for audit.
- Post a one-line summary with slack send-message so the team sees what was processed.
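The Parallel node's fan-out resembles running three independent tasks concurrently and collecting their results. The Python below is a sketch of that idea with asyncio; store_record, archive_file, notify, and summarize are hypothetical stand-ins for the three Connector nodes, and how Spojit's Parallel node handles a failing branch is not described here, so the error handling shown is an assumption.

```python
import asyncio


async def store_record(record: dict) -> str:
    await asyncio.sleep(0)  # placeholder for the database write
    return f"stored {record['invoiceNumber']}"


async def archive_file(filename: str) -> str:
    await asyncio.sleep(0)  # placeholder for the storage upload
    return f"archived {filename}"


async def notify(summary: str) -> str:
    await asyncio.sleep(0)  # placeholder for the chat message
    return f"posted: {summary}"


def summarize(record: dict) -> str:
    """Build the one-line summary posted to the team channel."""
    return (f"{record['vendor']} invoice {record['invoiceNumber']}: "
            f"{record['total']:.2f} {record['currency']}")


async def persist_and_notify(record: dict, filename: str) -> list:
    # gather runs the branches concurrently and returns results in argument order.
    # return_exceptions=True puts a failed branch's exception in the result list
    # instead of raising it, so you can see which branch failed.
    return await asyncio.gather(
        store_record(record),
        archive_file(filename),
        notify(summarize(record)),
        return_exceptions=True,
    )
```

Keeping the three branches independent means a chat outage never blocks the database write or the audit copy.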
Tips
- Always check the email subject and sender against an allowlist - email is a common attack vector, and you don't want to OCR a PDF from an unknown sender.
- Hash the attachment contents (via the encoding connector's hash-sha256 tool) and store the hash to dedupe; vendors often re-send the same invoice.
- For scanned PDFs that come back empty from extract-text, route to a dedicated OCR step rather than failing the workflow.
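The hash-based dedupe can be sketched with Python's hashlib. This assumes the encoding connector returns a hex digest like hexdigest() does, and the in-memory seen_hashes set stands in for the hashes you would actually store in your database. A SHA-256 hash only catches byte-identical re-sends: a re-exported PDF with different metadata hashes differently, so pairing it with a (vendor, invoiceNumber) uniqueness check is a sensible second layer.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of the raw attachment bytes."""
    return hashlib.sha256(data).hexdigest()


def is_duplicate(data: bytes, seen_hashes: set) -> bool:
    """Report whether this exact file was seen before; record it if not."""
    digest = content_hash(data)
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False
```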
Common Pitfalls
- Multi-page invoices - some vendors split a single invoice across two PDFs in one email. If you see this, group each email's attachments and process them together rather than treating each file in isolation.
- Encoded subjects - non-ASCII characters in the From or Subject header arrive as MIME-encoded strings. Decode them before filtering, or you'll silently drop matches.
- Date format drift - vendors use every date format ever invented. Ask the AI to return ISO 8601 explicitly, and validate the result with the validation connector's iso-date tool.
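Both pitfalls can be illustrated with Python's standard library. decode_mime_header uses email.header to turn RFC 2047 encoded words such as "=?utf-8?q?...?=" into plain text; whether Spojit's Email trigger already decodes headers for you is not stated here, so treat this as a sketch of what "decode before filtering" means. is_iso_date is a hypothetical stand-in for the iso-date tool that accepts only strict YYYY-MM-DD calendar dates; the real tool may accept other ISO 8601 forms.

```python
import re
from datetime import date
from email.header import decode_header, make_header

ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def decode_mime_header(raw: str) -> str:
    """Turn an RFC 2047 encoded header into a plain Unicode string."""
    return str(make_header(decode_header(raw)))


def is_iso_date(value: str) -> bool:
    """True only for a real calendar date written as YYYY-MM-DD."""
    # The regex pins the shape; newer Pythons' fromisoformat also accepts
    # forms like "20240301", which this check rejects on purpose.
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2024-02-30
    except ValueError:
        return False
    return True
```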
Testing
Forward 10 historical emails to a staging mailbox connected to a duplicate workflow that writes to a sandbox database. Compare extracted records to the originals. Once you see clean extraction across format variations, point the trigger at production.
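Comparing extracted records to the originals is easier with a field-by-field diff. The Python sketch below assumes you have hand-keyed the expected values for each test email; diff_records and match_rate are hypothetical helper names, and exact equality is a deliberate choice, so a total of 108 vs 108.0 matches, but "108.00" vs 108.0 does not.

```python
def diff_records(expected: dict, extracted: dict) -> dict:
    """Map each mismatched field to (expected, extracted); {} means a clean match."""
    keys = expected.keys() | extracted.keys()
    return {
        k: (expected.get(k), extracted.get(k))
        for k in sorted(keys)
        if expected.get(k) != extracted.get(k)
    }


def match_rate(pairs: list) -> float:
    """Fraction of (expected, extracted) pairs that match exactly."""
    if not pairs:
        return 0.0
    clean = sum(1 for exp, got in pairs if not diff_records(exp, got))
    return clean / len(pairs)
```

Running this over the staging results shows which fields drift (dates, currency codes, totals) before you point the trigger at production.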