Mailhook and Knowledge: Invoice PDF Extraction Template
A vendor emails an invoice PDF to your Mailhook address, and Spojit pulls out the invoice number, total, and due date as clean structured JSON.
What It Builds
A Mailhook trigger gives the workflow its own inbound email address. When an invoice lands, an Attachment node fetches the PDF bytes, a Knowledge node in Embed mode loads it into a Transient collection, and a Knowledge node in Query mode reads back the fields you asked for. Because the collection is Transient, the document is embedded, queried, and discarded inside a single run, so nothing lingers in storage.
The Prompt
Paste this into Miraxa and it builds the workflow, connecting the tools for you:
Build a workflow that starts when an invoice PDF is emailed to a Mailhook address. Fetch the PDF attachment, embed it into a transient knowledge collection, then query it to extract the invoice number, total amount, and due date as structured JSON. Discard the document when the run finishes.
Connectors Used
- Mailhook - the trigger; any email to the generated address starts a run.
- Attachment - fetches the invoice PDF bytes from the inbound mail.
- Knowledge - Embed mode loads the PDF into a Transient collection; Query mode extracts the fields.
Customize It
Change the fields in the prompt to match your invoices: add vendor name, PO number, or line items to the extraction list. You can also add a Send Email node to forward the structured result, or swap the Transient collection for a persistent one if you want to keep an archive of parsed invoices.
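To make the idea of growing the extraction list concrete, here is a minimal Python sketch that merges the three template fields with some extras into one JSON Schema-style object. Everything in it is illustrative: the field names (`invoice_number`, `vendor_name`, `po_number`, `line_items`, and so on) and the `build_schema` helper are hypothetical, and it is an assumption that a JSON Schema-like structure matches what the Query node expects, so check the node's configuration for the real format.

```python
import json

# Hypothetical names for the three fields in the template prompt.
BASE_FIELDS = {
    "invoice_number": {"type": "string"},
    "total_amount": {"type": "number"},
    "due_date": {"type": "string"},
}

# Hypothetical extras you might add when customizing the template.
EXTRA_FIELDS = {
    "vendor_name": {"type": "string"},
    "po_number": {"type": "string"},
    "line_items": {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "description": {"type": "string"},
                "amount": {"type": "number"},
            },
        },
    },
}


def build_schema(*field_groups):
    """Merge field groups into one JSON Schema-style object (assumed format)."""
    properties = {}
    for group in field_groups:
        properties.update(group)
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
    }


schema = build_schema(BASE_FIELDS, EXTRA_FIELDS)
print(json.dumps(schema, indent=2))
```

Keeping the extras in their own group means adding or dropping a field is a one-line change, and the same list can be mirrored in the wording of the prompt.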
Tips
- The Attachment node only works in Mailhook workflows, so keep the Mailhook trigger in place.
- Knowledge handles OCR, so scanned or image-based invoice PDFs still parse.
- Set a Response Schema on the Query so the invoice number, total, and due date always come back in the same shape.
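If a downstream step consumes the extracted JSON, a small shape check can confirm that the three fields come back as expected. The sketch below is plain Python and does not call any Spojit API; the key names, the expected types, and the sample values are all assumptions made up for illustration.

```python
import json

# Assumed key names and types for the three fields named in the template.
EXPECTED_TYPES = {
    "invoice_number": str,
    "total_amount": (int, float),
    "due_date": str,
}


def check_shape(raw_json):
    """Return a list of problems; an empty list means the shape matches."""
    data = json.loads(raw_json)
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in data:
            problems.append(f"missing field: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(data[key], bool) or not isinstance(data[key], expected):
            problems.append(f"wrong type for {key}")
    for key in data:
        if key not in EXPECTED_TYPES:
            problems.append(f"unexpected field: {key}")
    return problems


# Made-up sample result, for illustration only.
sample = '{"invoice_number": "INV-0001", "total_amount": 1250.0, "due_date": "2025-01-31"}'
print(check_shape(sample))
```

An empty list means the result has exactly the expected keys with the expected types; anything else points at the field to revisit in the schema or the prompt.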