Mailhook and Knowledge: Invoice PDF Extraction Template

A vendor emails an invoice PDF to your Mailhook address, and Spojit pulls out the invoice number, total, and due date as clean structured JSON.

What It Builds

A Mailhook trigger gives the workflow its own inbound email address. When an invoice lands, an Attachment node fetches the PDF bytes, a Knowledge node in Embed mode loads it into a Transient collection, and a Knowledge Query node reads back the fields you asked for. Because the collection is Transient, the document is embedded, queried, and discarded inside a single run, so nothing lingers in storage.

The Prompt

Paste this into Miraxa and it builds the workflow, connecting the tools for you:

Build a workflow that starts when an invoice PDF is emailed to a Mailhook address. Fetch the PDF attachment, embed it into a transient knowledge collection, then query it to extract the invoice number, total amount, and due date as structured JSON. Discard the document when the run finishes.

Connectors Used

  • Mailhook - the trigger; any email to the generated address starts a run.
  • Attachment - fetches the invoice PDF bytes from the inbound mail.
  • Knowledge - Embed mode loads the PDF into a Transient collection; Query mode extracts the fields.

Customize It

Change the fields in the prompt to match your invoices: add vendor name, PO number, or line items to the extraction list. You can also add a Send Email node to forward the structured result, or swap the Transient collection for a persistent one if you want to keep an archive of parsed invoices.

Tips

  • The Attachment node only works in Mailhook workflows, so keep the Mailhook trigger in place.
  • Knowledge handles OCR, so scanned or image-based invoice PDFs still parse.
  • Set a Response Schema on the Query so the invoice number, total, and due date always come back in the same shape.
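
To make the last tip concrete, here is a minimal sketch of what a fixed shape for the three fields could look like, plus a small check you might run on the result downstream. This article does not specify the format Spojit expects for a Response Schema, so the JSON-Schema-style layout, the field names (invoice_number, total, due_date), and the check_invoice helper are all assumptions for illustration only.

```python
import json
from datetime import date

# Hypothetical JSON-Schema-style description of the three fields.
# The field names and the schema dialect are assumptions, not Spojit's
# documented format.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "due_date": {"type": "string", "format": "date"},
    },
    "required": ["invoice_number", "total", "due_date"],
    "additionalProperties": False,
}


def check_invoice(payload: str) -> dict:
    """Parse a JSON result and confirm it matches the shape above.

    Raises ValueError if a field is missing, unexpected, or mistyped.
    """
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = [k for k in INVOICE_SCHEMA["required"] if k not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    extra = [k for k in data if k not in INVOICE_SCHEMA["properties"]]
    if extra:
        raise ValueError(f"unexpected fields: {extra}")

    if not isinstance(data["invoice_number"], str):
        raise ValueError("invoice_number must be a string")

    # bool is a subclass of int in Python, so reject it explicitly.
    total = data["total"]
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        raise ValueError("total must be a number")

    if not isinstance(data["due_date"], str):
        raise ValueError("due_date must be a string")
    # Raises ValueError if the string is not an ISO 8601 date.
    date.fromisoformat(data["due_date"])

    return data
```

With a shape like this in place, a result such as {"invoice_number": "INV-1042", "total": 1250.5, "due_date": "2024-07-31"} (made-up values) passes the check, while a total returned as the string "1250.50" is rejected instead of slipping through to later steps.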
