How to Extract Disclosure Document Terms from a Mailhook PDF with Transient Knowledge
Build a Spojit workflow that receives an emailed property disclosure or sale contract, embeds the PDF into a Transient knowledge collection, and uses a structured Knowledge Query to pull the purchase price, settlement date, and special conditions straight onto a Monday.com transaction board.
What This Integration Does
Real-estate teams live in PDFs: vendor disclosure statements, contracts of sale, and special-condition addenda land in an inbox as attachments and then get re-keyed by hand into a transaction file. This workflow removes the re-keying. A Mailhook trigger gives you a dedicated email address; the moment an agent or conveyancer forwards a disclosure PDF to it, Spojit fetches the document, reads it, and creates a fully-populated item on your Monday.com transaction board with the key terms already filled in.
The run model is push-based and per-email. Each email that reaches the Mailhook address starts one execution. An Attachment node fetches the PDF bytes, a Knowledge Embed node loads them into a Transient collection that exists only for that run, and a Knowledge Query node reads back the extracted terms as structured JSON enforced by a Response Schema. A Transform shapes those terms into Monday column values, and a Direct-mode Connector node calls create-item. The run leaves one new board item behind and discards the embedded document automatically on completion. Re-runs are safe because Mailhook deduplicates per message, and a duplicate forward of the same email will not start a second run.
Prerequisites
- A Spojit workspace with the monday connector connected (an API-key connection with permission to create items on the target board).
- The Monday.com board ID for your transaction file, plus the column IDs you want to populate (for example a text column for the address, a numbers column for the purchase price, and a date column for settlement). Open the board in Monday.com and note the column IDs from each column's settings.
- The json utility connector is built in and needs no connection.
- Permission in Spojit to create workflows and to generate a Mailhook address.
- A sample disclosure or contract PDF you can forward for testing.
Step 1: Create the workflow and add the Mailhook trigger
Create a new workflow, then on the Trigger node set Trigger Type to Mailhook. Optionally set an Address prefix (1 to 24 characters, default mh) such as disclosures, then click Generate email address. Spojit produces a unique address of the form disclosures-<random16>@mailhook.spojit.com. Copy it and use it as the forwarding destination for your disclosure emails. To keep noise out, add a From allowlist for your agents' or conveyancers' domains and a Subject regex such as (?i)disclosure|contract. Mailhook fires whether the address is in To, Cc, or Bcc, and it deduplicates per message, so an accidental re-forward will not double-process.
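Before you rely on the From allowlist and Subject regex, it helps to sanity-check them against real subject lines. The following is a rough Python approximation of that filtering, not Spojit's implementation. The allowlisted domains are hypothetical. It also assumes that the allowlist matches sender domains and that the regex may match anywhere in the subject.

```python
import re
from email.utils import parseaddr

# Hypothetical allowlist plus the Step 1 subject regex. Spojit applies its own
# matching; this approximates it, assuming domain-level allowlisting and
# "match anywhere in the subject" regex semantics.
FROM_ALLOWLIST = {"agency.example", "conveyancing.example"}
SUBJECT_REGEX = re.compile(r"(?i)disclosure|contract")


def sender_domain(from_header: str) -> str:
    """Extract the lower-cased domain from a From value like 'Jane <jane@x.com>'."""
    _, address = parseaddr(from_header)
    return address.rpartition("@")[2].lower()


def should_fire(from_header: str, subject: str) -> bool:
    """True when the sender's domain is allowlisted and the subject matches."""
    if sender_domain(from_header) not in FROM_ALLOWLIST:
        return False
    # re.search (not re.match) lets the keyword appear anywhere, e.g. after "FW: ".
    return SUBJECT_REGEX.search(subject) is not None


print(should_fire("Jane <jane@agency.example>", "FW: Vendor Disclosure, 123 Smith St"))  # True
print(should_fire("jane@agency.example", "Lunch on Friday?"))  # False
print(should_fire("spam@elsewhere.example", "Contract attached"))  # False
```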
The trigger output is available downstream as {{ input }} and includes {{ input.subject }}, {{ input.from }}, {{ input.replyTo }}, and an attachments[] list where each entry is a reference of the shape { id, filename, contentType }.
Step 2: Fetch the PDF bytes with an Attachment node
Add an Attachment node after the trigger. This node only works in Mailhook workflows; it turns an attachment reference into actual bytes. Set Mode to Single so you get the first match as a single object, set the Content type filter to application/pdf, and set the Filename pattern to *.pdf so a signature image or logo in the email is ignored. Turn on Fail if no attachment matches so a stray email without a PDF stops cleanly instead of running on empty.
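Single-mode selection works like a first-match filter over the attachments[] references. The Python sketch below assumes an exact content-type comparison and a case-insensitive filename glob. The sample references and the NoMatchingAttachment exception are hypothetical; they only mirror the behaviour described above.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical attachment references shaped like the trigger's attachments[] list.
attachments = [
    {"id": "att_1", "filename": "signature-logo.png", "contentType": "image/png"},
    {"id": "att_2", "filename": "123-Smith-St-disclosure.pdf", "contentType": "application/pdf"},
    {"id": "att_3", "filename": "exhibit-b.PDF", "contentType": "application/pdf"},
]


class NoMatchingAttachment(Exception):
    """Stands in for the node failing when 'Fail if no attachment matches' is on."""


def select_single(refs: list, content_type: str = "application/pdf",
                  pattern: str = "*.pdf", fail_if_none: bool = True) -> Optional[dict]:
    """Return the first reference that passes both filters, as Single mode does."""
    for ref in refs:
        # Lower-casing before fnmatchcase makes the glob case-insensitive on every OS
        # (an assumption about how the node treats names like "FILE.PDF").
        if ref["contentType"] == content_type and fnmatchcase(ref["filename"].lower(), pattern):
            return ref
    if fail_if_none:
        raise NoMatchingAttachment(f"no attachment matches {content_type} and {pattern}")
    return None


print(select_single(attachments)["filename"])  # 123-Smith-St-disclosure.pdf
```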
The Single-mode output looks like this:
{
  "filename": "123-Smith-St-disclosure.pdf",
  "contentType": "application/pdf",
  "size": 248193,
  "content": "JVBERi0xLjcKJ..."
}
The content field is base64-encoded PDF bytes. Note the per-attachment limit of 10 MB and the per-run limit of 25 MB; a long contract with many scanned exhibits can approach these. Name this node's output variable disclosure so you can reference {{ disclosure.content }} in the next step.
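If you want to confirm what a downstream step receives, check that the decoded content begins with the %PDF- header. This Python sketch decodes the value and checks it against the 10 MB per-attachment limit; the helper name decode_pdf is hypothetical. It assumes MB means 2**20 bytes. Base64 text is roughly a third larger than the bytes it encodes, so the check runs on the decoded length.

```python
import base64
import binascii

# Assumption: "10 MB" is read as 10 * 2**20 bytes; the article does not say which.
MB = 1024 * 1024
PER_ATTACHMENT_LIMIT = 10 * MB


def decode_pdf(content_b64: str) -> bytes:
    """Decode the Attachment node's base64 `content` and sanity-check it."""
    try:
        data = base64.b64decode(content_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("content is not valid base64") from exc
    if not data.startswith(b"%PDF-"):
        raise ValueError("decoded bytes do not start with the %PDF- header")
    if len(data) > PER_ATTACHMENT_LIMIT:
        raise ValueError(f"{len(data)} bytes exceeds the per-attachment limit")
    return data


# "JVBERi0xLjcK" is the start of the sample content above.
print(decode_pdf("JVBERi0xLjcK"))  # b'%PDF-1.7\n'
```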
Step 3: Embed the PDF into a Transient collection
Add a Knowledge node in Embed mode. In the Collection dropdown choose Transient. A Transient collection is created for this single run, is shared across the other Knowledge nodes in the same run, and is cleaned up automatically when the run finishes, so there is no file name or embedding model to choose and nothing to manage later. Set Document Type to PDF, and set Document Input to {{ disclosure.content }} so the bytes you just fetched are loaded directly. Set the Output Variable to embedResult; it returns the chunk count and metadata, which is handy for confirming the document was read.
This "embed then query then discard" pattern is exactly what Transient collections are for: a one-off extraction on a single document where you never need the file again. If you instead wanted a durable archive of every disclosure for later search, you would point this at a persistent collection, but for a transaction-file workflow Transient keeps the workspace clean.
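The embed, query, discard lifecycle behaves much like a resource scoped to a single run. The toy Python model below does not reflect how Spojit stores or retrieves chunks: it uses fixed-size text chunks and keyword lookup in place of embeddings. The class and function names are hypothetical. The model only shows that both Knowledge nodes share one collection and that cleanup runs when the run ends, even if a step fails.

```python
from contextlib import contextmanager

# Toy, in-memory stand-in for a Transient collection. Not Spojit's implementation:
# fixed-size text chunks and keyword lookup replace real embeddings.


class ToyCollection:
    def __init__(self) -> None:
        self.chunks: list[str] = []

    def embed(self, text: str, chunk_size: int = 40) -> dict:
        """Split text into fixed-size chunks and return an embedResult-like summary."""
        new_chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        self.chunks.extend(new_chunks)
        return {"chunkCount": len(new_chunks)}

    def query(self, keyword: str) -> list[str]:
        """Return chunks containing the keyword (case-insensitive)."""
        return [c for c in self.chunks if keyword.lower() in c.lower()]


@contextmanager
def transient_collection():
    """Create one collection for a run and always discard its contents afterwards."""
    collection = ToyCollection()
    try:
        yield collection
    finally:
        collection.chunks.clear()  # runs even if a later step raises


with transient_collection() as kb:
    print(kb.embed("Purchase price: $850,000. Settlement date: 2025-03-28."))  # {'chunkCount': 2}
    print(kb.query("settlement"))
print(len(kb.chunks))  # 0
```

Once the with block exits, the collection is empty. This mirrors the automatic cleanup at the end of a run.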
Step 4: Query the document with a Response Schema
Add a second Knowledge node in Query mode and choose Transient in the Collection dropdown so it reads the document embedded earlier in the same run. In Prompt, describe what to pull, for example:
Extract the key commercial terms of this property
disclosure or contract of sale: the property address,
the purchase price as a number, the settlement (completion)
date in YYYY-MM-DD format, and any special conditions.
If a value is not stated, return null for it; if there
are no special conditions, return an empty list.
Set Result Count high enough to cover the relevant clauses (the default of 5 is usually fine for a single contract), pick a synthesis Model, and most importantly fill in the Response Schema so the answer comes back as predictable JSON rather than prose:
{
  "type": "object",
  "properties": {
    "address": { "type": "string" },
    "purchasePrice": { "type": ["number", "null"] },
    "settlementDate": { "type": ["string", "null"] },
    "specialConditions": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["address", "purchasePrice", "settlementDate"]
}
Set the Output Variable to terms. Downstream you can now rely on {{ terms.address }} being a string. {{ terms.purchasePrice }} and {{ terms.settlementDate }} are always present, as a number and a string respectively, or null when the document does not state them. {{ terms.specialConditions }} is not in the required list, so it may be omitted; when it is returned, it is an array of strings. The Response Schema is what makes this safe to feed into a structured system like Monday.com without brittle text parsing. This is the intelligent layer of Spojit reading the contract for you while the schema guarantees the shape.
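If you test a Transform against sample terms outside Spojit, a small stdlib check that mirrors the schema above can catch shape mistakes early. The helper below is hypothetical and not part of Spojit, which enforces the Response Schema itself. Like the schema, it treats specialConditions as optional.

```python
# Hypothetical local check mirroring the Step 4 Response Schema.
REQUIRED = ("address", "purchasePrice", "settlementDate")


def check_terms(terms: dict) -> list:
    """Return a list of schema problems; an empty list means the shape is valid."""
    problems = [f"missing required field {key!r}" for key in REQUIRED if key not in terms]
    if "address" in terms and not isinstance(terms["address"], str):
        problems.append("address must be a string")
    price = terms.get("purchasePrice")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if price is not None and (isinstance(price, bool) or not isinstance(price, (int, float))):
        problems.append("purchasePrice must be a number or null")
    date = terms.get("settlementDate")
    if date is not None and not isinstance(date, str):
        problems.append("settlementDate must be a string or null")
    if "specialConditions" in terms:  # optional in the schema, but typed when present
        conditions = terms["specialConditions"]
        if not isinstance(conditions, list) or not all(isinstance(c, str) for c in conditions):
            problems.append("specialConditions must be an array of strings")
    return problems


sample = {"address": "123 Smith St", "purchasePrice": "850000", "settlementDate": None}
print(check_terms(sample))  # ['purchasePrice must be a number or null']
```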
Step 5: Shape the column values with a Transform
Monday.com expects column values as a single JSON object keyed by column ID. Add a Transform node to map the extracted terms onto your real board columns. If you prefer to build the object explicitly with a utility, add a Connector node on the json connector in Direct mode and use its set tool to assemble the object field by field, or its stringify tool to serialize the object once it is assembled. A straightforward Transform output that targets a numbers column, a date column, and a long-text column might be:
{
  "numbers_price": {{ terms.purchasePrice }},
  "date_settlement": { "date": "{{ terms.settlementDate }}" },
  "long_text_conditions": "{{ terms.specialConditions }}"
}
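Replace numbers_price, date_settlement, and long_text_conditions with your actual Monday.com column IDs. Because terms.specialConditions is an array, dropping it straight into a quoted string may not produce readable text or valid JSON. Make sure the conditions are joined into one text value, or store them in a checklist column if your board uses one. Also decide what happens when purchasePrice or settlementDate is null; leaving a column out of the object leaves it empty on the new item. Set this node's output variable to columns.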
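The mapping rules for this step (join the conditions, skip null values) can be sketched in Python as follows. The column IDs are the same placeholders used in the Transform, and the function name is hypothetical. The value shapes follow this article's Transform rather than an exhaustive reading of Monday.com's column-type reference.

```python
import json


def build_column_values(terms: dict) -> dict:
    """Map extracted terms onto placeholder Monday.com column IDs."""
    columns = {}
    # Omit a column when its value is missing; Monday then leaves it empty.
    if terms.get("purchasePrice") is not None:
        columns["numbers_price"] = terms["purchasePrice"]
    if terms.get("settlementDate"):
        columns["date_settlement"] = {"date": terms["settlementDate"]}
    conditions = terms.get("specialConditions") or []
    if conditions:
        # Join the array into one text value, one condition per line.
        columns["long_text_conditions"] = "\n".join(conditions)
    return columns


terms = {
    "address": "123 Smith St",
    "purchasePrice": 850000,
    "settlementDate": "2025-03-28",
    "specialConditions": ["Subject to finance", "Subject to building inspection"],
}
print(json.dumps(build_column_values(terms), indent=2))
```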
Step 6: Create the Monday.com item with create-item
Add a Connector node on the monday connector in Direct mode and pick the create-item tool. Map the inputs:
- boardId: your transaction board's ID.
- name: the item title, for example {{ terms.address }}, so each row is named after the property.
- groupId (optional): the group on the board where new transactions should land, such as a "New disclosures" group.
- columnValues: {{ columns }} from the previous step, which Spojit passes through as the column-values object.
On success, create-item returns the new item's id and name. Save it to an output variable like created so a later step can reference {{ created.id }}.
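When you are debugging a rejected payload, it can help to see what an equivalent raw Monday.com API call looks like. This sketch assumes, without confirmation from Spojit's documentation, that create-item corresponds to Monday.com's public GraphQL create_item mutation at https://api.monday.com/v2. It only builds the request body and sends nothing. The board ID, group ID, and helper name are hypothetical.

```python
import json
from typing import Optional

MONDAY_API_URL = "https://api.monday.com/v2"  # POST target; not called here

# GraphQL mutation with variables; column_values uses Monday's JSON scalar.
CREATE_ITEM = """
mutation ($boardId: ID!, $groupId: String, $itemName: String!, $columnValues: JSON) {
  create_item(board_id: $boardId, group_id: $groupId,
              item_name: $itemName, column_values: $columnValues) {
    id
    name
  }
}
"""


def create_item_body(board_id: str, item_name: str, column_values: dict,
                     group_id: Optional[str] = None) -> dict:
    """Build the JSON request body for a create_item call (hypothetical helper)."""
    variables = {
        "boardId": board_id,
        "itemName": item_name,
        # The JSON scalar is passed as a JSON-encoded string, not a nested object.
        "columnValues": json.dumps(column_values),
    }
    if group_id is not None:
        variables["groupId"] = group_id
    return {"query": CREATE_ITEM, "variables": variables}


body = create_item_body("1234567890", "123 Smith St",
                        {"numbers_price": 850000, "date_settlement": {"date": "2025-03-28"}},
                        group_id="new_disclosures")
print(json.dumps(body["variables"], indent=2))
```

To send the request yourself, POST the body as JSON with your API token in the Authorization header. The mutation's response asks for the same id and name that create-item returns.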
Step 7: Confirm back to the sender (optional)
Because Mailhook is always asynchronous, the person who forwarded the email gets no automatic reply. Add a Send Email node to close the loop. Set Recipients to {{ input.replyTo }}, give it a Subject like Disclosure logged: {{ terms.address }}, and in the Body summarise the captured terms, for example "Recorded purchase price {{ terms.purchasePrice }} with settlement on {{ terms.settlementDate }}." Send Email uses Spojit's built-in mail service, but remember external recipients must be on your org allowlist under Settings → General → Email recipients. To send from your own domain instead, use the resend or smtp connector.
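Terms that come back null can make the confirmation read oddly. The Python sketch below shows a hypothetical body builder with a fallback for missing values. It assumes you want an explicit "not stated" rather than whatever Spojit renders for a null field inside {{ }}.

```python
def confirmation_body(terms: dict) -> str:
    """Hypothetical body builder: show 'not stated' instead of a blank or null."""
    price = terms.get("purchasePrice")
    date = terms.get("settlementDate")
    price_text = f"{price:,.0f}" if price is not None else "not stated"
    date_text = date or "not stated"
    return f"Recorded purchase price {price_text} with settlement on {date_text}."


print(confirmation_body({"purchasePrice": 850000, "settlementDate": "2025-03-28"}))
# Recorded purchase price 850,000 with settlement on 2025-03-28.
```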
Tips
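- If contracts arrive as scanned images rather than text PDFs, the Embed node still handles them: choose the image document types, which run through OCR, so the Query can read scanned pages. If the scans arrive as image attachments rather than PDFs, also widen the Attachment node's Content type filter and Filename pattern, which otherwise match only PDFs.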
- Keep the Embed and Query nodes on the same Transient collection within the same run. Transient collections are not visible to other runs or workflows, which is exactly why they are perfect for single-document extraction.
- Use Miraxa, the intelligent layer across your automation, to scaffold the canvas quickly: try "Build a workflow that watches a mailhook, embeds the PDF attachment into a Transient knowledge collection, queries it for purchase price and settlement date, and creates a Monday.com item." Then fine-tune each node in the properties panel.
- While testing, include the chunk count from {{ embedResult }} in your confirmation email so you can spot a document that embedded with suspiciously few chunks.
Common Pitfalls
- Embedding into a persistent collection by mistake leaves a copy of every contract sitting in your workspace. Confirm the Embed and Query nodes both say Transient unless you genuinely want an archive.
- Monday.com is strict about column-value shapes: a date column needs { "date": "YYYY-MM-DD" } and a numbers column needs a bare number, not a string. If create-item rejects the payload, check that your Response Schema typed purchasePrice as a number and that your Transform reformatted settlementDate to the date shape Monday expects.
- Without a Filename pattern or Content type filter, the Attachment node may grab an email-signature logo instead of the contract. Always scope it to *.pdf and application/pdf, and turn on Fail if no attachment matches.
- Large multi-exhibit contracts can exceed the 10 MB per-attachment or 25 MB per-run limits. If you expect oversized documents, split them upstream before forwarding, or page through them with the pdf connector first.
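Contracts rarely state dates in ISO form, so the settlement date often needs normalising before it reaches the date column. Here is a minimal Python sketch. It assumes day-first dates, a hypothetical list of formats, and English month names. Add or reorder formats to suit your contracts, or ask for YYYY-MM-DD in the Query prompt so the model does the conversion.

```python
from datetime import datetime
from typing import Optional

# Hypothetical formats, tried in order; day-first before month-first (AU/UK style).
# %B and %b month names assume the default C (English) locale.
FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%d %b %Y", "%B %d, %Y")


def to_monday_date(raw: Optional[str]) -> Optional[dict]:
    """Return {"date": "YYYY-MM-DD"} or None when the value is missing or unparseable."""
    if not raw:
        return None
    text = raw.strip()
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next format
        return {"date": parsed.strftime("%Y-%m-%d")}
    return None


print(to_monday_date("28 March 2025"))  # {'date': '2025-03-28'}
```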
Testing
Before pointing live forwarding rules at the address, forward one real disclosure PDF from an allowlisted sender to the Mailhook address and watch the execution in your run history. Confirm the Attachment node returned a non-zero size, the Embed node reported a sensible chunk count, and the Query node's terms output matched the contract's actual price, settlement date, and conditions. Check that the new Monday.com item carries the right column values. Once one document round-trips cleanly, widen the From allowlist or add the forwarding rule so production disclosures flow in. Keep in mind received emails are retained for 30 days, so you can re-test against the same message during that window.
Learn More
- Mailhook trigger reference
- Attachment node reference
- Knowledge collections and the Embed/Query modes
- Monday.com connector and create-item
- Setting Up a Mailhook Trigger
- Using Knowledge Nodes
- How to Create NetSuite Sales Orders from Emailed PO PDFs
- How to Create Shopify Orders from PO PDFs Emailed to a Mailhook