How to Build a Supplier Onboarding Document Knowledge Base from Emailed Contracts
Suppliers email their onboarding packets (W-9s, certificates of insurance, signed terms) to a Spojit mailhook address, and every PDF in the message is fetched, embedded into a persistent Supplier Docs collection, and made answerable so procurement can ask questions like "what are this vendor's payment terms?" months later.
What This Integration Does
When you onboard a new supplier, the paperwork arrives by email as a pile of attachments: a W-9, a certificate of insurance, a signed terms-and-conditions sheet, a banking remittance form, and sometimes a price list. Those PDFs usually get filed in a shared drive and forgotten. This workflow turns that same email into a living, searchable archive. Each supplier emails their packet to a dedicated mailhook address, and Spojit pulls the bytes of every PDF and embeds them into a single persistent Knowledge collection, with each document named for its supplier. Later, any workflow or any person on your team can ask that collection a plain-language question and get an answer grounded in the actual documents, with no manual filing.
The run is triggered by inbound mail (push, within seconds of arrival), so there is no mailbox to poll and no schedule to manage. The Attachment node fetches the attachment bytes, a Loop embeds each PDF into the persistent Supplier Docs collection, and the collection persists across runs and workflows. Re-sending the same packet overwrites documents that share a file name rather than duplicating them, so corrected forms cleanly replace stale ones. The collection is workspace-scoped, so a separate procurement workflow can run a Knowledge Query against it at any later date without re-reading the original email. This is distinct from a one-off contract-extraction run: nothing is discarded at the end, and the value compounds as more suppliers onboard.
Prerequisites
- A persistent Knowledge collection named Supplier Docs. Create it from the Knowledge section of the sidebar with New Collection; the embedding model is fixed at creation, so keep the default Gemini Embedding 001.
- A NetSuite connection if you want to confirm the sender is a real vendor record (optional but recommended). See the NetSuite connector reference.
- A Slack connection with permission to post to a channel, so procurement gets a heads-up when a packet lands. See the Slack connector reference.
- Suppliers (or an internal forwarding rule) that will send onboarding documents as PDF attachments to a single address you control.
Step 1: Add the Mailhook trigger and generate the address
Create a new workflow and open its Trigger node. Set Trigger Type to Mailhook. Give it an Address prefix such as supplier-docs (1 to 24 characters), then click Generate email address. Spojit produces a unique address of the form supplier-docs-<random16>@mailhook.spojit.com. Copy it and share it with your suppliers, or point an internal forwarding rule at it. Mailhook runs are always asynchronous: there is no reply to the sender, and the whole message is available downstream as {{ input }}, including {{ input.from }}, {{ input.subject }}, and the attachment references in {{ input.attachments }}.
Optionally tighten the trigger with a From allowlist (only accept mail from known supplier domains) and a Subject regex (for example, require the word onboarding in the subject). If you ever need to rotate the address, Regenerate address invalidates the old one immediately.
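To reason about which messages the two filters would let through, here is a minimal Python sketch of the logic. It is not Spojit's implementation: the function name accepts, the example domains, and the case-insensitive search are all assumptions made for illustration, so check the trigger settings for the exact matching rules.

```python
import re
from email.utils import parseaddr

# Hypothetical values for illustration; the real settings live in the Trigger node.
ALLOWED_DOMAINS = {"acme.example", "globex.example"}
SUBJECT_PATTERN = re.compile(r"onboarding", re.IGNORECASE)


def accepts(from_header: str, subject: str) -> bool:
    """Return True if a message passes both the From allowlist and the Subject regex."""
    # parseaddr handles both "ap@acme.example" and "Acme AP <ap@acme.example>".
    _, address = parseaddr(from_header)
    domain = address.rpartition("@")[2].lower()
    return domain in ALLOWED_DOMAINS and bool(SUBJECT_PATTERN.search(subject))
```

With these assumed settings, a message from Acme AP <ap@acme.example> with the subject "Onboarding packet" passes, while the same sender with the subject "Invoice 42" does not.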
Step 2: Fetch every PDF with the Attachment node in Multiple mode
Add an Attachment node directly after the trigger. This node only saves on a workflow whose trigger is a Mailhook, which is exactly what you built in Step 1. Configure it to pull every PDF in the packet:
- Mode: Multiple (returns a list instead of a single object).
- Content type: application/pdf so signature images and inline logos are skipped.
- Filename pattern: leave it broad, such as *.pdf, or narrow it (for example *cert*.pdf, *w9*.pdf, *terms*.pdf) if suppliers always name files consistently.
- Fail if no attachment matches: turn this on for onboarding so a packet with zero PDFs surfaces as a failed run rather than silently embedding nothing.
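As an illustration of how narrowing the filename pattern changes what gets fetched, the sketch below applies the three example patterns with Python's fnmatch module. Whether Spojit's patterns are case-sensitive or follow exactly these glob rules is not stated here, so treat the lowercasing and the matching semantics as assumptions.

```python
from fnmatch import fnmatchcase

# The narrow example patterns from the list above.
PATTERNS = ["*cert*.pdf", "*w9*.pdf", "*terms*.pdf"]


def matches_any(filename: str, patterns: list[str] = PATTERNS) -> bool:
    """Return True if the filename matches at least one glob pattern."""
    # Assumption: compare case-insensitively by lowercasing the filename first.
    name = filename.lower()
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```

Under these rules a file named acme-coi.pdf matches none of the narrow patterns, which is one reason to leave the pattern broad unless supplier naming is truly consistent.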
In Multiple mode the output is a list you can iterate over:
{
  "attachments": [
    { "filename": "acme-w9.pdf", "contentType": "application/pdf", "size": 81234, "content": "" },
    { "filename": "acme-coi.pdf", "contentType": "application/pdf", "size": 240112, "content": "" }
  ],
  "count": 2,
  "totalBytes": 321346
}
Keep an eye on the limits: 10 MB per attachment and 25 MB per run by default. Each PDF's content field is base64 and feeds straight into the Knowledge node in the next step.
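If you want to sanity-check a packet outside the platform, the following Python sketch checks the output shape shown above against the default limits and decodes one content field. The helper names are hypothetical, and reading "MB" as 1024 * 1024 bytes is an assumption; the platform may count in decimal megabytes.

```python
import base64

# Default limits quoted above. Assumption: 1 MB = 1024 * 1024 bytes.
MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024
MAX_RUN_BYTES = 25 * 1024 * 1024


def check_limits(output: dict) -> list[str]:
    """Return a list of problems; an empty list means the packet fits the limits."""
    problems = []
    for att in output["attachments"]:
        if att["size"] > MAX_ATTACHMENT_BYTES:
            problems.append(f"{att['filename']} exceeds the per-attachment limit")
    if output["totalBytes"] > MAX_RUN_BYTES:
        problems.append("packet exceeds the per-run limit")
    return problems


def decode_pdf(att: dict) -> bytes:
    """Decode the base64 content field and confirm it starts with a PDF header."""
    data = base64.b64decode(att["content"], validate=True)
    if not data.startswith(b"%PDF-"):
        raise ValueError(f"{att['filename']} does not start with a PDF header")
    return data
```

The header check is only a quick guard against a mislabelled attachment; it does not validate the rest of the file.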
Step 3: Loop over the attachments and embed each into the Supplier Docs collection
Add a Loop node in ForEach mode over {{ attachment.attachments }} (the list from Step 2). Inside the loop body, add a Knowledge node in Embed mode with these fields:
- Collection: Supplier Docs (the persistent collection from Prerequisites, not Transient).
- Document Type: PDF.
- Document Input: {{ item.content }} (the current attachment's base64 bytes).
- File Name: a name that ties the document to the supplier so it is findable and overwrites cleanly on re-send. Build it from the sender and the original filename, for example {{ input.from }} / {{ item.filename }}. Because Embed overwrites any document with a matching name, re-sending a corrected W-9 replaces the old one instead of creating a duplicate.
- Output Variable: embedResult (returns chunk count and metadata).
The Loop runs Embed once per PDF, so a packet of five documents adds five named documents to the collection. Because the collection is persistent, those documents stay available to every workflow in the workspace after the run ends.
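The overwrite-by-name behaviour can be pictured with a small Python model in which the collection is a dictionary keyed by file name. This is only a mental model under that assumption: embed_packet is a hypothetical stand-in, not a Spojit API, and the real collection stores embedded chunks rather than raw strings.

```python
def embed_packet(collection: dict[str, str], sender: str, attachments: list[dict]) -> int:
    """Store each attachment under '<sender> / <filename>'; a matching name overwrites."""
    for item in attachments:
        name = f"{sender} / {item['filename']}"
        collection[name] = item["content"]  # stand-in for the real Embed step
    return len(collection)


docs: dict[str, str] = {}
first = [
    {"filename": "acme-w9.pdf", "content": "v1"},
    {"filename": "acme-coi.pdf", "content": "v1"},
]
embed_packet(docs, "ap@acme.example", first)

# Re-sending a corrected W-9 under the same name replaces the old entry.
embed_packet(docs, "ap@acme.example", [{"filename": "acme-w9.pdf", "content": "v2"}])
```

In this model the same file re-sent from a different address, or under a different filename, lands as a new document rather than a replacement, so consistent naming matters.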
Step 4: Confirm the sender against NetSuite (optional)
To make sure the packet came from a real vendor and not a stray inbox, add a Connector node on the NetSuite connector in Direct mode. Use run-suiteql to look the supplier up by email, or list-customers if you maintain vendors as customer records, matching on {{ input.from }}. A SuiteQL lookup keeps it precise:
SELECT id, companyName, terms
FROM vendor
WHERE LOWER(email) = LOWER('{{ input.from }}')
If the query returns a row, you have a confirmed vendor and can carry companyName into the file name from Step 3 for cleaner naming. If it returns nothing, route the run to a Slack alert (Step 6) so procurement can decide whether to add the vendor first. Direct mode keeps this deterministic and spends no AI credits. To branch on the result, add a Condition node that checks whether the lookup returned any rows.
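One thing to watch with the templated lookup is the shape of the sender value. The Python sketch below assumes the From value may arrive in display-name form (for example Acme AP <ap@acme.example>) and may contain a single quote; neither is confirmed for {{ input.from }}, and vendor_lookup_query is a hypothetical helper, not part of the NetSuite connector.

```python
from email.utils import parseaddr


def vendor_lookup_query(from_header: str) -> str:
    """Build the vendor lookup with a bare, quote-escaped email address."""
    _, address = parseaddr(from_header)
    if "@" not in address:
        raise ValueError(f"no email address found in {from_header!r}")
    # Doubling single quotes is the standard SQL way to escape them in a literal.
    safe = address.replace("'", "''")
    return (
        "SELECT id, companyName, terms FROM vendor "
        f"WHERE LOWER(email) = LOWER('{safe}')"
    )
```

If the platform already normalizes the sender to a bare address, the extraction step is unnecessary, but the same query shape still applies.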
Step 5: Answer a procurement question with a Knowledge Query
The whole point of a persistent archive is asking it questions later. You can do this in two places: inline at the end of this workflow to validate the embed, or (more commonly) in a separate procurement workflow that reads the same collection. Add a Knowledge node in Query mode:
- Collection: Supplier Docs (persistent, the same collection you embedded into).
- Prompt: a natural-language question, templated where useful, for example: What are the payment terms and certificate-of-insurance expiry date for {{ input.from }}?
- Result Count: leave at the default 5, or raise it for multi-document suppliers.
- Response Schema: optional. Supply a JSON schema to force a clean, structured answer that downstream steps can read reliably:
{
  "type": "object",
  "properties": {
    "paymentTerms": { "type": "string" },
    "coiExpiry": { "type": "string" },
    "sourceDocuments": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["paymentTerms"]
}
- Output Variable: answer.
Set the Model to an AI model for synthesis. The Query answer is grounded in the embedded PDFs, so procurement gets terms and expiry dates pulled straight from the supplier's own paperwork. Remember to always embed and query a collection with the same embedding model it was created with.
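To see what the example response schema does and does not guarantee, here is a minimal Python sketch that checks an answer against it by hand. It covers only the three properties in that schema and is not a general JSON Schema validator; check_answer is a hypothetical helper name.

```python
def check_answer(answer: dict) -> list[str]:
    """Check an answer against the example schema; return a list of problems."""
    problems = []
    # paymentTerms is the only required property.
    if not isinstance(answer.get("paymentTerms"), str):
        problems.append("paymentTerms is required and must be a string")
    # coiExpiry and sourceDocuments are optional, but typed when present.
    if "coiExpiry" in answer and not isinstance(answer["coiExpiry"], str):
        problems.append("coiExpiry must be a string when present")
    sources = answer.get("sourceDocuments", [])
    if not isinstance(sources, list) or not all(isinstance(s, str) for s in sources):
        problems.append("sourceDocuments must be a list of strings when present")
    return problems
```

Because only paymentTerms is required, an answer without coiExpiry is still valid, so any downstream step that reads coiExpiry should plan for a missing value or the schema should list it as required.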
Step 6: Notify procurement in Slack
Add a Connector node on the Slack connector in Direct mode using send-message to post a summary to your procurement channel. Reference the values you gathered above:
New supplier packet archived: {{ input.from }}
Documents embedded: {{ attachment.count }}
Payment terms on file: {{ answer.paymentTerms }}
COI expiry: {{ answer.coiExpiry }}
If you want to direct-message a specific buyer, first call lookup-user-by-email on the Slack connector to resolve their user, then send to that user. You can also add a Send Email node to confirm receipt back to the supplier at {{ input.replyTo }}, since the Mailhook trigger never auto-replies. For onboarding that needs a sign-off before the vendor is considered active, add a Human node before the Slack step so a procurement lead approves the packet first.
Tips
- Name embedded documents by supplier ({{ input.from }} / {{ item.filename }} or the NetSuite companyName). This makes Query answers traceable and lets corrected forms overwrite stale ones automatically.
- Keep one persistent collection for all suppliers rather than a collection per vendor. Query prompts can scope to a specific supplier by name, and a single collection is far simpler to manage.
- Use Direct mode for the NetSuite lookup and Slack post to keep those deterministic steps free of AI cost; reserve AI spend for the Query synthesis step.
- Received emails are retained for 30 days, so re-run a packet from execution history within that window if an embed needs to be repeated.
Common Pitfalls
- Picking Transient instead of a persistent collection in the Embed node discards everything at the end of the run. For a lasting archive you must select the Supplier Docs collection by name.
- Leaving the Attachment node in Single mode only fetches the first PDF, so multi-document packets lose everything after the first file. Use Multiple mode and a Loop.
- Forgetting the Content type filter means inline logos and signature images get pulled and may trip the 25 MB-per-run limit. Restrict to application/pdf.
- Embedding with one model and later querying with another returns poor results. The collection's embedding model is fixed at creation, so never change it between embed and query.
- The Mailhook trigger fires whether the address is in To, Cc, or Bcc, so a supplier blind-copying the address still triggers a run. Use the From allowlist if you only want known senders to onboard.
Testing
Send one onboarding email containing two or three small PDFs to your generated mailhook address. Open the workflow's execution history and confirm the run fired within seconds, the Attachment node reports the expected count, and the Loop ran the Embed node once per document. Open the Supplier Docs collection in the Knowledge sidebar and check the document table shows each file with status READY and a sensible chunk count. Then run the Query step (or a small test workflow) with a prompt you know the answer to, such as the payment terms in the terms sheet, and verify the response matches the PDF. Once that round-trip is clean, share the address with real suppliers. If a packet ever embeds nothing, confirm Fail if no attachment matches is on so the gap shows as a failed run.