How to Build a Brand-Voice Knowledge Base from Your Past Captions and Media Kit

Forward your media kit, past newsletters, and tone guidelines to a Spojit mailhook. The Attachment node fetches each document, and the Knowledge node embeds it into a persistent collection that every future content workflow can query for your brand voice.

What This Integration Does

As a creator, your voice lives across scattered files: a media kit PDF, exported newsletter archives, a tone-of-voice one-pager, your best-performing captions saved as a document. When you later want Spojit to draft an Instagram caption or turn a blog post into a LinkedIn post, the draft only sounds like you if the workflow can see those reference materials. This tutorial builds the foundation for that: a single, reusable brand-voice library. You email your reference documents to a dedicated address, and Spojit captures each one into a persistent Knowledge collection. Any content workflow you build afterward can query that collection to ground its drafts in your real wording, recurring phrases, and house rules.

The run model is push-based and document-by-document. A Mailhook trigger fires within seconds of any email arriving at the address Spojit generates for you. The Attachment node fetches the raw bytes of each file the email carried, a Loop node walks through them one at a time, and a Knowledge node in Embed mode writes each document into a named collection that persists across runs. Re-running is safe and additive: send a fresh batch any time and those documents join the same collection. Embedding a file whose name already exists overwrites the old version, so updating your media kit is just re-sending it under the same filename.
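
The additive, overwrite-by-filename behavior can be pictured with a toy model. This is only a sketch of the semantics described above, not Spojit's implementation: the dictionary, the embed function, and the sample texts are all hypothetical.

```python
# Toy model: a collection keyed by filename. Names and contents here are
# illustrative assumptions, not part of Spojit.

def embed(collection: dict, filename: str, text: str) -> None:
    """Add a document, replacing any existing entry with the same filename."""
    collection[filename] = text

brand_voice = {}

# First email: two documents join the collection.
embed(brand_voice, "media-kit.pdf", "2025 rates and audience stats")
embed(brand_voice, "tone-guide.md", "Warm, direct, no jargon")

# Later email: the media kit is re-sent under the same filename.
embed(brand_voice, "media-kit.pdf", "2026 rates and audience stats")

print(sorted(brand_voice))
print(brand_voice["media-kit.pdf"])
```

After the second send the collection still holds two entries, and the media kit entry carries the newer text.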

Prerequisites

  • A Spojit workspace where you can create a workflow on the canvas (the designer at miraxa.spojit.com).
  • Your reference material as files you can email: media kit (PDF), past newsletters (PDF, Word, EML, or Markdown), and a tone/brand-voice guide. Spojit accepts PDF, Word, Excel, PowerPoint, CSV/TSV, JSON, XML, HTML, Plain Text, Markdown, RTF, Email (EML/MSG), EPUB, images (via OCR), and web page URLs as document types.
  • Permission to create a Knowledge collection in the Knowledge section of the sidebar. Collections are workspace-scoped, so every workflow in your workspace can query the one you build here.
  • Files within the defaults the Attachment node enforces: 10 MB per attachment and 25 MB per run. Split very large archives across several emails.
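
If you have many files, it can help to plan your emails before sending. The following is a minimal sketch that groups files into batches under the two limits above. It assumes 1 MB means 1,000,000 bytes (the limits are not spelled out in bytes here) and ignores any overhead your mail client adds; plan_batches and the sample sizes are hypothetical.

```python
# Assumption: decimal megabytes. The article does not say whether MB means
# 10^6 or 2^20 bytes, so this uses the smaller, more conservative value.
MAX_ATTACHMENT_BYTES = 10 * 1_000_000
MAX_RUN_BYTES = 25 * 1_000_000

def plan_batches(files: dict) -> tuple:
    """Greedy first-fit, largest first. Returns (batches, too_large)."""
    batches = []    # one list of filenames per email
    totals = []     # running byte total for each batch
    too_large = []  # files over the per-attachment limit
    for name, size in sorted(files.items(), key=lambda kv: kv[1], reverse=True):
        if size > MAX_ATTACHMENT_BYTES:
            too_large.append(name)
            continue
        for i, total in enumerate(totals):
            if total + size <= MAX_RUN_BYTES:
                batches[i].append(name)
                totals[i] += size
                break
        else:
            # No existing batch has room, so start a new email.
            batches.append([name])
            totals.append(size)
    return batches, too_large

files = {
    "media-kit.pdf": 9_000_000,
    "newsletters-2024.pdf": 9_500_000,
    "newsletters-2025.pdf": 8_000_000,
    "tone-guide.md": 40_000,
    "raw-archive.pdf": 12_000_000,
}
batches, too_large = plan_batches(files)
print(batches)
print(too_large)
```

Anything reported in too_large needs to be split or compressed before it can be sent at all.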

Step 1: Create the persistent brand-voice collection

Before the workflow can write to a collection, the collection must exist. In the sidebar open the Knowledge section and choose New Collection. Name it something stable you will reference everywhere, for example brand-voice, and add a short description such as "Media kit, newsletters, and tone rules for caption and content drafting." The embedding model is fixed at creation, so pick it now and keep it: Gemini Embedding 001 (the default) is a solid choice. Whatever you choose here, every workflow that later queries this collection must read it with the same embedding model, so note your selection. You do not need to upload anything by hand; the workflow you are about to build will populate it.

Step 2: Add the Mailhook trigger and get your intake address

Create a new workflow and open the trigger. Set Trigger Type to Mailhook. Optionally set an Address prefix (1 to 24 characters, default mh) such as brandvoice so the address is recognizable, then choose Generate email address. Spojit produces a unique address of the form brandvoice-<random16>@mailhook.spojit.com. Copy it and save it as a contact or forwarding target. Any mail sent to it starts a run within seconds, with no mailbox or sign-in involved.

Because anyone who learns the address could send to it, tighten intake with the optional filters: add a From allowlist containing your own email addresses, and optionally a Subject regex like (?i)brand[\s-]?voice so only deliberately tagged emails are processed. The trigger fires whether the address appears in To, Cc, or Bcc, and duplicate copies of the same message are deduplicated. The full email is available downstream as {{ input }}, including {{ input.subject }}, {{ input.from }}, {{ input.replyTo }}, and {{ input.attachments }} (each attachment reference is { id, filename, contentType }).
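
You can try a Subject regex locally before saving it on the trigger. The sketch below uses Python's re module and assumes the filter matches anywhere in the subject and compares sender addresses case-insensitively; how Spojit evaluates these filters is not specified here, and the allowlist address and would_accept helper are hypothetical.

```python
import re

# The Subject regex from the text above.
SUBJECT_PATTERN = re.compile(r"(?i)brand[\s-]?voice")

# Hypothetical allowlist entry, for illustration only.
FROM_ALLOWLIST = {"me@example.com"}

def would_accept(sender: str, subject: str) -> bool:
    """Assumed semantics: sender on the allowlist AND pattern found in subject."""
    on_allowlist = sender.strip().lower() in FROM_ALLOWLIST
    subject_matches = SUBJECT_PATTERN.search(subject) is not None
    return on_allowlist and subject_matches

for subject in ["Brand Voice: spring batch", "brand-voice", "BRANDVOICE docs",
                "brand  voice", "Invoice attached"]:
    print(subject, would_accept("me@example.com", subject))
```

The pattern allows at most one space or hyphen between the two words, so a subject with a double space would not match.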

Step 3: Fetch the attachments with the Attachment node

Add an Attachment node directly after the Mailhook trigger. (The designer only saves an Attachment node in a workflow that has a Mailhook trigger, so add it here.) This node turns the lightweight attachment references on the trigger into actual file bytes. Configure it to pull every document so a single email can carry your whole kit:

  • Mode: Multiple so the node returns a list rather than only the first match.
  • Content type filter (optional): restrict to the document types you send, for example application/pdf, text/markdown, or leave blank to accept all.
  • Filename pattern (optional): a glob such as *.pdf if you only want PDFs in this batch.
  • Min/Max size (optional): screen out tiny signature images or oversized files.
  • "Fail if no attachment matches": turn this on for this workflow so an email sent with no usable file fails visibly instead of silently doing nothing.

In Multiple mode the output looks like this, with each content field holding the base64 bytes you will embed:

{
  "attachments": [
    { "filename": "media-kit.pdf", "contentType": "application/pdf", "size": 248113, "content": "JVBERi0xLjcK..." },
    { "filename": "spring-newsletter.pdf", "contentType": "application/pdf", "size": 91044, "content": "JVBERi0xLjcK..." }
  ],
  "count": 2,
  "totalBytes": 339157
}
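
If you export a run's output and want to sanity-check it, a short script can decode each content field and confirm the totals agree. This is a sketch under one assumption: that size and totalBytes count decoded bytes rather than base64 characters. The sample payload is made up and check_attachment_output is a hypothetical helper.

```python
import base64

def check_attachment_output(output: dict) -> list:
    """Decode every content field and verify size, count, and totalBytes."""
    decoded = [base64.b64decode(a["content"]) for a in output["attachments"]]
    for attachment, raw in zip(output["attachments"], decoded):
        # Assumption: "size" is the decoded byte length.
        if len(raw) != attachment["size"]:
            raise ValueError(f"size mismatch for {attachment['filename']}")
    if output["count"] != len(decoded):
        raise ValueError("count does not match the number of attachments")
    if output["totalBytes"] != sum(len(raw) for raw in decoded):
        raise ValueError("totalBytes does not match the decoded sizes")
    return decoded

# Tiny made-up payload in the same shape as the output above.
pdf_bytes = b"%PDF-1.7\n"
md_bytes = b"# Tone guide\n"
sample = {
    "attachments": [
        {"filename": "media-kit.pdf", "contentType": "application/pdf",
         "size": len(pdf_bytes), "content": base64.b64encode(pdf_bytes).decode("ascii")},
        {"filename": "tone-guide.md", "contentType": "text/markdown",
         "size": len(md_bytes), "content": base64.b64encode(md_bytes).decode("ascii")},
    ],
    "count": 2,
    "totalBytes": len(pdf_bytes) + len(md_bytes),
}

files = check_attachment_output(sample)
print([len(raw) for raw in files])
```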

Step 4: Loop over each attachment

Add a Loop node, set it to ForEach, and point it at {{ attachment.attachments }}, the output array from the Attachment node. The Loop exposes the current item to the nodes inside its body, so within the loop you can reference {{ item.filename }}, {{ item.contentType }}, and {{ item.content }}. Looping one file per iteration keeps each document a distinct, separately named entry in your collection, which makes each one easy to find, overwrite, or delete later. Everything in the next step lives inside this Loop body.

Step 5: Embed each document into the collection with the Knowledge node

Inside the Loop body, add a Knowledge node and set its mode to Embed. This is where each file becomes searchable brand-voice material. Configure:

  • Collection: select the persistent brand-voice collection you created in Step 1 (not Transient, since you want this to last across runs).
  • File Name: use the source filename so updates overwrite cleanly, for example {{ item.filename }}. Embedding a document whose name already exists in the collection overwrites the previous version.
  • Document Type: choose the type that matches the file, for example PDF for your media kit and newsletters, or Markdown for a tone guide. If a single batch mixes types, you can branch on {{ item.contentType }} with a Condition node, or send one type per email to keep this field constant.
  • Document Input: point this at the raw bytes, {{ item.content }}.
  • Embedding Model (optional): leave it to match the collection's fixed model, Gemini Embedding 001.
  • Output Variable: name it, for example embedResult. It returns the chunk count and metadata so you can confirm the document was processed.
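
When a batch mixes file types, the branching on {{ item.contentType }} amounts to a lookup from MIME type to Document Type. The table below is a sketch of that decision, not Spojit's own mapping: the MIME strings are standard, the labels are taken from the document types listed in the prerequisites, and document_type_for is a hypothetical helper.

```python
from typing import Optional

# Illustrative mapping only. Extend it for the types you actually send.
DOCUMENT_TYPE_BY_MIME = {
    "application/pdf": "PDF",
    "text/markdown": "Markdown",
    "text/plain": "Plain Text",
    "text/html": "HTML",
    "application/json": "JSON",
    "message/rfc822": "Email (EML/MSG)",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "Word",
}

def document_type_for(content_type: str) -> Optional[str]:
    """Return the Document Type label, or None if the type is not mapped."""
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime = content_type.split(";", 1)[0].strip().lower()
    return DOCUMENT_TYPE_BY_MIME.get(mime)

print(document_type_for("application/pdf"))
print(document_type_for("text/markdown; charset=utf-8"))
print(document_type_for("image/gif"))
```

A None result is the case to route to a fallback branch, or to reject, rather than embedding with a guessed type.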

Each iteration writes one document. After the Loop finishes, all attachments from that email are in your collection, with statuses moving from PROCESSING to READY as they finish chunking.

Step 6: Confirm the embed and email yourself a summary

After the Loop, add a Send Email node so you get a receipt for every batch you forward. Send Email uses Spojit's built-in mail service, so no connection is needed. Set:

  • Recipients: reply to whoever sent the batch with {{ input.replyTo }}, or hardcode your own address. Remember that external recipients must be on the org allowlist under Settings → General → Email recipients.
  • Subject: something like Added {{ attachment.count }} document(s) to your brand-voice library.
  • Body: a short confirmation, for example "Spojit embedded {{ attachment.count }} file(s) from your email '{{ input.subject }}' into the brand-voice collection. They are now queryable by your content workflows."
  • If sending fails: Continue anyway, so a mail hiccup does not undo the embed that already succeeded.

Only upstream variables resolve in Send Email, so reference values from the trigger and the Attachment node as shown above. Save and enable the workflow.
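
To preview how a subject or body will read once the placeholders are filled in, you can mimic the substitution locally. This is a toy resolver written for illustration; it is not Spojit's template engine, and the render function and sample values are hypothetical. An unknown variable raises a KeyError here, which loosely mirrors the rule that only upstream variables resolve.

```python
import re

# Matches {{ dotted.path }} with optional surrounding spaces.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")

def render(template: str, variables: dict) -> str:
    """Replace each {{ a.b }} placeholder by walking the nested dictionary."""
    def lookup(match):
        value = variables
        for key in match.group(1).split("."):
            value = value[key]  # KeyError if the variable is not available
        return str(value)
    return PLACEHOLDER.sub(lookup, template)

# Made-up values standing in for the trigger and Attachment node outputs.
variables = {
    "input": {"subject": "Brand voice: spring batch"},
    "attachment": {"count": 2},
}

subject = render("Added {{ attachment.count }} document(s) to your brand-voice library", variables)
print(subject)
```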

Step 7: Query the brand voice from your content workflows

The library is now the input to everything else you build. In any content workflow, add a Knowledge node in Query mode pointed at the same brand-voice collection, with a Prompt describing what you want it to retrieve, a Model for synthesis, a Result Count (default 5), and an Output Variable. For example, before drafting a caption:

Summarize this creator's brand voice for short-form social captions:
preferred tone, recurring phrases or sign-offs, words to avoid, and
emoji or hashtag conventions. Pull from the media kit and past captions.

Feed the result into an Agent-mode Connector node that writes the actual draft, passing both the retrieved voice notes and the topic. That Agent-mode node is the AI that runs inside the workflow and produces the caption, grounded by the voice notes the Query node returned. This keeps drafts faithful to your real materials rather than a generic tone. To build the query-and-draft side faster, scaffold it with Miraxa, the intelligent layer across your automation, then fine-tune the nodes in the properties panel.
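
For intuition about what Result Count controls, here is a toy retrieval sketch. It swaps in a simple bag-of-words similarity in place of a real embedding model such as Gemini Embedding 001, so it illustrates the ranking idea only. The chunks, the embed and query functions, and the scoring are all assumptions, not how Spojit's Knowledge node works internally.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: word counts, lowercased."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def query(chunks: list, prompt: str, result_count: int = 5) -> list:
    """Return the result_count chunks most similar to the prompt."""
    prompt_vector = embed(prompt)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), prompt_vector), reverse=True)
    return ranked[:result_count]

# Made-up chunks standing in for embedded brand-voice material.
chunks = [
    "always sign off with stay curious",
    "rates start at 500 per post",
    "avoid the words synergy and leverage",
]
top = query(chunks, "what sign off does the creator use", result_count=1)
print(top)
```

Both the chunks and the prompt go through the same embed function, which is the toy version of why the collection must be embedded and queried with the same model.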

Tips

  • Keep the collection's embedding model consistent everywhere. Always embed and query brand-voice with the same model (Gemini Embedding 001 here), or retrieval quality drops.
  • Name files descriptively before you send them (tone-guide.md, media-kit-2026.pdf). The filename becomes the collection entry name and your overwrite key.
  • To refresh a document, re-send it under the exact same filename; the Embed step overwrites the old version in place rather than adding a duplicate.
  • Forward newsletters and caption exports in small batches to stay under the 25 MB per-run limit, and let the additive runs build the library over time.

Common Pitfalls

  • Choosing Transient instead of a persistent collection in the Embed step. Transient collections are auto-cleaned when the run completes, so nothing would survive for later workflows. Pick your named brand-voice collection.
  • Mismatching Document Type to the file. Setting Document Type to PDF for a Markdown file (or vice versa) produces poor or empty chunks. Send one type per batch, or branch on {{ item.contentType }} with a Condition node.
  • Forgetting that an Attachment node requires a Mailhook trigger. The designer will not save the node otherwise, so build the trigger first.
  • Leaving "Fail if no attachment matches" off for an intake workflow. If a teammate forwards a body-only email, the run quietly does nothing. Turn it on so empty sends surface as failures you can see.
  • Sending to the address without a From allowlist or Subject regex. Because the mailhook fires on any inbound mail, add filters so stray forwards or spam do not pollute your brand-voice collection.

Testing

Validate on a single document before forwarding your whole archive. Email just your tone guide to the generated mailhook address from an address on your From allowlist, then open the run in execution history and confirm the Attachment node returned count: 1 and the Knowledge Embed node reported a chunk count in its embedResult output. Open the brand-voice collection in the Knowledge section and check the document table shows the file with status READY. Finally, add a throwaway Knowledge node in Query mode against the collection with a prompt like "What sign-off does this creator use?" and confirm the answer reflects your guide. Once that round-trips correctly, forward the full set of files in size-safe batches.

