How to Auto-Index Incoming Emails into Your Knowledge Base

Automatically add important emails and attachments to your searchable knowledge base.

What This Integration Does

Important context lives in email: vendor notices, support escalations, contract amendments, policy updates from HR. None of that is searchable once it slips below the fold of someone's inbox. This Spojit workflow watches your shared inbox, picks out the messages that actually matter, and pushes both the body and any attachments into your Knowledge base so the whole company can query them later.

The workflow runs on a short schedule, pulls recent conversations from a shared inbox via the front connector, filters them by tag or sender, and embeds each one as a Knowledge document. Attachments get extracted and indexed separately so a PDF contract attached to an email becomes a first-class searchable artifact in its own right. Each indexed message records its source ID so re-runs are idempotent.

Prerequisites

A Front connection with read access to the shared inbox you want to mirror, plus permission to list tags and contacts.
A Knowledge collection set up to hold inbox content (e.g. support-inbox-archive).
A small persistence store - a MongoDB collection or MySQL table - to track which conversation IDs have already been indexed, and when.

Step 1: Schedule Trigger

Add a Trigger node and set the type to Schedule. Use a 5-field cron expression such as */10 * * * * together with an IANA timezone. Every 10-15 minutes is a good cadence for a shared inbox: frequent enough that nothing sits unindexed for long, infrequent enough to stay well inside Front's rate limits. The schedule trigger outputs {{ scheduledAt }}; derive a since cutoff from the most recent timestamp recorded in your tracking store (Step 6) so each run only pulls conversations updated since the last successful pass.
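The since calculation is easy to get subtly wrong, so here is a minimal Python sketch of it. It assumes the tracking store hands back timezone-aware datetimes and that {{ scheduledAt }} has been parsed into one; FIRST_RUN_LOOKBACK is a hypothetical fallback for the very first run, when nothing has been recorded yet.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# Hypothetical first-run lookback, used only while the tracking store is empty.
FIRST_RUN_LOOKBACK = timedelta(hours=24)


def compute_since(recorded: List[datetime], scheduled_at: datetime) -> datetime:
    """Return the UTC cutoff for this run.

    recorded     -- timezone-aware timestamps read back from the tracking store
    scheduled_at -- the trigger's {{ scheduledAt }}, parsed into an aware datetime
    """
    if not recorded:
        # Nothing indexed yet: look back a fixed window from the scheduled time.
        return (scheduled_at - FIRST_RUN_LOOKBACK).astimezone(timezone.utc)
    # Normalise to UTC before taking the max so DST changes never shift the cutoff.
    return max(ts.astimezone(timezone.utc) for ts in recorded)
```

Normalising to UTC here is what avoids the DST drift described under Common Pitfalls.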

Step 2: Connector - List Recent Conversations

Add a Connector node pointing at the front connector and pick the list-conversations tool. Filter for conversations updated since the last run and, optionally, only those carrying a specific tag such as archive-to-kb:

{
  "q": "updated_at:>{{ since }} AND tag:archive-to-kb",
  "limit": 100
}

This returns conversation metadata including the body, participants, and a list of attachment URLs.
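Because list-conversations returns at most 100 conversations per call, a busy inbox needs more than one request per run. The paging loop is sketched below in Python. The connector call is abstracted as a hypothetical fetch_page(query, limit, page_token) callable returning (items, next_token); the real front connector's parameter names and cursor format may differ, so treat this as the shape of the loop rather than its exact API.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical wrapper around the connector call:
# (query, limit, page_token) -> (items, next_token).
FetchPage = Callable[[str, int, Optional[str]], Tuple[List[Dict], Optional[str]]]


def list_all_conversations(fetch_page: FetchPage, since: str,
                           tag: str = "archive-to-kb", limit: int = 100) -> List[Dict]:
    """Collect every matching conversation, one page at a time."""
    query = f"updated_at:>{since} AND tag:{tag}"
    results: List[Dict] = []
    token: Optional[str] = None
    while True:
        items, token = fetch_page(query, limit, token)
        results.extend(items)
        # Stop on an empty page, or when the connector reports no further page.
        if not items or token is None:
            return results
```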

Step 3: Loop and Skip Already-Indexed Messages

Wrap the next steps in a Loop over the conversation list. For each conversation, run a Connector step against your tracking store (for example a mongodb find-documents with { "conversationId": "{{ conv.id }}" }). Follow it with a Condition node: if a record exists, skip to the next conversation; otherwise continue to Step 4.
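The Condition node's decision reduces to a small predicate. In this Python sketch, tracking is a stand-in for the tracking store, mapping conversation ID to a recorded updated_at (assumed here to be epoch seconds). The default behaviour matches the existence check described above; the reindex_on_update flag is a hypothetical variant, not part of the workflow as written, that would also let new replies on an already-indexed thread through.

```python
from typing import Dict, Mapping


def needs_indexing(conv: Dict, tracking: Mapping[str, int],
                   reindex_on_update: bool = False) -> bool:
    """True if the loop should continue to the embed steps for this conversation."""
    recorded = tracking.get(conv["id"])
    if recorded is None:
        return True  # never indexed: continue the iteration
    if reindex_on_update:
        # Variant: re-embed only if the thread changed after it was recorded.
        return conv["updated_at"] > recorded
    return False  # already indexed: skip to the next conversation
```

Because Embed mode overwrites by file name, turning the variant on replaces the existing document rather than duplicating it.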

Step 4: Embed the Email Body

Add a Knowledge node in Embed mode and point its Collection at your persistent inbox archive. Build a small text document that combines the subject, participants, and body so the indexed chunk has the context it needs to be retrieved later, then feed it into the Document Input field:

Subject: {{ conv.subject }}
From: {{ conv.from }}
Date: {{ conv.created_at }}

{{ conv.body_text }}

Set Document Type to Plain Text and set the File Name to the Front conversation ID (for example {{ conv.id }}). Because Embed mode overwrites any existing document with the same file name, re-indexing a conversation replaces its document rather than duplicating it.
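If you prefer to assemble the document in a code step instead of a template, the equivalent logic looks roughly like this Python sketch. The field names follow the template above (conv.id, conv.subject, conv.from, conv.created_at, conv.body_text); whether your Front payload uses exactly these keys is an assumption.

```python
from typing import Dict, Tuple


def build_email_document(conv: Dict) -> Tuple[str, str]:
    """Return (file_name, text) for the Knowledge embed step."""
    text = (
        f"Subject: {conv['subject']}\n"
        f"From: {conv['from']}\n"
        f"Date: {conv['created_at']}\n"
        "\n"
        f"{conv['body_text']}"
    )
    # The conversation ID doubles as the File Name, so re-embedding overwrites in place.
    return conv["id"], text
```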

Step 5: Process Attachments

Add a nested Loop over conv.attachments. For each attachment, branch on file type with a Condition:

For PDFs, call the pdf connector's extract-text tool, then pass the result into the Document Input of a Knowledge embed step.
For CSVs, call the csv connector's parse tool followed by its to-json tool, then embed a summary plus a sample of rows.
For JSON, run the json connector's prettify tool before embedding.

Give each attachment document a File Name that carries the parent conversation ID (for example {{ conv.id }}-{{ attachment.name }}) so a Knowledge query can show "this answer came from an attachment on conversation X".
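The branching and naming in this step can be summarised as a routing table. The Python sketch below picks a tool chain by file extension and builds the {{ conv.id }}-{{ attachment.name }} file name. Branching on extension rather than MIME type, the size field on the attachment, and MAX_BYTES are all assumptions for illustration; the tool names are plain labels mirroring the steps above.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional, Tuple

# Extraction chain per file extension, mirroring the branches above.
ROUTES: Dict[str, List[str]] = {
    ".pdf": ["pdf extract-text"],
    ".csv": ["csv parse", "csv to-json"],
    ".json": ["json prettify"],
}
# Hypothetical size cap, in line with the 5-10 MB suggested under Tips.
MAX_BYTES = 10 * 1024 * 1024


def plan_attachment(conv_id: str, attachment: Dict) -> Optional[Tuple[str, List[str]]]:
    """Return (file_name, tool_chain) for an attachment, or None to skip it."""
    if attachment.get("size", 0) > MAX_BYTES:
        return None  # too large: skip rather than push it through the pipeline
    suffix = PurePosixPath(attachment["name"]).suffix.lower()
    chain = ROUTES.get(suffix)
    if chain is None:
        return None  # unsupported type: nothing useful to embed
    # Carry the parent conversation ID so answers can cite the source conversation.
    return f"{conv_id}-{attachment['name']}", chain
```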

Step 6: Record the Conversation as Indexed

After the body and every attachment have been embedded successfully, write the conversation ID and timestamp to the tracking store via a mongodb insert-documents or mysql insert-rows call. That's what makes the workflow idempotent - the next scheduled run will skip anything already in the tracking store.
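Here is one way to shape the tracking record and gate the write on success, sketched in Python with a plain dict standing in for the MongoDB collection or MySQL table. The field names (conversationId, updatedAt, indexedAt) are hypothetical apart from conversationId, which matches the lookup in Step 3. embed_results is assumed to hold one success flag per embed step (body plus attachments).

```python
from datetime import datetime, timezone
from typing import Dict, List


def tracking_record(conv: Dict, indexed_at: datetime) -> Dict:
    """Row written to the tracking store once a conversation is fully indexed."""
    return {
        "conversationId": conv["id"],
        "updatedAt": conv["updated_at"],
        "indexedAt": indexed_at.astimezone(timezone.utc).isoformat(),
    }


def record_if_embedded(store: Dict[str, Dict], conv: Dict,
                       embed_results: List[bool], now: datetime) -> bool:
    """Record the conversation only if every embed step succeeded."""
    if not embed_results or not all(embed_results):
        return False  # leave it unrecorded so the next run retries it
    store[conv["id"]] = tracking_record(conv, now)
    return True
```

Writing the record last means a failed attachment embed leaves the conversation eligible for the next run instead of silently marking it done.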

Tips

Use tags as a soft filter - rather than indexing everything, train the team to tag conversations archive-to-kb. You end up with a tighter, more useful index.
Chunk long threads - if a thread has 50 replies, embed each message as its own document (keeping the conversation ID in the File Name) rather than one giant blob; retrieval is much more precise.
Watch attachment size - skip or truncate attachments above 5-10 MB so a 500-page contract doesn't get pushed through the whole pipeline.
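The chunking tip can be sketched as one document per message. This Python example assumes each message exposes id, author, created_at and body_text (hypothetical field names) and reuses the conversation-ID-prefixed naming from Step 5. It repeats the subject in every chunk so each one still carries enough context to be retrieved on its own.

```python
from typing import Dict, List, Tuple


def chunk_thread(conv_id: str, subject: str, messages: List[Dict]) -> List[Tuple[str, str]]:
    """Split a thread into one (file_name, text) document per message."""
    docs = []
    for msg in messages:
        # Repeat the subject in every chunk so each one is retrievable on its own.
        text = (f"Subject: {subject}\nFrom: {msg['author']}\n"
                f"Date: {msg['created_at']}\n\n{msg['body_text']}")
        docs.append((f"{conv_id}-{msg['id']}", text))
    return docs
```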

Common Pitfalls

Private content - shared inboxes contain customer PII. Strip emails and phone numbers (for example with the regex connector's replace tool) before the text reaches the Knowledge embed step, since anyone in the workspace can query the collection.
Pagination - list-conversations caps at 100 per page. Keep requesting pages until one comes back empty, or you'll miss conversations during high-volume periods.
Timezone drift on since - Front timestamps are UTC. If you compute since in local time, you will lose or duplicate an hour's worth of conversations at every DST change.
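For the PII pitfall, the Python sketch below illustrates the kind of patterns you might hand to the regex connector's replace tool. Python's re flavour may not match the connector's exactly, and the patterns are deliberately rough: the phone rule only masks digit runs with at least MIN_PHONE_DIGITS digits (a hypothetical threshold) so that dates such as 2024-01-01 survive. Expect both false positives and misses; review them against your own inbox.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Candidate phone numbers: digits with common separators, optional leading +.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
# Below this many digits, treat the match as a date or reference, not a phone.
MIN_PHONE_DIGITS = 9


def _mask_phone(m: "re.Match[str]") -> str:
    digits = sum(ch.isdigit() for ch in m.group())
    return "[phone]" if digits >= MIN_PHONE_DIGITS else m.group()


def redact_pii(text: str) -> str:
    # Redact emails first so digits inside addresses can't be mistaken for phones.
    text = EMAIL_RE.sub("[email]", text)
    return PHONE_RE.sub(_mask_phone, text)
```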

Testing

Hand-tag two or three conversations with archive-to-kb. Run the workflow manually. Open the Knowledge collection and confirm one document per conversation plus one per attachment, with sensible chunking and each document's File Name carrying the conversation ID. Re-run and confirm no duplicate documents are written (the tracking store should skip already-indexed conversations, and matching file names overwrite in place). Only then enable the schedule.
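You can also rehearse the re-run check offline before touching the real collection. This Python sketch simulates two passes with in-memory stand-ins: a set for the tracking store and a dict keyed by File Name for the Knowledge collection, so writing the same name overwrites rather than duplicates. It is a model of the expected behaviour, not the Spojit runtime.

```python
from typing import Dict, List, Set


def run_once(conversations: List[Dict], tracking: Set[str], collection: Dict[str, str]) -> int:
    """Simulate one scheduled pass; return how many conversations were embedded."""
    embedded = 0
    for conv in conversations:
        if conv["id"] in tracking:
            continue  # Step 3: already indexed
        collection[conv["id"]] = conv["body_text"]  # Step 4: same name overwrites
        for att in conv.get("attachments", []):
            collection[f"{conv['id']}-{att['name']}"] = att["text"]  # Step 5
        tracking.add(conv["id"])  # Step 6
        embedded += 1
    return embedded
```

After two calls with the same input, the second should embed nothing and the collection size should not change.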