How to Build a Searchable Knowledge Base of Lease and Disclosure Templates

Use a Schedule trigger, the ftp connector, a Loop node, and the Knowledge node in Spojit to load your lease, disclosure, and policy documents into a persistent collection, then let agents ask plain-language questions and get cited answers about clauses and required disclosures.

What This Integration Does

Property and leasing teams keep their master templates (residential and commercial lease agreements, mandatory disclosure forms, pet and parking addenda, fair-housing policy memos) as files on an FTP server. Finding the right clause or confirming which disclosure is required for a given state means opening document after document by hand. This tutorial builds two cooperating workflows in Spojit: an indexer that keeps a searchable knowledge collection in sync with the files on FTP, and a question-answering workflow that lets your team (or another workflow) ask things like "What is the notice period for early termination in the standard residential lease?" and get an answer grounded in the actual template text.

The indexer runs on a Schedule trigger. On each run it lists the template directory on FTP with list-directory, loops over every file, downloads each one with download-file, and feeds the bytes into a Knowledge node in Embed mode that writes into a persistent, workspace-scoped collection. Because the collection is persistent, every embedded document stays available across runs and is readable by any other workflow in your workspace. Re-running re-embeds files under the same file name, so an updated template overwrites its previous version rather than creating duplicates. The companion question workflow uses a Webhook trigger plus a Knowledge node in Query mode against that same collection, so answers always reflect the latest indexed templates.

Prerequisites

  • An ftp connection configured under Connections, pointing at the server and folder that holds your lease and disclosure templates (for example /templates/leasing). See the guide on adding a new connection.
  • A persistent Knowledge collection created in the Knowledge section of the sidebar (for example named Leasing Templates). The embedding model is fixed at creation, so pick it now and use the same collection for both workflows. See creating a knowledge collection.
  • Templates stored on FTP in supported formats: PDF, Word, Excel, CSV/TSV, JSON, XML, HTML, Plain Text, Markdown, or RTF.
  • A slack connection if you want index-run summaries posted to a channel (optional).
  • If you plan to call the question workflow from outside Spojit, a signing connection for the Webhook trigger. See setting up a webhook connection.

Step 1: Schedule the indexer workflow

Create a new workflow and add a Trigger node set to type Schedule. Enter a 5-field cron expression and an IANA timezone. To re-index every weekday morning, use 0 6 * * 1-5 with a timezone such as America/New_York. A single Schedule trigger can hold more than one schedule if you want both a morning and an evening pass. The trigger output is { scheduledAt }; you do not need to read it, since the file list comes from FTP. For more on cron fields, see setting up a schedule trigger.
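To sanity-check a schedule before saving it, the matching rule behind a 5-field cron expression can be sketched in Python. This is a minimal illustration, not Spojit's scheduler: the helper names field_matches and cron_matches are hypothetical, only `*`, single numbers, ranges, and comma lists are handled, and the datetime passed in is assumed to already be wall-clock time in the schedule's timezone.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Return True if one cron field (e.g. "*", "6", "1-5", "1,3,5") matches value."""
    for part in field.split(","):
        if part == "*":
            return True
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Check a 5-field cron expression (minute hour day month weekday) against a datetime.

    Simplification: all five fields must match. Real cron implementations treat
    day-of-month and day-of-week specially when both are restricted.
    """
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = when.isoweekday() % 7  # cron counts Sunday as 0, Monday as 1
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(weekday, cron_weekday)
    )


print(cron_matches("0 6 * * 1-5", datetime(2024, 1, 1, 6, 0)))  # a Monday at 06:00
print(cron_matches("0 6 * * 1-5", datetime(2024, 1, 6, 6, 0)))  # a Saturday at 06:00
```

The first call matches and the second does not, which is the weekday-only behavior the expression is meant to give.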

Step 2: List the template directory on FTP

Add a Connector node in Direct mode on the ftp connector and pick the list-directory tool. Set path to the folder that holds your templates, for example /templates/leasing. The tool returns one entry per item with its name, type, size, and modification date. Name the output variable fileList so later steps can read {{ fileList.data.entries }}. If your templates are split across subfolders (residential, commercial, disclosures), duplicate this step per folder and concatenate the lists in a Transform node. Pointing the step at the parent folder and filtering by type only works if the listing includes the files inside each subfolder. Either way, keep each file's folder alongside its entry, because Step 4 builds the download path from the folder plus the file name.

Step 3: Loop over each template file

Add a Loop node in ForEach mode and set its collection to the file entries from the previous step, for example {{ fileList.data.entries }}. Each pass exposes the current entry (call the loop item file), giving you {{ file.name }} and {{ file.type }} inside the loop body. Add a Condition node as the first node in the body to skip anything that is not a file, so directories returned by the listing do not break a download:

{{ file.type }} == "file"

Connect the true branch onward to the download step. For more on iteration, see using loop nodes.
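The effect of the Condition node can be pictured as a simple filter over the listing. The sketch below is illustrative only: the helper name files_only is hypothetical, the entry fields follow the {{ file.name }} and {{ file.type }} usage above, and the "directory" type value is an assumption about what the listing returns for folders.

```python
from typing import Any, Dict, List


def files_only(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep listing entries whose type is "file", mirroring the Condition node."""
    return [entry for entry in entries if entry.get("type") == "file"]


# Hypothetical listing; the "directory" value is assumed for illustration.
listing = [
    {"name": "residential-lease.pdf", "type": "file"},
    {"name": "archive", "type": "directory"},
    {"name": "lead-paint-disclosure.docx", "type": "file"},
]

for entry in files_only(listing):
    print(entry["name"])
```

Only the two file entries reach the loop body, so the download step never sees a folder path.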

Step 4: Download each template as bytes

Inside the loop body, add a Connector node in Direct mode on the ftp connector with the download-file tool. Build the path from the folder plus the current file name, and set encoding to base64 so binary documents such as PDF and Word files survive intact:

path:     /templates/leasing/{{ file.name }}
encoding: base64

Name the output variable downloaded. The downloaded bytes are available at {{ downloaded.data.content }}, which is exactly what the Knowledge node expects as its document input.
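To see why base64 matters for binary templates, the short Python sketch below round-trips a few binary bytes through base64 and then tries to read the same bytes as UTF-8 text. The byte string is a made-up stand-in rather than a real PDF, and remote_path is a hypothetical helper that mirrors the path template above.

```python
import base64
import posixpath

TEMPLATE_DIR = "/templates/leasing"  # folder from Step 2


def remote_path(file_name: str) -> str:
    """Join the template folder and a file name using forward slashes."""
    return posixpath.join(TEMPLATE_DIR, file_name)


# Stand-in for the first bytes of a binary document; not a real PDF.
raw = b"%PDF-1.7\n\xff\xfe\x00\x9c"

# Base64 turns arbitrary bytes into ASCII text and decodes back to the same bytes.
encoded = base64.b64encode(raw).decode("ascii")
round_trip_ok = base64.b64decode(encoded) == raw

# Reading the same bytes as UTF-8 text fails, because 0xff is not valid UTF-8.
try:
    raw.decode("utf-8")
    survived_as_text = True
except UnicodeDecodeError:
    survived_as_text = False

print(remote_path("residential-lease.pdf"))
print(round_trip_ok, survived_as_text)
```

The base64 round trip returns the original bytes unchanged, while the text decode fails outright; a lenient text decode would instead silently replace or drop bytes, which is how binary documents end up corrupted.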

Step 5: Embed each template into the persistent collection

Still inside the loop body, add a Knowledge node in Embed mode. Configure it as follows:

  • Collection: select your persistent collection, for example Leasing Templates (do not pick Transient: you want these documents to persist and be queried later).
  • File Name: set to {{ file.name }}. Embedding under the file name means a re-run of an updated template overwrites the old version instead of creating a duplicate.
  • Document Type: choose the matching type. If your folder mixes formats, drive this from the file extension using a small Transform node, or split the loop so PDFs map to PDF and Word files map to Word.
  • Document Input: {{ downloaded.data.content }}.
  • Output Variable: name it embedResult to capture the chunk count and metadata for logging.

The collection's embedding model was fixed when you created it. Embedding and querying must use the same model for results to be meaningful, which is why both workflows point at the same collection. See uploading documents to a collection for how embedding behaves.
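For a folder that mixes formats, the extension-to-type logic mentioned under Document Type could look like the Python sketch below. It only illustrates the mapping: the function name document_type_for is hypothetical, the extension list is an assumption, the type labels are taken from the supported formats in the prerequisites, and the expression syntax a Transform node actually accepts may differ.

```python
import posixpath
from typing import Optional

# Assumed mapping from file extension to the Document Type labels listed in
# the prerequisites; adjust it to the extensions your templates actually use.
DOCUMENT_TYPES = {
    ".pdf": "PDF",
    ".docx": "Word",
    ".xlsx": "Excel",
    ".csv": "CSV/TSV",
    ".tsv": "CSV/TSV",
    ".json": "JSON",
    ".xml": "XML",
    ".html": "HTML",
    ".htm": "HTML",
    ".txt": "Plain Text",
    ".md": "Markdown",
    ".rtf": "RTF",
}


def document_type_for(file_name: str) -> Optional[str]:
    """Return the Document Type label for a file name, or None if unrecognized."""
    _, extension = posixpath.splitext(file_name.lower())
    return DOCUMENT_TYPES.get(extension)


print(document_type_for("Residential-Lease.PDF"))
print(document_type_for("pet-addendum.docx"))
print(document_type_for("notes.bak"))
```

Returning None for an unrecognized extension gives the workflow a clear signal to skip the file rather than embed it under the wrong type.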

Step 6: Summarize the index run

After the loop, post a short summary so your team knows the knowledge base is current. Add a Connector node in Direct mode on the slack connector with the send-message tool, set the channel to your operations channel, and template the message text. The count in this example is the number of entries the listing returned, so it includes any directories the Condition node skipped:

Leasing templates re-indexed: {{ fileList.data.entries.length }} files processed at {{ scheduledAt }}.

If you would rather email the summary, use a Send Email node instead: set Recipients to your leasing manager, a templated Subject, and a plain-text Body. Send Email uses Spojit's built-in mail service and needs no connection, but external recipients must be on the org allowlist under Settings > General > Email recipients. See using send email nodes.

Step 7: Build the companion question-answering workflow

Create a second workflow so people and other workflows can ask questions of the collection. Add a Trigger node set to Webhook, verified by a signing connection. The trigger output is the parsed JSON body, so callers post a question like:

{ "question": "What is the early-termination notice period in the standard residential lease?" }

Add a Knowledge node in Query mode pointed at the same persistent Leasing Templates collection. Set Prompt to {{ input.question }}, choose an AI Model for synthesis, and leave Result Count at 5 (raise it for broad policy questions that span several documents). To force a predictable shape, set a Response Schema that returns the answer plus the source file names it was drawn from:

{
  "type": "object",
  "properties": {
    "answer":  { "type": "string" },
    "sources": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["answer", "sources"]
}

Name the output variable kbAnswer. Finish with a Response node that returns {{ kbAnswer }} to the caller, so an agent receives both the answer and the names of the source files it was drawn from. For query tuning, see querying your knowledge base.
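A caller that consumes the response may want to confirm it has the shape the Response Schema describes before using it. The Python sketch below is one way to do that with the standard library only. The function name is_valid_kb_answer and the sample body are hypothetical, and the check covers only the two required fields from the schema above.

```python
import json


def is_valid_kb_answer(payload: object) -> bool:
    """Return True if payload has a string "answer" and a list of string "sources"."""
    if not isinstance(payload, dict):
        return False
    answer = payload.get("answer")
    sources = payload.get("sources")
    return (
        isinstance(answer, str)
        and isinstance(sources, list)
        and all(isinstance(source, str) for source in sources)
    )


# Hypothetical response body, shaped like the schema above.
body = '{"answer": "See the early-termination clause.", "sources": ["residential-lease.pdf"]}'

print(is_valid_kb_answer(json.loads(body)))
print(is_valid_kb_answer({"answer": "No sources field"}))
```

A response that fails this check is better surfaced as an error than passed on, since an answer without sources cannot be verified against a template.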

Tips

  • Keep one persistent collection per document family. A single Leasing Templates collection that both workflows share keeps the indexer and the question workflow pointed at the same source of truth.
  • Embedding under {{ file.name }} makes the indexer idempotent: an updated template overwrites its prior version on the next scheduled run, so you never accumulate stale duplicates.
  • The Knowledge node in Query mode answers from the closest-matching chunks. Asking it to "quote the exact clause and name the document" in the prompt produces answers your leasing team can verify against the template.
  • Open the collection in the Knowledge section to watch document status move from PROCESSING to READY and to confirm chunk counts after a run.
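The overwrite-by-file-name behavior described in the tips above can be pictured as a store keyed by file name. The Python sketch below is a toy model of that idea, not Spojit's implementation: the embed function and the chunk strings are hypothetical.

```python
from typing import Dict, List


def embed(collection: Dict[str, List[str]], file_name: str, chunks: List[str]) -> None:
    """Store chunks under file_name, replacing whatever was stored there before."""
    collection[file_name] = chunks


collection: Dict[str, List[str]] = {}
embed(collection, "residential-lease.pdf", ["old clause text"])
embed(collection, "residential-lease.pdf", ["revised clause text"])  # same name: overwrite
embed(collection, "pet-addendum.pdf", ["pet clause text"])

print(len(collection))
print(collection["residential-lease.pdf"])
```

In this model a template that is renamed or deleted on FTP would leave its old entry behind, so it is worth checking whether your collection needs occasional manual cleanup in that case.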

Common Pitfalls

  • Binary files corrupted by the wrong encoding. PDFs and Word documents must be downloaded with encoding set to base64; downloading them as utf8 produces unreadable text that embeds poorly.
  • Directories treated as files. list-directory returns folders too. Filter on {{ file.type }} == "file" with a Condition node before downloading, or download-file will fail on a folder path.
  • Mismatched embedding models. The embedding model is fixed when the collection is created, and the Knowledge node in Query mode must read a collection embedded with that same model. Do not create a fresh collection with a different model partway through.
  • Wrong collection mode. Selecting Transient in the Embed step would discard your templates at the end of the run. The indexer must write to the persistent collection so the question workflow can read it later.

Testing

Before turning on the schedule, point the list-directory step at a folder holding just two or three templates and run the indexer manually with the Run button. Open the collection in the Knowledge section and confirm each document reaches READY with a sensible chunk count. Then run the question workflow with a known answer, for example asking for the notice period in a lease you can read yourself, and check that {{ kbAnswer }} returns the correct clause and lists the right source file. Once both pass on the small folder, widen path to your full template directory and enable the Schedule trigger.
