How to Create a Product Catalog Knowledge Base for AI Queries

Make your product catalog searchable by AI so workflows can look up product details intelligently.

What This Integration Does

Product data lives in structured rows: SKU, name, description, price, attributes. That structure is brilliant for stores and terrible for the kind of fuzzy questions humans ask, like "what's the cheapest waterproof option that ships from our Sydney warehouse?" This Spojit workflow converts each catalog row into text and embeds it into a Knowledge collection so any workflow can query it in plain English.

The pipeline reads catalog rows from your store or a CSV export, transforms each row into a natural-language document with consistent structure, and embeds the lot into a persistent Knowledge collection using the Knowledge node in Embed mode. A scheduled re-index keeps the collection fresh as prices and stock change. Collections are workspace-scoped, so once the catalog is embedded, any workflow in the workspace can run a Knowledge node in Query mode against it during execution.

Prerequisites

A catalog source - one of the shopify, woocommerce, or bigcommerce connectors, or a CSV export delivered via ftp or webhook.
A Knowledge collection (e.g. product-catalog).
A small mongodb or mysql store to track each product's last-indexed timestamp (for incremental re-indexing).

Step 1: Schedule Trigger

Add a Trigger node and set the trigger type to Schedule. Use a 5-field cron expression with an IANA timezone, for example 0 2 * * * with Australia/Sydney for nightly at 02:00; tighten to hourly if prices change throughout the day. The schedule output is { scheduledAt }. To fetch only products changed since the last run, read the high-water mark from your tracking store (Step 5) at the top of the workflow rather than from the trigger.
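As a quick illustration of what the five cron fields mean, here is a minimal Python sketch that names each field of the nightly expression and of an hourly variant. The parse_cron helper is hypothetical and exists only for this example; it is not part of Spojit, and it does not validate field values or apply the timezone.

```python
# Field order of a standard 5-field cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron(expression: str) -> dict[str, str]:
    """Split a 5-field cron expression into named fields, rejecting other lengths."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))


nightly = parse_cron("0 2 * * *")  # minute 0 of hour 2, every day
hourly = parse_cron("0 * * * *")   # minute 0 of every hour
```

The only difference between the two schedules is the hour field: a fixed 2 for the nightly run, a wildcard for the hourly one.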

Step 2: Fetch Products

Add a Connector node for your store. The exact tool depends on the connector:

Shopify - list-products with an updated_at_min filter equal to your last high-water mark, for example {{ lastIndexed.indexedAt }}.
WooCommerce - list-products with a modified_after filter.
BigCommerce - list-products with a date_modified:min filter.
CSV export - ftp download-file followed by csv parse and to-json.

If the catalog is large, page through results in a Loop until you exhaust the result set.
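The paging loop can be sketched in Python as follows. This is an illustration under stated assumptions, not connector code: fetch_page is a hypothetical stand-in for a list-products call, and the cursor-style paging (a next cursor that is None on the last page) is assumed rather than taken from any connector's documentation.

```python
from typing import Callable, Iterator, Optional

# Assumed page shape: (items, next_cursor). next_cursor is None on the last page.
Page = tuple[list[dict], Optional[str]]


def iter_products(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every product by following cursors until the result set is exhausted."""
    cursor: Optional[str] = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            break


# Hypothetical stand-in for a connector call: three products, two per page.
_FAKE_PAGES: dict[Optional[str], Page] = {
    None: ([{"sku": "A-1"}, {"sku": "A-2"}], "page-2"),
    "page-2": ([{"sku": "A-3"}], None),
}


def fake_fetch_page(cursor: Optional[str]) -> Page:
    return _FAKE_PAGES[cursor]


all_skus = [p["sku"] for p in iter_products(fake_fetch_page)]
```

The loop stops on the missing cursor rather than on an empty page, so a final page that is only partly full is still processed.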

Step 3: Transform - Build a Natural-Language Document per Product

Add a Transform node that converts each structured row into a consistent text document. The text format matters: phrases like "available in", "shipped from", and "compatible with" give semantic search hooks that bare attribute lists do not.

Product: {{ p.title }}
SKU: {{ p.sku }}
Category: {{ p.category }} / {{ p.subcategory }}
Price: {{ p.price }} {{ p.currency }}
Stock: {{ p.inventoryQty }} in warehouse {{ p.warehouse }}

Description:
{{ p.description }}

Attributes:
- Material: {{ p.material }}
- Color: {{ p.color }}
- Size: {{ p.size }}
- Weight: {{ p.weightKg }} kg
- Ships from: {{ p.warehouse }}
- Compatible with: {{ p.compatibility | join: ", " }}

Strip HTML from p.description with a text replace step before embedding - otherwise the markup is embedded along with the content and adds noise to the vectors.
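For readers who want to prototype the transform outside the workflow, the following Python sketch renders a shortened version of the template and strips HTML from the description. The field names mirror the template above; the sample product is invented for illustration, and the regex-based tag removal is a simplification that suits plain description markup rather than arbitrary HTML.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")


def strip_html(text: str) -> str:
    """Remove tags, decode entities, and collapse runs of whitespace."""
    no_tags = _TAG_RE.sub(" ", text)
    return re.sub(r"\s+", " ", html.unescape(no_tags)).strip()


def build_document(p: dict) -> str:
    """Render one product row as the text document that gets embedded."""
    lines = [
        f"Product: {p['title']}",
        f"SKU: {p['sku']}",
        f"Price: {p['price']} {p['currency']}",
        f"Stock: {p['inventoryQty']} in warehouse {p['warehouse']}",
        "",
        "Description:",
        strip_html(p["description"]),
        "",
        "Attributes:",
        f"- Ships from: {p['warehouse']}",
        f"- Compatible with: {', '.join(p['compatibility'])}",
    ]
    return "\n".join(lines)


# Invented sample row, for illustration only.
sample = {
    "title": "Trail Jacket",
    "sku": "TJ-100",
    "price": "129.00",
    "currency": "AUD",
    "inventoryQty": 14,
    "warehouse": "Sydney",
    "description": "<p>Fully <b>waterproof</b> &amp; breathable.</p>",
    "compatibility": ["TJ-LINER-1", "TJ-HOOD-2"],
}
doc = build_document(sample)
```

Tags are replaced with a space rather than removed outright so that words separated only by markup do not run together.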

Step 4: Knowledge Node - Embed Each Product

Add a Knowledge node and set its mode to Embed. Wrap it in the same Loop from Step 2 so each product is embedded as its own document, then configure the fields:

Collection - your persistent product-catalog collection.
File Name - the product SKU (or product ID, if SKUs are not unique across stores), for example {{ p.sku }}.txt. Embed overwrites any existing document with the same file name, so re-running cleanly replaces the prior version.
Document Type - Plain Text.
Document Input - the base64 content of your transformed document.
Embedding Model - choose one at the collection level and use the same one consistently.
Output Variable - captures the chunk count and metadata.

The Document Input field expects a base64 reference. Encode the text from Step 3 with the encoding connector's base64-encode tool, then feed that result into Document Input.
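To check outside the workflow what the encoded value should look like, here is a minimal Python sketch of the same encoding. It assumes the text is UTF-8 and that standard (not URL-safe) base64 is expected, which the source does not state; the helper names are illustrative.

```python
import base64


def encode_document(text: str) -> str:
    """Encode document text as standard base64 over its UTF-8 bytes."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_document(encoded: str) -> str:
    """Reverse encode_document, useful for checking a round trip."""
    return base64.b64decode(encoded).decode("utf-8")


document_text = "Product: Trail Jacket\nSKU: TJ-100\nPrice: 129.00 AUD"
encoded = encode_document(document_text)
```

Decoding the value and comparing it with the original text is a cheap way to confirm that line breaks and non-ASCII characters survive the encoding step.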

Step 5: Record the Indexed Timestamp

After each successful embed, write { sku, indexedAt, version } to your tracking store via mongodb update-documents (with upsert) or mysql insert-rows, keyed on sku so that a re-index updates the existing row instead of adding a duplicate. The next run reads the most recent indexedAt as its high-water mark, which is what makes the scheduled refresh incremental.
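The upsert and high-water-mark logic can be sketched in Python. This sketch uses an in-memory SQLite database purely as a stand-in for the mongodb or mysql store; the table name and helper names are hypothetical, and treating version as a counter that increments on each re-index is an assumption, since the source does not define it.

```python
import sqlite3
from typing import Optional

# In-memory SQLite stands in for the real tracking store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indexed_products ("
    " sku TEXT PRIMARY KEY,"
    " indexed_at TEXT NOT NULL,"
    " version INTEGER NOT NULL)"
)


def record_indexed(sku: str, indexed_at: str) -> None:
    """Insert the row, or update timestamp and bump version if the SKU exists."""
    conn.execute(
        "INSERT INTO indexed_products (sku, indexed_at, version) VALUES (?, ?, 1) "
        "ON CONFLICT(sku) DO UPDATE SET "
        "indexed_at = excluded.indexed_at, version = version + 1",
        (sku, indexed_at),
    )


def high_water_mark() -> Optional[str]:
    """Return the most recent indexed_at, or None if nothing has been indexed."""
    row = conn.execute("SELECT MAX(indexed_at) FROM indexed_products").fetchone()
    return row[0]


record_indexed("TJ-100", "2024-05-01T16:00:00Z")
record_indexed("TJ-200", "2024-05-01T16:00:05Z")
record_indexed("TJ-100", "2024-05-02T16:00:00Z")  # re-index of an existing SKU
```

Storing timestamps as UTC ISO 8601 strings in one fixed format keeps MAX and string comparison consistent with chronological order.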

Step 6: Use the Catalog from Other Workflows

Any workflow in the workspace can now query the catalog. Two common patterns:

Natural-language lookup - a Knowledge node in Query mode pointed at the product-catalog collection. Set Prompt to a question like "what's the cheapest outdoor-rated option under $500?", set Result Count, and optionally add a Response Schema to force structured JSON (for example a list of SKUs and prices) into the Output Variable.
Mid-flow enrichment - inside an order-processing flow, a Knowledge query with a Prompt like "shipping restrictions for {{ order.sku }}" pulls the relevant note from the product document and feeds it into the next step.

Tips

One product per document - resist the temptation to batch several products into a single embedded document. Retrieval is more precise when each SKU has its own document, because a match then points to exactly one product.
Embed numeric attributes as words - "weighs 2.4 kilograms" retrieves better than "weight: 2.4". The Transform step is the right place to humanize numbers.
Re-index on stock changes if availability matters - write the availability state ("in stock" / "out of stock") into the document text so queries can pick it up, and run a lightweight refresh whenever inventory crosses zero.
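The last two tips can be prototyped with a few small helpers. This is a Python sketch with hypothetical function names; in the workflow the same logic would live in the Transform step and in whatever check decides to trigger a refresh.

```python
def humanize_weight(weight_kg: float) -> str:
    """Turn a numeric weight into a phrase, e.g. 2.4 -> 'weighs 2.4 kilograms'."""
    unit = "kilogram" if weight_kg == 1 else "kilograms"
    return f"weighs {weight_kg:g} {unit}"


def availability_phrase(inventory_qty: int) -> str:
    """Describe stock as words that a plain-English query can match."""
    return "in stock" if inventory_qty > 0 else "out of stock"


def crossed_zero(previous_qty: int, current_qty: int) -> bool:
    """True when availability flipped in either direction between two readings."""
    return (previous_qty > 0) != (current_qty > 0)
```

Comparing availability rather than raw quantity means a change from 14 units to 13 does not trigger a refresh, while a change from 1 to 0 or from 0 to 5 does.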

Common Pitfalls

Currency mismatch - if your store sells in multiple currencies, write the currency into the document text, not just the numeric price. Otherwise a question about "products under 100" matches dollars and yen indiscriminately.
Variants - a Shopify product with five size variants is five different things to a buyer. Embed one document per variant, giving each a distinct File Name (for example {{ p.sku }}-{{ variant.id }}.txt) so each variant is retrieved on its own.
Deleted products - when a SKU is removed in the store, your indexing loop won't see it. Add a periodic reconciliation pass that fetches the active SKUs and compares them with the collection's file names; then, in the Knowledge section of the sidebar, delete the documents whose file names no longer match a live SKU.
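The comparison at the heart of the reconciliation pass is a set difference. The Python sketch below assumes the {{ p.sku }}.txt naming from Step 4 and assumes you can obtain the list of indexed file names by some means; the function name is hypothetical, and the sketch only reports stale files, it does not delete anything.

```python
def stale_file_names(indexed_file_names: set[str], active_skus: set[str]) -> list[str]:
    """Return indexed file names that no longer correspond to a live SKU."""
    expected = {f"{sku}.txt" for sku in active_skus}
    return sorted(indexed_file_names - expected)


# Invented example: A-2 was removed from the store after it was indexed.
to_delete = stale_file_names(
    {"A-1.txt", "A-2.txt", "A-3.txt"},
    {"A-1", "A-3"},
)
```

If you embed one document per variant, build the expected names from the same variant-level pattern you used for File Name, otherwise every variant document will look stale.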

Testing

Index a slice of 20 products covering different categories and price points. Open a chat or workflow with the AI Agent and ask three real product questions ("cheapest waterproof", "what's compatible with model X", "anything under $50 in stock"). Confirm the answers cite plausible SKUs from your slice. Then update one product's price in the source store, run the workflow, and confirm the Knowledge document reflects the new price.