How to Create a Product Catalog Knowledge Base for AI Queries
Make your product catalog searchable by AI so workflows can look up product details intelligently.
What This Integration Does
Product data lives in structured rows: SKU, name, description, price, attributes. That structure is brilliant for ERPs and terrible for the kind of fuzzy questions humans ask, like "what's the cheapest waterproof option that ships from our Sydney warehouse?" This workflow takes your catalog out of its structured form and indexes it into a Knowledge collection so any workflow or AI Agent can answer those questions in plain English.
The pipeline reads catalog rows from your store or a CSV export, transforms each row into a natural-language document with consistent structure, and embeds the lot into a Knowledge collection. A scheduled re-index keeps the collection fresh as prices and stock change. Once indexed, the collection is a first-class tool the AI Agent can call during chat or workflow execution.
Prerequisites
- A catalog source - one of shopify, woocommerce, bigcommerce, or a csv export delivered via ftp or webhook.
- A Knowledge collection (e.g. product-catalog).
- A small mongodb or mysql store to track each product's last-indexed timestamp (for incremental re-indexing).
Step 1: Schedule Trigger
Add a Trigger node and set the sub-type to Schedule. Nightly at 02:00 is the typical cadence; tighten to hourly if prices change throughout the day. The trigger exposes a since variable that downstream steps use to fetch only products updated since the last run.
Step 2: Fetch Products
Add a Connector node for your store. The exact tool depends on the connector:
- Shopify - list-products with an updated_at_min filter equal to {{ since }}.
- WooCommerce - list-products with a modified_after filter.
- BigCommerce - list-products with a date_modified:min filter.
- CSV export - ftp download-file followed by csv parse and to-json.
If the catalog is large, page through results in a Loop until you exhaust the result set.
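The paging loop can be sketched in Python as follows. This is a minimal illustration, not the connector's actual API: the fetch_page callable, the (items, next_cursor) page shape, and the fake pages are all assumptions made for the example. Real connectors may page by offset, page number, or link header instead.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page shape: (items, next_cursor). Real connectors differ.
Page = tuple[list[dict], Optional[str]]


def iter_products(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every product, following cursors until there is no next page."""
    cursor: Optional[str] = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            break


# Stand-in for a store connector: three fake pages keyed by cursor.
_FAKE_PAGES = {
    None: ([{"sku": "A1"}, {"sku": "A2"}], "p2"),
    "p2": ([{"sku": "B1"}], "p3"),
    "p3": ([], None),
}


def fake_fetch(cursor: Optional[str]) -> Page:
    return _FAKE_PAGES[cursor]


skus = [p["sku"] for p in iter_products(fake_fetch)]
```

Stopping on a missing cursor rather than on an empty page avoids ending early when a filtered page happens to come back empty.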
Step 3: Transform - Build a Natural-Language Document per Product
Add a Transform node that converts each structured row into a consistent text document. The text format matters: phrases like "available in", "shipped from", and "compatible with" give semantic search hooks that bare attribute lists do not.
Product: {{ p.title }}
SKU: {{ p.sku }}
Category: {{ p.category }} / {{ p.subcategory }}
Price: {{ p.price }} {{ p.currency }}
Stock: {{ p.inventoryQty }} in warehouse {{ p.warehouse }}
Description:
{{ p.description }}
Attributes:
- Material: {{ p.material }}
- Color: {{ p.color }}
- Size: {{ p.size }}
- Weight: {{ p.weightKg }} kg
- Ships from: {{ p.warehouse }}
- Compatible with: {{ p.compatibility | join: ", " }}
Strip HTML from p.description with a text replace step before embedding - HTML tags pollute vectors.
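For readers who prefer to see the transform as code, here is a rough Python equivalent of the template above, including the HTML-stripping step. It assumes each product arrives as a dict with the same field names the template uses; the regex-based tag stripping is a simplification that is adequate for typical store descriptions but is not a full HTML parser.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")


def strip_html(text: str) -> str:
    """Remove tags, decode entities, and collapse runs of whitespace."""
    no_tags = TAG_RE.sub(" ", text)
    return re.sub(r"\s+", " ", html.unescape(no_tags)).strip()


def render_document(p: dict) -> str:
    """Render one product dict as the natural-language document."""
    lines = [
        f"Product: {p['title']}",
        f"SKU: {p['sku']}",
        f"Category: {p['category']} / {p['subcategory']}",
        f"Price: {p['price']} {p['currency']}",
        f"Stock: {p['inventoryQty']} in warehouse {p['warehouse']}",
        "Description:",
        strip_html(p["description"]),
        "Attributes:",
        f"- Material: {p['material']}",
        f"- Color: {p['color']}",
        f"- Size: {p['size']}",
        f"- Weight: {p['weightKg']} kg",
        f"- Ships from: {p['warehouse']}",
        f"- Compatible with: {', '.join(p['compatibility'])}",
    ]
    return "\n".join(lines)


# Illustrative sample row; every value here is made up.
sample = {
    "title": "Trail Jacket", "sku": "JK-001",
    "category": "Outerwear", "subcategory": "Jackets",
    "price": "189.00", "currency": "AUD",
    "inventoryQty": 12, "warehouse": "Sydney",
    "description": "<p>Waterproof &amp; <b>rugged</b> shell.</p>",
    "material": "Nylon", "color": "Navy", "size": "M", "weightKg": 0.6,
    "compatibility": ["Liner A", "Liner B"],
}
doc = render_document(sample)
```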
Step 4: Knowledge Node - Embed Each Product
Add a Knowledge node in embed mode. Wrap it in the same Loop from Step 2 so each product is embedded as its own document. Use the SKU (or product ID, if SKUs are not unique across stores) as sourceId so updates replace cleanly. Add tags for category, warehouse, and availability (in-stock vs out-of-stock) so queries can scope to those facets.
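As an illustration of what each embed call carries, the sketch below assembles one request per product. Only sourceId and the idea of tags come from this guide; the collection and text field names, the "facet:value" tag format, and the build_embed_request helper are hypothetical, so map them onto whatever the Knowledge node actually expects.

```python
def build_embed_request(p: dict, text: str, collection: str = "product-catalog") -> dict:
    """Assemble a hypothetical embed payload for one product."""
    availability = "in-stock" if p["inventoryQty"] > 0 else "out-of-stock"
    return {
        "collection": collection,   # assumed field name
        "sourceId": p["sku"],       # stable ID so a re-embed replaces the old document
        "text": text,               # assumed field name for the rendered document
        "tags": [
            f"category:{p['category']}",
            f"warehouse:{p['warehouse']}",
            f"availability:{availability}",
        ],
    }


# Made-up product used only to show the resulting shape.
request = build_embed_request(
    {"sku": "JK-001", "category": "Outerwear", "warehouse": "Sydney", "inventoryQty": 0},
    text="Product: Trail Jacket",
)
```

Deriving the availability tag from the quantity at embed time keeps the tag and the document text from drifting apart.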
Step 5: Record the Indexed Timestamp
After each successful embed, write { sku, indexedAt, version } to your tracking store via mongodb update-documents (with upsert) or mysql insert-rows. The next run uses the most recent indexedAt as the high-water mark, making the scheduled refresh truly incremental.
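The upsert and high-water-mark logic might look like the sketch below. To keep it runnable without a server, Python's built-in sqlite3 stands in for the mysql or mongodb store; the table name, column types, and helper names are assumptions. Timestamps are stored as ISO-8601 UTC strings, which sort correctly as text.

```python
import sqlite3
from typing import Optional

# In-memory SQLite stands in for the real tracking store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indexed_products ("
    "sku TEXT PRIMARY KEY, indexedAt TEXT NOT NULL, version INTEGER NOT NULL)"
)


def record_indexed(sku: str, indexed_at: str, version: int) -> None:
    """Upsert the tracking row for one product after a successful embed."""
    conn.execute(
        "INSERT OR REPLACE INTO indexed_products (sku, indexedAt, version) "
        "VALUES (?, ?, ?)",
        (sku, indexed_at, version),
    )
    conn.commit()


def high_water_mark() -> Optional[str]:
    """Return the most recent indexedAt, or None if nothing is indexed yet."""
    row = conn.execute("SELECT MAX(indexedAt) FROM indexed_products").fetchone()
    return row[0]


record_indexed("JK-001", "2024-05-01T02:00:05Z", 1)
record_indexed("JK-002", "2024-05-01T02:00:09Z", 1)
record_indexed("JK-001", "2024-05-02T02:00:04Z", 2)  # re-index replaces the old row
```

A None high-water mark on the first run is a natural signal to do a full index instead of an incremental one.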
Step 6: Use the Catalog from Other Workflows
Any workflow can now use the catalog. Two common patterns:
- Chat-style lookup - an AI Agent step with the catalog Knowledge collection wired in as a retrieval tool. Sales reps can ask "what's the cheapest outdoor-rated option under $500?" and get a real answer with SKUs.
- Mid-flow enrichment - inside an order-processing flow, a Knowledge query like "shipping restrictions for {{ order.sku }}" pulls the relevant note from the product document and feeds it into the next step.
Tips
- One product per document - resist the temptation to batch products into a single embedded chunk. Retrieval quality is far better when each SKU has its own vector.
- Embed numeric attributes as words - "weighs 2.4 kilograms" retrieves better than "weight: 2.4". The Transform step is the right place to humanize numbers.
- Re-index on stock changes if availability matters - if you tag products with in-stock/out-of-stock, run a lightweight refresh whenever inventory crosses zero.
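The "numbers as words" tip could be implemented with small helpers like these inside the Transform step. The function names and exact phrasing are illustrative assumptions, not a prescribed format; the point is only that units and currency are spelled out next to the number.

```python
def humanize_weight(weight_kg: float) -> str:
    """Turn a numeric weight into a short phrase, e.g. 2.4 -> 'weighs 2.4 kilograms'."""
    unit = "kilogram" if weight_kg == 1 else "kilograms"
    return f"weighs {weight_kg:g} {unit}"


def humanize_price(price: float, currency: str) -> str:
    """Keep the currency code next to the amount so it ends up in the embedded text."""
    return f"costs {price:.2f} {currency}"


phrases = [humanize_weight(2.4), humanize_weight(1), humanize_price(49, "AUD")]
```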
Common Pitfalls
- Currency mismatch - if your store sells in multiple currencies, encode the currency in the document text and tag, not just the numeric price. Otherwise a question about "products under 100" matches dollars and yen indiscriminately.
- Variants - a Shopify product with five size variants is five different things to a buyer. Embed one document per variant (with the parent product ID as a tag) so retrieval picks the right one.
- Deleted products - when a SKU is removed in the store, your indexing loop won't see it. Add a periodic reconciliation pass that fetches active SKUs and deletes Knowledge documents whose sourceId no longer appears.
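The reconciliation pass for deleted products reduces to a set difference. The sketch below assumes you can list both the active SKUs from the store and the sourceId values currently in the collection; how you fetch those lists and issue the deletes depends on your connectors. The empty-list guard is an added safety assumption so that a failed store fetch does not wipe the whole collection.

```python
from typing import Iterable


def find_stale_source_ids(active_skus: Iterable[str], indexed_source_ids: Iterable[str]) -> set[str]:
    """Return sourceIds that are indexed but no longer active in the store."""
    active = set(active_skus)
    if not active:
        # An empty active list more likely means a failed fetch than an empty store.
        raise ValueError("refusing to reconcile against an empty active SKU list")
    return set(indexed_source_ids) - active


stale = find_stale_source_ids(
    active_skus=["JK-001", "JK-002"],
    indexed_source_ids=["JK-001", "JK-002", "JK-003"],
)
```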
Testing
Index a slice of 20 products covering different categories and price points. Open a chat or workflow with the AI Agent and ask three real product questions ("cheapest waterproof", "what's compatible with model X", "anything under $50 in stock"). Confirm the answers cite plausible SKUs from your slice. Then update one product's price in the source store, run the workflow, and confirm the Knowledge document reflects the new price.
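If you want to automate the "plausible SKUs" check, one rough approach is to pull SKU-like tokens out of the agent's answer and compare them with the slice you indexed. The SKU pattern below (two capital letters, a hyphen, three digits) is purely an assumption for the example; replace it with your own format.

```python
import re

# Assumed SKU format for illustration only, e.g. "JK-001".
SKU_RE = re.compile(r"\b[A-Z]{2}-\d{3}\b")


def unknown_skus(answer: str, indexed_skus: set[str]) -> set[str]:
    """Return SKUs cited in the answer that are not in the indexed slice."""
    return set(SKU_RE.findall(answer)) - indexed_skus


suspect = unknown_skus(
    "The cheapest waterproof option is JK-001; JK-999 is also close.",
    indexed_skus={"JK-001", "JK-002"},
)
```

A non-empty result means the answer cited a SKU the collection should not know about, which is worth investigating before rolling out.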