How to Build a Data Warehouse Sync from Multiple E-commerce Stores

Aggregate data from all your e-commerce platforms into a central database.

What This Integration Does

If you sell on Shopify, WooCommerce, and BigCommerce simultaneously - or are mid-migration between platforms - your analytics team is fighting three different APIs, three different schemas, and three different "what counts as a customer" definitions. This workflow consolidates all of it: it fans out to each platform in parallel, normalizes the records into a single schema, and lands them in one MySQL or MongoDB warehouse where every dashboard query joins seamlessly across stores.

The pipeline runs on a schedule, scoped to records updated since the last run, with each platform handled by its own branch of a Parallel node. A platform outage on one branch doesn't block the others, and each branch is its own retry boundary.

Prerequisites

Connections to every store: Shopify, WooCommerce, BigCommerce, with read scopes for orders, customers, and products.
A target database: MySQL (recommended if your BI tool prefers SQL) or MongoDB (if your record shapes diverge a lot per platform).
A unified schema decided up-front. At minimum: warehouse_id (composite of platform + native ID), source_platform, source_id, order_total, currency, customer_email, placed_at.
An indexed natural key (warehouse_id) so upserts stay fast.

Step 1: Schedule Trigger

Drop a Trigger node and set it to Schedule. Use a 5-field cron expression with an IANA timezone: nightly at 02:00 is the conventional cadence for warehouse sync (0 2 * * * with Australia/Sydney); if dashboards need fresher data, run every 15 to 30 minutes instead. The trigger output is {{ scheduledAt }}, the time Spojit fired the run.

Step 2: Transform - Compute the Sync Window

Add a Transform node that computes syncEnd as the current time and syncStart as a fixed look-back from it (for example, your cadence plus a small overlap buffer, or 30 days back on the first backfill). Use the date connector's subtract and now tools to build the window. Every branch in the next step reuses the same syncStart and syncEnd, which is critical: if branches each compute their own current time, runs drift and rows fall through cracks at the seam. A small overlap on each run means upserts re-touch a few rows harmlessly rather than skipping any.

Step 3: Parallel Fetch from Each Store

Add a Parallel node with one branch per platform.

Shopify branch: shopify connector with list-orders, filtered by updated_at_min={{ syncStart }} and updated_at_max={{ syncEnd }}.
WooCommerce branch: woocommerce connector with list-orders, filtered by the same modified-after window.
BigCommerce branch: bigcommerce connector with list-orders, filtered by min_date_modified and max_date_modified.

Each branch should handle its own pagination using a Loop set to While on the "last page was full" condition.

Step 4: Normalize - Map to the Unified Schema

After each branch, add a Transform node that maps the platform's native shape into the warehouse schema. The trick is doing the mapping per branch (so each branch knows its own quirks) and emitting the same shape. Examples:

warehouse_id = "shopify:" + order.id
source_platform = the platform string
customer_email = order.email (Shopify) or order.billing.email (Woo) or order.billing_address.email (BC)
order_total = always parsed to a Decimal in the store currency
placed_at = always ISO 8601 UTC

For currency, capture the native currency on each row rather than converting in the pipeline - downstream FX conversion is the BI tool's job and rates change.

Step 5: Merge and Load

After the Parallel node closes, concatenate the three result arrays into one (use the array connector's flatten). Then bulk-upsert into the warehouse.

MySQL: mysql connector with insert-rows using ON DUPLICATE KEY UPDATE on warehouse_id.
MongoDB: mongodb connector with update-documents, filter { warehouse_id }, $set the row, upsert: true.

Step 6: Health Check and Alerting

Add a final Condition node that compares the per-branch row counts to a 7-day rolling average. If any branch returns 90% fewer rows than usual, route to a slack send-message alert - that's almost always a sign one of the API connections has a stale token or the platform is down, and your warehouse is silently growing stale.

Tips

Build separate workflows per entity - orders, customers, and products each have very different volumes and shape complexity. Don't try to do all three in one workflow.
Currency and timezones are the hard part - decide once, document it, and assert it in the Transform step. Future-you will thank present-you.
Composite warehouse IDs save your life - a Shopify order and a BigCommerce order can both have ID 1001. Always prefix with the platform.

Common Pitfalls

Pagination differs per platform - Shopify uses cursor pagination, WooCommerce uses page numbers, BigCommerce uses both depending on the resource. Don't share a loop across branches; let each branch handle its own.
Soft deletes - cancelled/refunded orders update on each platform with different field names. Capture status in the unified schema rather than filtering them out at load time, so analysts can see the full picture.
Schema drift - when one platform releases a new field your team wants, only that branch's Transform changes - which means your warehouse rows get inconsistent. Decide whether the new field is on every platform's row (with nulls) or sits in a platform-specific sidecar table.

Testing

Run the workflow manually with a tight 30-minute window. Confirm each branch returns roughly the right row count (cross-check with each platform's admin UI). Spot-check a few normalized rows in the warehouse and make sure warehouse_id, customer_email, and order_total look identical regardless of source. Once happy, switch the trigger on and watch the next 3 runs.