How to Build a Data Warehouse Sync from Multiple E-commerce Stores
Aggregate data from all your e-commerce platforms into a central database.
What This Integration Does
If you sell on Shopify, WooCommerce, and BigCommerce simultaneously - or are mid-migration between platforms - your analytics team is fighting three different APIs, three different schemas, and three different "what counts as a customer" definitions. This workflow consolidates all of it: it fans out to each platform in parallel, normalizes the records into a single schema, and lands them in one MySQL or MongoDB warehouse where every dashboard query joins seamlessly across stores.
The pipeline runs on a schedule, scoped to records updated since the last run, with each platform handled by its own branch of a Parallel node. A platform outage on one branch doesn't block the others, and each branch is its own retry boundary.
Prerequisites
- Connections to every store: Shopify, WooCommerce, BigCommerce, with read scopes for orders, customers, and products.
- A target database: MySQL (recommended if your BI tool prefers SQL) or MongoDB (if your record shapes diverge a lot per platform).
- A unified schema decided up-front. At minimum:
warehouse_id(composite of platform + native ID),source_platform,source_id,order_total,currency,customer_email,placed_at. - An indexed natural key (
warehouse_id) so upserts stay fast.
Step 1: Schedule Trigger
Drop a Trigger node and set it to Schedule. Nightly at 02:00 local time is the conventional cadence for warehouse sync; if dashboards need fresher data, 15 to 30 minutes is fine. The trigger exposes the prior run timestamp.
Step 2: Variable - Compute the Sync Window
Compute syncStart from the previous run timestamp (or 30 days ago on first run for an initial backfill) and syncEnd as now(). Every branch in the next step uses the same window, which is critical: if branches compute their own now(), runs drift and rows fall through cracks at the seam.
Step 3: Parallel Fetch from Each Store
Add a Parallel node with one branch per platform.
- Shopify branch: shopify connector with
list-orders, filtered byupdated_at_min={{ syncStart }}andupdated_at_max={{ syncEnd }}. - WooCommerce branch: woocommerce connector with
list-orders, filtered by the same modified-after window. - BigCommerce branch: bigcommerce connector with
list-orders, filtered bymin_date_modifiedandmax_date_modified.
Each branch should handle its own pagination using a Loop set to While on the "last page was full" condition.
Step 4: Normalize - Map to the Unified Schema
After each branch, add a Transform node that maps the platform's native shape into the warehouse schema. The trick is doing the mapping per branch (so each branch knows its own quirks) and emitting the same shape. Examples:
warehouse_id="shopify:" + order.idsource_platform= the platform stringcustomer_email=order.email(Shopify) ororder.billing.email(Woo) ororder.billing_address.email(BC)order_total= always parsed to a Decimal in the store currencyplaced_at= always ISO 8601 UTC
For currency, capture the native currency on each row rather than converting in the pipeline - downstream FX conversion is the BI tool's job and rates change.
Step 5: Merge and Load
After the Parallel node closes, concatenate the three result arrays into one (use the array connector's flatten). Then bulk-upsert into the warehouse.
- MySQL: mysql connector with
insert-rowsusingON DUPLICATE KEY UPDATEonwarehouse_id. - MongoDB: mongodb connector with
update-documents, filter{ warehouse_id },$setthe row,upsert: true.
Step 6: Health Check and Alerting
Add a final Condition node that compares the per-branch row counts to a 7-day rolling average. If any branch returns 90% fewer rows than usual, route to a slack send-message alert - that's almost always a sign one of the API connections has a stale token or the platform is down, and your warehouse is silently growing stale.
Tips
- Build separate workflows per entity - orders, customers, and products each have very different volumes and shape complexity. Don't try to do all three in one workflow.
- Currency and timezones are the hard part - decide once, document it, and assert it in the Transform step. Future-you will thank present-you.
- Composite warehouse IDs save your life - a Shopify order and a BigCommerce order can both have ID
1001. Always prefix with the platform.
Common Pitfalls
- Pagination differs per platform - Shopify uses cursor pagination, WooCommerce uses page numbers, BigCommerce uses both depending on the resource. Don't share a loop across branches; let each branch handle its own.
- Soft deletes - cancelled/refunded orders update on each platform with different field names. Capture
statusin the unified schema rather than filtering them out at load time, so analysts can see the full picture. - Schema drift - when one platform releases a new field your team wants, only that branch's Transform changes - which means your warehouse rows get inconsistent. Decide whether the new field is on every platform's row (with nulls) or sits in a platform-specific sidecar table.
Testing
Run the workflow manually with a tight 30-minute window. Confirm each branch returns roughly the right row count (cross-check with each platform's admin UI). Spot-check a few normalized rows in the warehouse and make sure warehouse_id, customer_email, and order_total look identical regardless of source. Once happy, switch the trigger on and watch the next 3 runs.