How to Sync Shopify Data to MySQL for Reporting
Store Shopify order and product data in MySQL for custom reporting and analytics.
What This Integration Does
Shopify's built-in reports cover the basics but stop short when you want to join orders against your marketing spend, run cohort analysis, or feed a custom dashboard. The fix is a flat, indexed copy of the data in MySQL where BI tools, SQL analysts, and ad-hoc queries can roam freely without rate-limiting Shopify's API.
The workflow runs on a schedule, pulls only what's changed since the last run, normalizes the records to a clean table schema, and uses upserts so re-running is always safe. Each pass is incremental, so the load on both Shopify and MySQL stays bounded.
Prerequisites
- A Shopify connection with read scopes (
read_orders,read_products,read_customers). - A MySQL connection with write access to the reporting database.
- Target tables created up-front (e.g.
shopify_orders,shopify_products) with a unique index on the Shopify ID column and a timestamp column to drive incremental refresh. - An expected timezone for stored timestamps. UTC is recommended.
Step 1: Schedule Trigger
Drop a Trigger node and set it to Schedule. Hourly is a common cadence; every 15 minutes works for higher-volume stores. The trigger exposes the previous run timestamp which the next step turns into a sync window.
Step 2: Variable - Compute the Sync Window
Add a Variable assignment:
syncStart- previous run timestamp, or 30 days ago on the first run for an initial backfill.syncEnd- the current timestamp.
Both values are passed to Shopify's updated_at_min / updated_at_max filters in the next step.
Step 3: Fetch Recently Updated Orders
Add a Connector node for the shopify connector and pick the list-orders tool. Configure:
- updated_at_min:
{{ syncStart }} - updated_at_max:
{{ syncEnd }} - status:
any(so cancelled and closed orders are included) - limit:
250(Shopify's per-page max)
Wrap the call in a Loop set to While on Shopify's cursor pagination - the response contains a next_page cursor when more pages exist. Collect all pages into a single allOrders array.
Step 4: Transform - Flatten for SQL
Shopify returns deeply nested JSON; MySQL wants flat columns. Add a Transform node that emits one row per order with the fields you actually report on:
shopify_id=order.idorder_number,email,currency,total_price,subtotal_pricecustomer_id=order.customer?.idplaced_at= ISO 8601 UTC parsed fromorder.created_atupdated_at= ISO 8601 UTC parsed fromorder.updated_atline_items_json= the full line items array serialized as a JSON column for ad-hoc queries
For complex shapes, use the json connector's pick and flatten tools or the code connector's execute-javascript for finer control.
Step 5: Upsert into MySQL
Add a Connector node for the mysql connector with the insert-rows tool. Pass the transformed rows in a single batch and set the on-duplicate behavior to update - that's what makes re-running a window safe:
INSERT INTO shopify_orders (...)
VALUES (...)
ON DUPLICATE KEY UPDATE
order_number = VALUES(order_number),
total_price = VALUES(total_price),
updated_at = VALUES(updated_at),
line_items_json = VALUES(line_items_json)
The unique index on shopify_id is what triggers the upsert path. Insert in batches of 500 to 1000 rows for performance.
Step 6: Repeat for Products and Customers
Fan out with a Parallel node so orders, products, and customers refresh in parallel. Each branch repeats Steps 3 to 5 with the matching tool (list-products, list-customers). Each branch handles its own pagination and lands in its own MySQL table. After the parallel block, send a single slack send-message heartbeat with the row counts so you have a quick "did the sync run?" signal.
Tips
- Index
placed_at- the column reports filter on most often.shopify_idis already indexed via the unique constraint. - Store the raw JSON - keep one
raw_jsoncolumn with the full Shopify payload. When you realize later you need a field you didn't pull out, it's already there and you can backfill with a single SQL update. - Skip the loop for batches -
insert-rowscan take an array directly; you don't need a per-row loop. One bulk call is much faster.
Common Pitfalls
- API rate limits - Shopify uses a leaky bucket. If you fan out too aggressively in Step 6, you'll hit 429s. Spojit's per-node retry handles them but a tight cap (concurrency 2 or 3) is safer.
- Currency confusion -
total_priceis in the store's primary currency, not necessarily the order's presentment currency. Store bothcurrencyandpresentment_currencyif your store sells in multiple regions. - Cancelled / refunded orders - Shopify keeps these in the orders list but with status changes. Capture
cancelled_atandrefund_totalrather than filtering cancelled orders out, or your revenue numbers will be wrong. - Initial backfill is huge - a 5-year backfill can take hours and burn through API credit. Start with the last 30 days and extend incrementally.
Testing
Run the workflow with syncStart set to one hour ago. Spot-check three orders in MySQL: do shopify_id, total_price, and timestamps match what Shopify admin shows? Force a re-run on the same window and confirm row counts stay the same (the upsert path is working). Then turn the schedule on.