How to Sync Shopify Data to MySQL for Reporting

Store Shopify order and product data in MySQL for custom reporting and analytics.

What This Integration Does

Shopify's built-in reports cover the basics but stop short when you want to join orders against your marketing spend, run cohort analysis, or feed a custom dashboard. The fix is a flat, indexed copy of the data in MySQL where BI tools, SQL analysts, and ad-hoc queries can roam freely without rate-limiting Shopify's API.

The workflow runs on a schedule, pulls only the records updated inside a recent time window, normalizes them to a clean table schema, and uses upserts so re-running is always safe. Each pass is incremental, so the load on both Shopify and MySQL stays bounded. In Spojit you build the whole thing on the visual canvas with no custom sync service to maintain.

Prerequisites

A Shopify connection with read scopes (read_orders, read_products, read_customers).
A MySQL connection with write access to the reporting database.
Target tables created up-front (e.g. shopify_orders, shopify_products) with a unique index on the Shopify ID column and a timestamp column to drive incremental refresh.
An expected timezone for stored timestamps. UTC is recommended.

Step 1: Schedule Trigger

Drop a Trigger node and set its type to Schedule. The schedule uses a 5-field cron expression plus an IANA timezone, for example 0 * * * * with Australia/Sydney for hourly; every 15 minutes (*/15 * * * *) works for higher-volume stores. The trigger output is {{ scheduledAt }}, which the next step uses as the upper bound of the sync window.

Step 2: Transform - Compute the Sync Window

Add a Transform node that emits two values:

syncEnd - the trigger's {{ scheduledAt }} value, the upper bound of the window.
syncStart - syncEnd minus your cadence (for example one hour), or 30 days ago on the first run for an initial backfill.

Both values are passed to Shopify's updated_at_min / updated_at_max filters in the next step. The date connector's subtract and format tools are handy for building these timestamps. Because each window slightly overlaps the last, the upserts in Step 5 keep re-processing safe.

Step 3: Fetch Recently Updated Orders

Add a Connector node for the shopify connector and pick the list-orders tool. Configure:

updated_at_min: {{ syncStart }}
updated_at_max: {{ syncEnd }}
status: any (so cancelled and closed orders are included)
limit: 250 (Shopify's per-page max)

Wrap the call in a Loop set to While on Shopify's cursor pagination - the response contains a next_page cursor when more pages exist. Collect all pages into a single allOrders array.

Step 4: Transform - Flatten for SQL

Shopify returns deeply nested JSON; MySQL wants flat columns. Add a Transform node that emits one row per order with the fields you actually report on:

shopify_id = order.id
order_number, email, currency, total_price, subtotal_price
customer_id = order.customer?.id
placed_at = ISO 8601 UTC parsed from order.created_at
updated_at = ISO 8601 UTC parsed from order.updated_at
line_items_json = the full line items array serialized as a JSON column for ad-hoc queries

For complex shapes, use the json connector's pick and flatten tools or the code connector's execute-javascript for finer control.

Step 5: Upsert into MySQL

Add a Connector node for the mysql connector with the insert-rows tool. Pass the transformed rows in a single batch and set the on-duplicate behavior to update - that's what makes re-running a window safe:

INSERT INTO shopify_orders (...)
VALUES (...)
ON DUPLICATE KEY UPDATE
  order_number = VALUES(order_number),
  total_price = VALUES(total_price),
  updated_at = VALUES(updated_at),
  line_items_json = VALUES(line_items_json)

The unique index on shopify_id is what triggers the upsert path. Insert in batches of 500 to 1000 rows for performance.

Step 6: Repeat for Products and Customers

Fan out with a Parallel node so orders, products, and customers refresh in parallel. Each branch repeats Steps 3 to 5 with the matching tool (list-products, list-customers). Each branch handles its own pagination and lands in its own MySQL table. After the parallel block, send a single slack send-message heartbeat with the row counts so you have a quick "did the sync run?" signal.

Tips

Index placed_at - the column reports filter on most often. shopify_id is already indexed via the unique constraint.
Store the raw JSON - keep one raw_json column with the full Shopify payload. When you realize later you need a field you didn't pull out, it's already there and you can backfill with a single SQL update.
Skip the loop for batches - insert-rows can take an array directly; you don't need a per-row loop. One bulk call is much faster.

Common Pitfalls

API rate limits - Shopify uses a leaky bucket. If you fan out too aggressively in Step 6, you'll hit 429s. Spojit's per-node retry handles them but a tight cap (concurrency 2 or 3) is safer.
Currency confusion - total_price is in the store's primary currency, not necessarily the order's presentment currency. Store both currency and presentment_currency if your store sells in multiple regions.
Cancelled / refunded orders - Shopify keeps these in the orders list but with status changes. Capture cancelled_at and refund_total rather than filtering cancelled orders out, or your revenue numbers will be wrong.
Initial backfill is huge - a 5-year backfill can take hours and burn through API credit. Start with the last 30 days and extend incrementally.

Testing

Run the workflow with syncStart set to one hour ago. Spot-check three orders in MySQL: do shopify_id, total_price, and timestamps match what Shopify admin shows? Force a re-run on the same window and confirm row counts stay the same (the upsert path is working). Then turn the schedule on.