How to Sync Shopify Data to MySQL for Reporting

Store Shopify order and product data in MySQL for custom reporting and analytics.

What This Integration Does

Shopify's built-in reports cover the basics but stop short when you want to join orders against your marketing spend, run cohort analysis, or feed a custom dashboard. The fix is a flat, indexed copy of the data in MySQL where BI tools, SQL analysts, and ad-hoc queries can roam freely without rate-limiting Shopify's API.

The workflow runs on a schedule, pulls only what's changed since the last run, normalizes the records to a clean table schema, and uses upserts so re-running is always safe. Each pass is incremental, so the load on both Shopify and MySQL stays bounded.

Prerequisites

  • A Shopify connection with read scopes (read_orders, read_products, read_customers).
  • A MySQL connection with write access to the reporting database.
  • Target tables created up-front (e.g. shopify_orders, shopify_products) with a unique index on the Shopify ID column and a timestamp column to drive incremental refresh.
  • An expected timezone for stored timestamps. UTC is recommended.

Step 1: Schedule Trigger

Drop a Trigger node and set it to Schedule. Hourly is a common cadence; every 15 minutes works for higher-volume stores. The trigger exposes the previous run timestamp which the next step turns into a sync window.

Step 2: Variable - Compute the Sync Window

Add a Variable assignment:

  • syncStart - previous run timestamp, or 30 days ago on the first run for an initial backfill.
  • syncEnd - the current timestamp.

Both values are passed to Shopify's updated_at_min / updated_at_max filters in the next step.

Step 3: Fetch Recently Updated Orders

Add a Connector node for the shopify connector and pick the list-orders tool. Configure:

  • updated_at_min: {{ syncStart }}
  • updated_at_max: {{ syncEnd }}
  • status: any (so cancelled and closed orders are included)
  • limit: 250 (Shopify's per-page max)

Wrap the call in a Loop set to While on Shopify's cursor pagination - the response contains a next_page cursor when more pages exist. Collect all pages into a single allOrders array.

Step 4: Transform - Flatten for SQL

Shopify returns deeply nested JSON; MySQL wants flat columns. Add a Transform node that emits one row per order with the fields you actually report on:

  • shopify_id = order.id
  • order_number, email, currency, total_price, subtotal_price
  • customer_id = order.customer?.id
  • placed_at = ISO 8601 UTC parsed from order.created_at
  • updated_at = ISO 8601 UTC parsed from order.updated_at
  • line_items_json = the full line items array serialized as a JSON column for ad-hoc queries

For complex shapes, use the json connector's pick and flatten tools or the code connector's execute-javascript for finer control.

Step 5: Upsert into MySQL

Add a Connector node for the mysql connector with the insert-rows tool. Pass the transformed rows in a single batch and set the on-duplicate behavior to update - that's what makes re-running a window safe:

INSERT INTO shopify_orders (...)
VALUES (...)
ON DUPLICATE KEY UPDATE
  order_number = VALUES(order_number),
  total_price = VALUES(total_price),
  updated_at = VALUES(updated_at),
  line_items_json = VALUES(line_items_json)

The unique index on shopify_id is what triggers the upsert path. Insert in batches of 500 to 1000 rows for performance.

Step 6: Repeat for Products and Customers

Fan out with a Parallel node so orders, products, and customers refresh in parallel. Each branch repeats Steps 3 to 5 with the matching tool (list-products, list-customers). Each branch handles its own pagination and lands in its own MySQL table. After the parallel block, send a single slack send-message heartbeat with the row counts so you have a quick "did the sync run?" signal.

Tips

  • Index placed_at - the column reports filter on most often. shopify_id is already indexed via the unique constraint.
  • Store the raw JSON - keep one raw_json column with the full Shopify payload. When you realize later you need a field you didn't pull out, it's already there and you can backfill with a single SQL update.
  • Skip the loop for batches - insert-rows can take an array directly; you don't need a per-row loop. One bulk call is much faster.

Common Pitfalls

  • API rate limits - Shopify uses a leaky bucket. If you fan out too aggressively in Step 6, you'll hit 429s. Spojit's per-node retry handles them but a tight cap (concurrency 2 or 3) is safer.
  • Currency confusion - total_price is in the store's primary currency, not necessarily the order's presentment currency. Store both currency and presentment_currency if your store sells in multiple regions.
  • Cancelled / refunded orders - Shopify keeps these in the orders list but with status changes. Capture cancelled_at and refund_total rather than filtering cancelled orders out, or your revenue numbers will be wrong.
  • Initial backfill is huge - a 5-year backfill can take hours and burn through API credit. Start with the last 30 days and extend incrementally.

Testing

Run the workflow with syncStart set to one hour ago. Spot-check three orders in MySQL: do shopify_id, total_price, and timestamps match what Shopify admin shows? Force a re-run on the same window and confirm row counts stay the same (the upsert path is working). Then turn the schedule on.

Learn More

Did this answer your question? Thanks for the feedback There was a problem submitting your feedback. Please try again later.