How to Build a MySQL to MongoDB Data Pipeline

Sync data from a MySQL database to MongoDB automatically.

What This Integration Does

You've got a transactional MySQL database that's the source of truth, and a MongoDB cluster that powers a product feature, a search index, or an analytics layer. Manually keeping the two in sync is brittle. This pipeline runs on a schedule, pulls anything that's changed in MySQL since the last run, and upserts it into MongoDB so the document store stays close to real-time.

Each run is incremental and idempotent. The MySQL updated_at column drives the sync window; the MySQL primary key becomes the MongoDB _id. Re-running a window is safe - documents get overwritten rather than duplicated.

Prerequisites

A MySQL connection with read access to the source tables.
An updated_at (or equivalent) column on every table you intend to sync, with an index. Without it, every run becomes a full table scan.
A MongoDB connection with write access to the target database.
A target database (e.g. mysql_mirror) and a collection per source table.

Step 1: Schedule Trigger

Drop a Trigger node and set its type to Schedule. Five minutes is a good starting cadence; tune it based on how fast downstream consumers need fresh data and how much load MySQL can absorb.

Step 2: Transform - Track the Sync Window

Add a Transform node to compute the bounds of this run's sync window:

syncStart - the previous run's timestamp, or 1970-01-01 00:00:00 on the first run for a full backfill.
syncEnd - the current timestamp, computed once so all queries in this run use the same upper bound.

Step 3: Query MySQL for Changed Rows

Add a Connector node pointing at the mysql connector and pick the execute-query tool. Pull only rows that fall in the window:

SELECT *
FROM customers
WHERE updated_at >= '{{ syncStart }}'
  AND updated_at < '{{ syncEnd }}'
ORDER BY updated_at
LIMIT 5000

If the result set hits 5,000, page on updated_at by advancing syncStart to the last row's updated_at and repeating until you get fewer rows back.

Step 4: Transform - Shape Rows into Documents

Add a Transform node to convert each MySQL row into a MongoDB document:

Set _id to the MySQL primary key. This is what makes upserts idempotent.
Convert MySQL DATETIME values to ISO 8601 so MongoDB indexes them as Dates, not strings.
Stamp syncedAt with the current timestamp.
If the source table has joined data (e.g. an address table), nest it into a sub-document - that's the whole point of moving to MongoDB.

Step 5: Loop and Upsert

Wrap the next step in a Loop over the transformed rows. Inside, add a Connector node for the mongodb connector with the update-documents tool:

Database: mysql_mirror
Collection: customers (or whichever table you're mirroring)
Filter: { "_id": "{{ row._id }}" }
Update: { "$set": {{ row }} }
Upsert: true

For high-throughput tables, prefer the insert-documents tool in batch mode with ordered: false when you know the run is for fresh inserts only - it's significantly faster than per-document upserts.

Step 6: Mirror Deletions

The query above never sees deleted MySQL rows. Two options. If your MySQL uses soft deletes, include deleted_at in the SELECT and translate it into a Mongo archived: true field. If MySQL hard-deletes, add a separate pass that lists all _ids in MongoDB, queries MySQL for the same IDs in chunks, and removes anything that no longer exists with the mongodb delete-documents tool.

Tips

Index updated_at - on the MySQL side, this is what makes incremental sync fast. Without the index, each run scans the full table.
Batch the inserts - MongoDB's bulk operations are much faster than per-document writes. Group transformed rows into batches of 500 before calling the connector.
Multiple tables - run them in a Parallel node, one branch per table, so a slow table doesn't block the others.

Common Pitfalls

Clock skew - if the MySQL server and Spojit run with different times, your window misses or double-counts rows. Always use MySQL's NOW() (or pull it via execute-query) for syncEnd rather than a Spojit-side timestamp.
Schema changes - MongoDB is schemaless, but downstream consumers usually aren't. When MySQL adds a column, run a one-shot full backfill so older Mongo docs get the new field too.
Big rows - MongoDB documents are capped at 16MB. If a MySQL row has huge JSON columns or text blobs, chunk them into a separate collection or compress them.

Testing

Set syncStart to one hour ago, run the workflow once, then compare a handful of records in both databases. Spot-check IDs, timestamps, and nested fields. Then turn the schedule on and watch the first few automated runs for row-count anomalies before walking away.