How to Build a MySQL to MongoDB Data Pipeline
Sync data from a MySQL database to MongoDB automatically.
What This Integration Does
You've got a transactional MySQL database that's the source of truth, and a MongoDB cluster that powers a product feature, a search index, or an analytics layer. Manually keeping the two in sync is brittle. This pipeline runs on a schedule, pulls anything that's changed in MySQL since the last run, and upserts it into MongoDB so the document store stays close to real-time.
Each run is incremental and idempotent. The MySQL updated_at column drives the sync window; the MySQL primary key becomes the MongoDB _id. Re-running a window is safe - documents get overwritten rather than duplicated.
Prerequisites
- A MySQL connection with read access to the source tables.
- An
updated_at(or equivalent) column on every table you intend to sync, with an index. Without it, every run becomes a full table scan. - A MongoDB connection with write access to the target database.
- A target database (e.g.
mysql_mirror) and a collection per source table.
Step 1: Schedule Trigger
Drop a Trigger node and set its type to Schedule. Five minutes is a good starting cadence; tune it based on how fast downstream consumers need fresh data and how much load MySQL can absorb.
Step 2: Variable - Track the Sync Window
Add a Variable assignment:
syncStart- the previous run's timestamp, or1970-01-01 00:00:00on the first run for a full backfill.syncEnd- the current timestamp, computed once so all queries in this run use the same upper bound.
Step 3: Query MySQL for Changed Rows
Add a Connector node pointing at the mysql connector and pick the execute-query tool. Pull only rows that fall in the window:
SELECT *
FROM customers
WHERE updated_at >= '{{ syncStart }}'
AND updated_at < '{{ syncEnd }}'
ORDER BY updated_at
LIMIT 5000
If the result set hits 5,000, page on updated_at by advancing syncStart to the last row's updated_at and repeating until you get fewer rows back.
Step 4: Transform - Shape Rows into Documents
Add a Transform node to convert each MySQL row into a MongoDB document:
- Set
_idto the MySQL primary key. This is what makes upserts idempotent. - Convert MySQL
DATETIMEvalues to ISO 8601 so MongoDB indexes them as Dates, not strings. - Stamp
syncedAtwith the current timestamp. - If the source table has joined data (e.g. an
addresstable), nest it into a sub-document - that's the whole point of moving to MongoDB.
Step 5: Loop and Upsert
Wrap the next step in a Loop over the transformed rows. Inside, add a Connector node for the mongodb connector with the update-documents tool:
- Database:
mysql_mirror - Collection:
customers(or whichever table you're mirroring) - Filter:
{ "_id": "{{ row._id }}" } - Update:
{ "$set": {{ row }} } - Upsert:
true
For high-throughput tables, prefer the insert-documents tool in batch mode with ordered: false when you know the run is for fresh inserts only - it's significantly faster than per-document upserts.
Step 6: Mirror Deletions
The query above never sees deleted MySQL rows. Two options. If your MySQL uses soft deletes, include deleted_at in the SELECT and translate it into a Mongo archived: true field. If MySQL hard-deletes, add a separate pass that lists all _ids in MongoDB, queries MySQL for the same IDs in chunks, and removes anything that no longer exists with the mongodb delete-documents tool.
Tips
- Index
updated_at- on the MySQL side, this is what makes incremental sync fast. Without the index, each run scans the full table. - Batch the inserts - MongoDB's bulk operations are much faster than per-document writes. Group transformed rows into batches of 500 before calling the connector.
- Multiple tables - run them in a Parallel node, one branch per table, so a slow table doesn't block the others.
Common Pitfalls
- Clock skew - if the MySQL server and Spojit run with different times, your window misses or double-counts rows. Always use MySQL's
NOW()(or pull it viaexecute-query) forsyncEndrather than a Spojit-side timestamp. - Schema changes - MongoDB is schemaless, but downstream consumers usually aren't. When MySQL adds a column, run a one-shot full backfill so older Mongo docs get the new field too.
- Big rows - MongoDB documents are capped at 16MB. If a MySQL row has huge JSON columns or text blobs, chunk them into a separate collection or compress them.
Testing
Set syncStart to one hour ago, run the workflow once, then compare a handful of records in both databases. Spot-check IDs, timestamps, and nested fields. Then turn the schedule on and watch the first few automated runs for row-count anomalies before walking away.