How to Build an ETL Pipeline with CSV, Transform, and MySQL
Extract data from CSV files, transform it, and load it into MySQL.
What This Integration Does
Every business has a partner, supplier, or legacy system that ships data in CSV. Doing that by hand once a week stops being charming after the second time someone fat-fingers a header. This workflow automates the full ETL loop: it picks the CSV up wherever it arrives, parses and validates it, shapes the rows to match your warehouse schema, and bulk-loads them into MySQL.
The pipeline is built to be re-runnable. Each file gets a deterministic batch ID, rows are inserted with a unique key derived from the source, and a failed run can be re-pointed at the same file without producing duplicates.
Prerequisites
- A source for the CSV: an FTP or SFTP connection, an inbound email mailbox, or a webhook URL your partner can POST to.
- A MySQL connection with write access to the destination tables.
- The target MySQL table created up-front with the columns you expect plus a unique index on the natural key (e.g.
source_id). - A sample CSV with realistic data so you can map columns confidently before going live.
Step 1: Trigger - Pick Up the File
Drop a Trigger node and choose the source. For scheduled FTP drops, use Schedule and then a Connector node calling the ftp list-directory tool to find new files, followed by download-file. For email-attached CSVs, use the Email trigger and read the attachment. For partner uploads, use a Webhook trigger.
Step 2: Parse the CSV
Add a Connector node for the csv connector and call the parse tool to get a row count and column header check, then to-json to turn the file into an array of objects keyed by column name. Configure:
- Has header:
true(assuming the file has one) - Delimiter: comma, or whatever your source uses (some partners ship semicolon-delimited files)
- Trim values:
true- free fix for the most common dirty-data problem
Step 3: Validate the Shape
Before transforming, sanity-check the file. Add a Condition node that asserts the expected columns exist and the row count is non-zero. If the check fails, route to a slack send-message step so someone investigates before bad data lands in MySQL. For row-level validation (emails, dates, phone numbers), call the validation connector inside the per-row loop later.
Step 4: Transform - Reshape Rows for MySQL
Add a Transform node that maps source columns to target columns:
- Rename headers (
"Customer Email"becomesemail). - Convert types - dates from
MM/DD/YYYYtoYYYY-MM-DD, currency from"$1,234.50"to1234.50. - Compute new fields:
source_id = batchId + ":" + rowIndexso each row has a deterministic key. - Stamp
imported_atwith the current timestamp.
For complex per-row logic, use the code connector with execute-javascript instead of the visual Transform - it scales better when there are 20+ field mappings.
Step 5: Load into MySQL
Add a Connector node for the mysql connector and pick the insert-rows tool. Configure:
- Table: the destination table
- Rows: the array from Step 4
- On duplicate key:
update- so re-running the workflow on the same batch is idempotent
Insert in batches of 500-1000 rows for performance. If a file has hundreds of thousands of rows, chunk the transformed array with the array connector's chunk tool and loop over the chunks.
Step 6: Archive the File and Notify
After a successful load, move the source file out of the inbox folder so it isn't reprocessed: ftp rename into an archive/ folder works. Finish with a slack send-message step including the row count, batch ID, and any validation warnings. If anything failed earlier, the catch branch should write the file to a quarantine/ folder and alert instead.
Tips
- Dedupe before loading - call the csv
dedupetool with the natural key as the dedupe column. Saves a downstream "why are there duplicates" investigation. - Sample big files - call
infofrom the csv connector first to peek at row count and column types before loading the whole thing into memory. - Keep the raw file - even after loading, hold the original CSV in the archive folder for 30+ days. When data looks wrong later, the raw file is your ground truth.
Common Pitfalls
- Encoding - Excel exports are often UTF-16 or Windows-1252, not UTF-8. The CSV connector accepts an encoding param; set it explicitly rather than guessing.
- BOM characters - a leading byte order mark turns the first column name into something like
id. Trim it with texttrimon the header. - Header changes - partners renaming columns silently is the #1 cause of broken ETL. The Step 3 validation should fail loud when an expected column is missing.
- Time zones in dates - the CSV date is in the sender's timezone; your DB might be UTC. Be explicit when converting.
Testing
Drop a 10-row sample file in the source folder and run the workflow manually. Inspect the MySQL table: do the columns line up, do the dates and numbers parse correctly, did the unique index reject any duplicates as expected? Once happy, drop a full-size file and watch the run time and row count, then turn the trigger on.