How to Build an ETL Pipeline with CSV, Transform, and MySQL

Extract data from CSV files, transform it, and load it into MySQL.

What This Integration Does

Every business has a partner, supplier, or legacy system that ships data in CSV. Doing that by hand once a week stops being charming after the second time someone fat-fingers a header. This workflow automates the full ETL loop: it picks the CSV up wherever it arrives, parses and validates it, shapes the rows to match your warehouse schema, and bulk-loads them into MySQL.

The pipeline is built to be re-runnable. Each file gets a deterministic batch ID, rows are inserted with a unique key derived from the source, and a failed run can be re-pointed at the same file without producing duplicates.

Prerequisites

A source for the CSV: an FTP or SFTP connection, an inbound email mailbox, or a webhook URL your partner can POST to.
A MySQL connection with write access to the destination tables.
The target MySQL table created up-front with the columns you expect plus a unique index on the natural key (e.g. source_id).
A sample CSV with realistic data so you can map columns confidently before going live.

Step 1: Trigger - Pick Up the File

Open the Trigger node and choose the source. For scheduled FTP drops, use the Schedule trigger type (a 5-field cron expression plus a timezone) and then a Connector node calling the ftp list-directory tool to find new files, followed by download-file. For email-attached CSVs sent to an address you control, use the Mailhook trigger and an Attachment node to fetch the file bytes. For partner uploads, use a Webhook trigger and have your partner POST the file.

Step 2: Parse the CSV

Add a Connector node for the csv connector in Direct mode and call the parse tool to read the raw file into rows, then to-json to turn it into an array of objects keyed by column name. When mapping the tool inputs, tell it whether the file has a header row, set the delimiter to match your source (some partners ship semicolon-delimited files), and trim whitespace from values, which is a free fix for the most common dirty-data problem.

Step 3: Validate the Shape

Before transforming, sanity-check the file. Add a Condition node that asserts the expected columns exist and the row count is non-zero. If the check fails, route to a slack send-message step so someone investigates before bad data lands in MySQL. For row-level validation (emails, dates, phone numbers), call the validation connector inside the per-row loop later.

Step 4: Transform - Reshape Rows for MySQL

Add a Transform node that maps source columns to target columns:

Rename headers ("Customer Email" becomes email).
Convert types - dates from MM/DD/YYYY to YYYY-MM-DD, currency from "$1,234.50" to 1234.50.
Compute new fields: source_id = batchId + ":" + rowIndex so each row has a deterministic key.
Stamp imported_at with the current timestamp.

For complex per-row logic, use the code connector with execute-javascript instead of the visual Transform - it scales better when there are 20+ field mappings.

Step 5: Load into MySQL

Add a Connector node for the mysql connector in Direct mode and pick the insert-rows tool. Map the destination table and pass the array of rows from Step 4. Because the table has a unique index on source_id, re-running the workflow on the same batch stays idempotent: duplicate keys are caught rather than inserted twice. If you need existing records refreshed instead of skipped, follow up with the update-rows tool keyed on source_id.

Insert in batches of 500-1000 rows for performance. If a file has hundreds of thousands of rows, chunk the transformed array with the array connector's chunk tool and loop over the chunks with a Loop node.

Step 6: Archive the File and Notify

After a successful load, move the source file out of the inbox folder so it isn't reprocessed: ftp rename into an archive/ folder works. Finish with a slack send-message step including the row count, batch ID, and any validation warnings. If anything failed earlier, the catch branch should write the file to a quarantine/ folder and alert instead.

Tips

Dedupe before loading - call the csv dedupe tool with the natural key as the dedupe column. Saves a downstream "why are there duplicates" investigation.
Sample big files - call info from the csv connector first to peek at row count and column types before loading the whole thing into memory.
Keep the raw file - even after loading, hold the original CSV in the archive folder for 30+ days. When data looks wrong later, the raw file is your ground truth.

Common Pitfalls

Encoding - Excel exports are often UTF-16 or Windows-1252, not UTF-8. If rows come through garbled, normalise the file to UTF-8 before the parse step rather than guessing at the source encoding.
BOM characters - a leading byte order mark prepends an invisible character to the first column name, so a header that should match id silently fails to map. Trim it with the text connector's trim tool on the header.
Header changes - partners renaming columns silently is the #1 cause of broken ETL. The Step 3 validation should fail loud when an expected column is missing.
Time zones in dates - the CSV date is in the sender's timezone; your DB might be UTC. Be explicit when converting.

Testing

Drop a 10-row sample file in the source folder and run the workflow manually. Inspect the MySQL table: do the columns line up, do the dates and numbers parse correctly, did the unique index reject any duplicates as expected? Once happy, drop a full-size file and watch the run time and row count, then turn the trigger on.