How to Build an ETL Pipeline with CSV, Transform, and MySQL

Extract data from CSV files, transform it, and load it into MySQL.

What This Integration Does

Every business has a partner, supplier, or legacy system that ships data in CSV. Doing that by hand once a week stops being charming after the second time someone fat-fingers a header. This workflow automates the full ETL loop: it picks the CSV up wherever it arrives, parses and validates it, shapes the rows to match your warehouse schema, and bulk-loads them into MySQL.

The pipeline is built to be re-runnable. Each file gets a deterministic batch ID, rows are inserted with a unique key derived from the source, and a failed run can be re-pointed at the same file without producing duplicates.

Prerequisites

  • A source for the CSV: an FTP or SFTP connection, an inbound email mailbox, or a webhook URL your partner can POST to.
  • A MySQL connection with write access to the destination tables.
  • The target MySQL table created up-front with the columns you expect plus a unique index on the natural key (e.g. source_id).
  • A sample CSV with realistic data so you can map columns confidently before going live.

Step 1: Trigger - Pick Up the File

Drop a Trigger node and choose the source. For scheduled FTP drops, use Schedule and then a Connector node calling the ftp list-directory tool to find new files, followed by download-file. For email-attached CSVs, use the Email trigger and read the attachment. For partner uploads, use a Webhook trigger.

Step 2: Parse the CSV

Add a Connector node for the csv connector and call the parse tool to get a row count and column header check, then to-json to turn the file into an array of objects keyed by column name. Configure:

  • Has header: true (assuming the file has one)
  • Delimiter: comma, or whatever your source uses (some partners ship semicolon-delimited files)
  • Trim values: true - free fix for the most common dirty-data problem

Step 3: Validate the Shape

Before transforming, sanity-check the file. Add a Condition node that asserts the expected columns exist and the row count is non-zero. If the check fails, route to a slack send-message step so someone investigates before bad data lands in MySQL. For row-level validation (emails, dates, phone numbers), call the validation connector inside the per-row loop later.

Step 4: Transform - Reshape Rows for MySQL

Add a Transform node that maps source columns to target columns:

  • Rename headers ("Customer Email" becomes email).
  • Convert types - dates from MM/DD/YYYY to YYYY-MM-DD, currency from "$1,234.50" to 1234.50.
  • Compute new fields: source_id = batchId + ":" + rowIndex so each row has a deterministic key.
  • Stamp imported_at with the current timestamp.

For complex per-row logic, use the code connector with execute-javascript instead of the visual Transform - it scales better when there are 20+ field mappings.

Step 5: Load into MySQL

Add a Connector node for the mysql connector and pick the insert-rows tool. Configure:

  • Table: the destination table
  • Rows: the array from Step 4
  • On duplicate key: update - so re-running the workflow on the same batch is idempotent

Insert in batches of 500-1000 rows for performance. If a file has hundreds of thousands of rows, chunk the transformed array with the array connector's chunk tool and loop over the chunks.

Step 6: Archive the File and Notify

After a successful load, move the source file out of the inbox folder so it isn't reprocessed: ftp rename into an archive/ folder works. Finish with a slack send-message step including the row count, batch ID, and any validation warnings. If anything failed earlier, the catch branch should write the file to a quarantine/ folder and alert instead.

Tips

  • Dedupe before loading - call the csv dedupe tool with the natural key as the dedupe column. Saves a downstream "why are there duplicates" investigation.
  • Sample big files - call info from the csv connector first to peek at row count and column types before loading the whole thing into memory.
  • Keep the raw file - even after loading, hold the original CSV in the archive folder for 30+ days. When data looks wrong later, the raw file is your ground truth.

Common Pitfalls

  • Encoding - Excel exports are often UTF-16 or Windows-1252, not UTF-8. The CSV connector accepts an encoding param; set it explicitly rather than guessing.
  • BOM characters - a leading byte order mark turns the first column name into something like id. Trim it with text trim on the header.
  • Header changes - partners renaming columns silently is the #1 cause of broken ETL. The Step 3 validation should fail loud when an expected column is missing.
  • Time zones in dates - the CSV date is in the sender's timezone; your DB might be UTC. Be explicit when converting.

Testing

Drop a 10-row sample file in the source folder and run the workflow manually. Inspect the MySQL table: do the columns line up, do the dates and numbers parse correctly, did the unique index reject any duplicates as expected? Once happy, drop a full-size file and watch the run time and row count, then turn the trigger on.

Learn More

Did this answer your question? Thanks for the feedback There was a problem submitting your feedback. Please try again later.