How to Parse and Transform XML Data from Legacy Systems

Use the XML Tools connector to convert XML from legacy APIs and systems into JSON for modern workflows.

What This Integration Does

Many enterprise systems - SOAP APIs, freight carriers, ERPs, EDI pipelines, certain banking and government feeds - still speak XML. Spojit workflows work natively with JSON, so the xml connector is the bridge: it parses inbound XML into JSON your nodes can manipulate, and converts JSON back to XML when you need to send data the other way. Once the payload is parsed, an XML-only upstream system behaves like any other integration.

A typical flow starts with whatever brings XML in (an HTTP call to a SOAP endpoint, an FTP file drop, a webhook from a legacy system), passes the raw body to xml to-json, then operates on the resulting JSON object exactly like any other Spojit data. For outbound XML, the workflow builds a JSON object in a Transform node and converts it back with from-json before posting it via http.

Prerequisites

  • A source of XML data - a SOAP endpoint reachable from Spojit, an FTP server with XML files, or a webhook that posts XML bodies.
  • An understanding of the document structure - even a sample payload is enough to design the Transform.
  • An http, ftp, or webhook connection depending on how the XML arrives.

Step 1: Trigger and Acquire the XML

Start with a Trigger node suited to your source - a Schedule trigger for polling a SOAP API, a Webhook trigger for systems that push XML to you, or a Manual trigger for ad-hoc runs. If the trigger doesn't already carry the XML in its payload, add a Connector node that fetches it: http http-get for REST-flavoured XML, http http-post with a SOAP envelope body for SOAP APIs, or ftp download-file for batch file drops.

Step 2: Parse XML to JSON

Add a Connector node pointing at the xml connector and pick the to-json tool. Feed the raw XML body in as input. The output is a JSON object that preserves the XML's nesting - element names become keys, attributes typically become a sibling object on the same key, and repeated elements become arrays. Inspect the parsed shape in the execution log before you build downstream transforms - it's the easiest way to confirm namespaces and array handling look right.
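To make the mapping described above concrete, here is a minimal standalone Python sketch using the standard library's xml.etree.ElementTree. It is not the connector's implementation, and the "@attributes" and "#text" key names are assumptions chosen for illustration; check the execution log for the shape to-json actually produces.

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Convert an Element into plain dicts, lists and strings (illustrative mapping only)."""
    node = {}
    # Assumption: attributes are kept under an "@attributes" key next to the children.
    if elem.attrib:
        node["@attributes"] = dict(elem.attrib)
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated element name is collected into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        return text  # leaf element: only its text content
    if text:
        node["#text"] = text  # assumption: mixed text is kept under "#text"
    return node

SAMPLE = """<order id="42">
  <customer>Acme</customer>
  <item>bolt</item>
  <item>nut</item>
</order>"""

root = ET.fromstring(SAMPLE)
parsed = {root.tag: element_to_dict(root)}
print(json.dumps(parsed, indent=2))
```

Note that the single customer element stays a plain value while the two item elements become a list; this is the shape difference downstream transforms need to account for.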

Step 3: Validate or Extract Selectively

For large documents you don't want to fully parse, use xml extract with an XPath-style expression to pull just the elements you care about. For inbound documents from a critical upstream, run xml validate against your expected schema first and fail loudly on schema drift rather than silently producing nonsense downstream. Pair the validation step with a Condition node that routes invalid documents to an error path.
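As a rough illustration of selective extraction and a fail-loudly check, the sketch below uses the limited XPath subset in Python's xml.etree.ElementTree. The sample document, the function names, and the required-element check are hypothetical; the structural check is only a stand-in for real schema validation, and neither function represents how xml extract or xml validate work internally.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<shipments>
  <shipment status="delivered"><ref>A1</ref></shipment>
  <shipment status="pending"><ref>B2</ref></shipment>
  <shipment status="delivered"><ref>C3</ref></shipment>
</shipments>"""

def extract_refs(xml_text, status):
    """Pull only the <ref> values for shipments with the given status."""
    root = ET.fromstring(xml_text)
    path = f"./shipment[@status='{status}']/ref"
    return [ref.text for ref in root.findall(path)]

def missing_elements(xml_text, required=("ref",)):
    """Return a list of problems; an empty list means the document passed the check."""
    root = ET.fromstring(xml_text)
    problems = []
    for index, shipment in enumerate(root.findall("shipment")):
        for name in required:
            if shipment.find(name) is None:
                problems.append(f"shipment[{index}] is missing <{name}>")
    return problems

print(extract_refs(SAMPLE, "delivered"))
print(missing_elements(SAMPLE))
```

A non-empty problem list plays the role of the failed validation that the Condition node would route to the error path.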

Step 4: Transform the Parsed JSON

Add a Transform node to reshape the parsed JSON into the structure your downstream systems expect. Flatten deep nesting, rename verbose element names (xs:CustomerRecord becomes customer), drop XML noise (namespace prefixes, schema attributes), and coerce types - XML carries everything as strings, so you'll typically want to parse numbers and dates explicitly. The json connector's pick, omit, and flatten tools are useful for the heavier work.
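The reshaping described above might look like the following Python sketch. The input shape, the field names, and the helper functions are all hypothetical; inside Spojit the same logic would live in a Transform node rather than in Python.

```python
from datetime import date
from decimal import Decimal

def strip_prefix(key):
    """Drop a namespace prefix such as "xs:" from a key."""
    return key.split(":", 1)[-1]

def strip_prefixes(value):
    """Recursively remove namespace prefixes from every key."""
    if isinstance(value, dict):
        return {strip_prefix(k): strip_prefixes(v) for k, v in value.items()}
    if isinstance(value, list):
        return [strip_prefixes(v) for v in value]
    return value

# Hypothetical parsed payload: every value arrives as a string.
parsed = {
    "xs:CustomerRecord": {
        "xs:Name": "Acme",
        "xs:Balance": "120.50",
        "xs:Since": "2019-03-01",
        "xs:OrderCount": "7",
    }
}

record = strip_prefixes(parsed)["CustomerRecord"]
customer = {
    "name": record["Name"],
    "balance": Decimal(record["Balance"]),        # exact decimal, not float
    "since": date.fromisoformat(record["Since"]),  # explicit date parsing
    "order_count": int(record["OrderCount"]),
}
print(customer)
```

Decimal is used for the balance so that monetary values are not subject to binary floating-point rounding.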

Step 5: Build the Outbound XML if Needed

For systems that also need XML back (SOAP responses, EDI acknowledgements, carrier booking calls), construct the response as a JSON object in another Transform node, then convert it with xml from-json. For SOAP, you'll wrap the result in the SOAP envelope yourself - keep a template envelope as a workspace variable and interpolate the body. The xml prettify tool is handy during development; minify is what you want for the actual request body.
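The template-and-interpolate approach can be sketched in Python as follows. The envelope uses the SOAP 1.1 namespace; the BookingRequest payload and the dict_to_element helper are hypothetical and do not reflect how from-json serialises attributes or namespaces.

```python
import xml.etree.ElementTree as ET

# Template envelope, analogous to one kept in a workspace variable (SOAP 1.1 namespace).
ENVELOPE_TEMPLATE = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body>{body}</soap:Body>"
    "</soap:Envelope>"
)

def dict_to_element(tag, value):
    """Build an Element from nested dicts, lists and scalars (illustrative only)."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            # A list value becomes one repeated element per item.
            for item in child if isinstance(child, list) else [child]:
                elem.append(dict_to_element(key, item))
    else:
        elem.text = str(value)  # special characters are escaped on serialisation
    return elem

# Hypothetical outbound payload built as a JSON-like object.
booking = {"Reference": "A1", "Parcel": [{"WeightKg": 2}, {"WeightKg": 5}]}

body = ET.tostring(dict_to_element("BookingRequest", booking), encoding="unicode")
xml_body = ENVELOPE_TEMPLATE.format(body=body)
print(xml_body)
```

Serialising the body with an XML library before interpolating it means characters such as < and & in field values are escaped rather than breaking the envelope.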

Step 6: Send and Handle the Response

Post the outbound XML via http http-post with Content-Type: application/xml (or text/xml; charset=utf-8 for older SOAP services) and the appropriate SOAPAction header if required. The response will be XML too, so chain another xml to-json step to parse it and a Condition to check for SOAP faults before continuing. Wrap critical calls in retries (the standard Connector retry settings handle transient 5xx and timeouts automatically).
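The fault check is worth seeing in isolation. The sketch below assumes a SOAP 1.1 response, where the Fault element lives in the envelope namespace and carries unqualified faultcode and faultstring children; the find_fault name and the sample responses are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace

def find_fault(response_xml):
    """Return fault details if the response contains a SOAP 1.1 Fault, else None."""
    root = ET.fromstring(response_xml)
    fault = root.find(f".//{{{SOAP_ENV}}}Fault")
    if fault is None:
        return None
    return {
        "code": fault.findtext("faultcode"),
        "message": fault.findtext("faultstring"),
    }

FAULT_RESPONSE = f"""<soap:Envelope xmlns:soap="{SOAP_ENV}">
  <soap:Body>
    <soap:Fault>
      <faultcode>soap:Server</faultcode>
      <faultstring>Backend unavailable</faultstring>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>"""

OK_RESPONSE = f"""<soap:Envelope xmlns:soap="{SOAP_ENV}">
  <soap:Body><BookingResponse><Id>77</Id></BookingResponse></soap:Body>
</soap:Envelope>"""

print(find_fault(FAULT_RESPONSE))
print(find_fault(OK_RESPONSE))
```

In a workflow, the equivalent of a non-None result is what the Condition node would test before letting the run continue.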

Tips

  • Run prettify on a sample payload during development - readable XML in the execution log makes debugging much faster.
  • Some XML parsers emit a repeated element that occurs only once as a single object, and as an array only when it occurs two or more times - normalise to always-array in your Transform so downstream Loops never break on a "one-element" edge case.
  • For huge files, use extract with a targeted path instead of to-json on the whole document.
  • SOAP services often return HTTP 200 even on faults. Always inspect the parsed body for a Fault element rather than trusting status codes alone.
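The always-array normalisation from the tips above is small enough to show in full. This Python sketch uses a hypothetical as_list helper; the same idea applies in a Transform node.

```python
def as_list(value):
    """Normalise a possibly-missing, single, or repeated element to a list."""
    if value is None:
        return []       # element absent
    if isinstance(value, list):
        return value    # already repeated
    return [value]      # single occurrence parsed as an object or string

# Hypothetical parsed orders: one item, many items, and no items at all.
one = {"item": {"sku": "A"}}
many = {"item": [{"sku": "A"}, {"sku": "B"}]}
none = {}

for order in (one, many, none):
    skus = [item["sku"] for item in as_list(order.get("item"))]
    print(skus)
```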

Common Pitfalls

  • Namespaces - prefixed element names like ns:Customer survive into the JSON. Either strip them in the Transform or query with the prefix included; mixing the two approaches leads to lookups that fail intermittently.
  • CDATA sections - free-form text fields wrapped in CDATA may include characters, such as < and &, that break the document again when you serialise back to XML. Test round-tripping with a representative payload.
  • Encoding mismatches - declared UTF-8 with actual Windows-1252 content is common in legacy feeds and will corrupt characters. If you see mojibake, check the declared encoding against the actual bytes.
  • Schema drift - upstream adds a new element silently. validate catches this early; without it your Transform produces missing fields and the failure shows up much further downstream.
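For the encoding pitfall above, one way to compare the declared encoding against the actual bytes is to attempt a strict UTF-8 decode and fall back only when it fails. The Python sketch below assumes Windows-1252 is the likely culprit; the decode_legacy name and the sample bytes are hypothetical.

```python
def decode_legacy(raw):
    """Decode bytes as UTF-8 if valid, otherwise fall back to Windows-1252.

    Returns the text and the name of the encoding that was actually used.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: the feed is really Windows-1252 despite its declaration.
        # errors="replace" covers the few byte values cp1252 leaves undefined.
        return raw.decode("cp1252", errors="replace"), "cp1252"

# Declared as UTF-8, but the e-acute is a single Windows-1252 byte (0xE9).
raw = b'<?xml version="1.0" encoding="UTF-8"?><name>Caf\xe9</name>'

text, used = decode_legacy(raw)
print(used)
print(text)
```

Logging which encoding was actually used makes it easier to spot a feed that has drifted from its declaration.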

Testing

Run the workflow once with a captured sample payload (Manual trigger with the XML pasted in). Inspect each step's input and output in the execution log - confirm the parsed shape matches what your Transform expects, the round-trip back to XML matches the original well enough for the upstream system, and the SOAP fault path triggers when you feed it a deliberately bad request. Only then point the trigger at the live source.
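One way to judge whether a round trip matches "well enough" is to compare canonical forms rather than raw strings, so attribute order and indentation do not count as differences. This Python sketch relies on xml.etree.ElementTree.canonicalize (Python 3.8 or later); the same_document helper is hypothetical, and whether whitespace can be ignored depends on the upstream system.

```python
import xml.etree.ElementTree as ET

def same_document(original_xml, round_tripped_xml):
    """Compare two XML strings after canonicalisation (C14N 2.0).

    strip_text=True ignores leading and trailing whitespace in text content,
    which is an assumption about what the upstream system tolerates.
    """
    def canon(text):
        return ET.canonicalize(text, strip_text=True)
    return canon(original_xml) == canon(round_tripped_xml)

original = '<order id="42" status="new"><item>bolt</item></order>'
round_tripped = '<order status="new" id="42">\n  <item>bolt</item>\n</order>'
changed = '<order id="42" status="new"><item>nut</item></order>'

print(same_document(original, round_tripped))
print(same_document(original, changed))
```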

Learn More
