How to Handle Errors and Build Fallback Workflows

Build robust workflows that handle failures gracefully with retry logic, fallback paths, and error notifications.

What This Integration Does

Real-world workflows fail for boring reasons - the API was rate-limited for thirty seconds, a row had a NULL where you didn't expect one, an OAuth token expired between steps. A workflow that crashes on the first failure leaves data half-processed and your team in the dark. A well-designed Spojit workflow distinguishes between transient errors (retry and move on) and meaningful failures (alert someone and try a fallback), so the noisy stuff self-heals and the real problems surface fast.

Error handling in Spojit comes in three layers: per-node retries handle transient blips automatically, Condition nodes inspect step results and route to alternative paths, and notification steps loop a human in when nothing else works. This guide walks through wiring all three into a typical workflow.

Prerequisites

An existing workflow you want to harden - ideally one with at least one Connector node that calls an external API.
A notification channel - slack send-message, resend send-email, or the built-in Send Email node.
A place to park failed records for later review - a mongodb collection or a monday board both work well.

Step 1: Configure Per-Node Retries

Click the Connector node you want to harden and open its properties. Spojit can retry a node automatically with backoff on transient errors like timeouts, 5xx responses, and rate limits, so a brief blip self-heals instead of failing the run. A small number of attempts is a sensible default; piling on retries rarely helps and can multiply duplicate side effects. Reserve retries for steps where repeating the same call is safe, and read the retry behavior in the Spojit docs before relying on it for non-idempotent writes.
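
Retries are configured in the node's properties rather than written by hand, but it can help to have a mental model of what a retry-with-backoff policy does. The Python sketch below is illustrative only: TransientError, call_with_retries, the attempt count, and the delay values are assumptions made for this example, not Spojit settings or APIs.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for timeouts, 5xx responses, and rate limits."""


def call_with_retries(call, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the failure surface
            # Wait base_delay, then 2x, then 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

Note that every retry repeats the call in full, which is why the number of attempts should stay small for anything with side effects.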

Step 2: Add a Condition After Risky Steps

After any Connector call that might fail in a non-retryable way (bad input, missing record, validation error), add a Condition node that inspects the step result. Branch on the result shape - for example {{ step2.success }} === true, or {{ step2.status }} >= 200 && {{ step2.status }} < 300 if the step exposes an HTTP status. The success path continues as normal; the failure path runs your recovery logic.
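
The two example checks can be read as a single predicate over the step result. The sketch below expresses that logic in Python for clarity; the success and status fields come from the examples above, and whether a given connector returns either of them is an assumption to verify against its actual output.

```python
def is_success(result: dict) -> bool:
    """Treat a step as successful on an explicit flag or a 2xx status."""
    if result.get("success") is True:
        return True
    status = result.get("status")
    # Anything that is not an integer in the 2xx range counts as failure,
    # including a missing status.
    return isinstance(status, int) and 200 <= status < 300
```

Defaulting to failure when neither field is present keeps an unexpected result shape from slipping down the success path.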

Step 3: Notify the Team on Failure

On the failure branch, add a slack send-message call to your ops channel. Include the workflow name, the failing step, the input data, and the error message so on-call can triage without digging through logs. Example payload:

{
  "channel": "#ops-alerts",
  "text": "Workflow {{ workflow.name }} failed at step {{ step2.id }}.\nInput: {{ step1 }}\nError: {{ step2.error }}"
}

For severity-tiered routing, use a Condition to pick the channel - critical failures go to #ops-pagerduty, soft failures go to #ops-monitoring.
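
As a sketch of how the payload and the channel choice fit together, the Python function below builds the same channel/text structure shown above. The build_alert name and the critical flag are hypothetical; in a workflow the flag would come from whatever rule your Condition node applies.

```python
import json


def build_alert(workflow_name: str, step_id: str, step_input: dict,
                error_message: str, critical: bool) -> dict:
    """Build a message payload and route it by severity."""
    # Channel names are taken from the routing example above.
    channel = "#ops-pagerduty" if critical else "#ops-monitoring"
    text = (
        f"Workflow {workflow_name} failed at step {step_id}.\n"
        f"Input: {json.dumps(step_input)}\n"
        f"Error: {error_message}"
    )
    return {"channel": channel, "text": text}
```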

Step 4: Park Failed Records in a Dead-Letter Queue

If your workflow processes records one at a time (orders, payments, inventory updates) and one of them fails, you don't want to lose it. Add a Connector node on the failure path calling mongodb insert-documents into a workflow_failures collection (or monday create-item on a triage board). Include the input data, the error, and a timestamp. A separate scheduled workflow can replay these once the underlying issue is fixed.
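
One possible shape for a parked record is sketched below in Python. The field names and the replayed flag are assumptions chosen for this example (the flag gives a replay workflow something to filter on); adapt them to your own collection or board.

```python
from datetime import datetime, timezone


def build_failure_record(workflow: str, step: str,
                         input_data: dict, error: str) -> dict:
    """Assemble a document for the workflow_failures collection."""
    return {
        "workflow": workflow,
        "step": step,
        "input": input_data,  # the original record, kept whole for replay
        "error": error,
        "failed_at": datetime.now(timezone.utc).isoformat(),  # UTC timestamp
        "replayed": False,  # hypothetical flag flipped by the replay workflow
    }
```

Storing the timestamp in UTC keeps records comparable across systems; convert to local time only when displaying it.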

Step 5: Build a Fallback Path for Critical Steps

For business-critical steps with an alternative, route the failure branch to a fallback service. Examples: if shippit create-order fails, try shipstation create-order; if the primary email provider resend send-email fails, fall back to smtp send-email. Keep the fallback shape compatible so downstream steps don't need to know which path was taken - a Transform node right after each branch can normalise the result.
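
The fallback-plus-normalise pattern can be sketched as follows. This is Python pseudologic for the branch structure, not connector code: first_success, the provider tuples, and the normalise functions are all hypothetical stand-ins for the Connector and Transform nodes on each branch.

```python
def first_success(order: dict, providers: list) -> dict:
    """Try each (name, call, normalise) provider in order.

    Returns the first successful result in a common shape, so downstream
    steps see the same fields whichever provider handled the order.
    """
    errors = []
    for name, call, normalise in providers:
        try:
            raw = call(order)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
            continue  # move on to the next provider
        # Normalise outside the try block so a mapping bug is not
        # mistaken for a provider failure and retried elsewhere.
        return {"provider": name, **normalise(raw)}
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Recording which provider answered is optional, but it makes the execution history easier to read when the fallback starts firing.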

Step 6: Wrap Loops in Per-Iteration Try-Catch

When using a Loop node over a list (orders, customers, files), put the Condition + failure-path pattern inside the loop body. A single bad record then logs itself to the dead-letter queue but the loop keeps going. Without this, one rotten record kills the whole batch. After the loop, summarise total successes and failures in a final notification.
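
The per-iteration pattern is roughly equivalent to the Python loop below. The handle and dead_letter callables are hypothetical placeholders for the loop body and the dead-letter write from Step 4.

```python
def process_batch(records: list, handle, dead_letter) -> dict:
    """Process every record, parking failures instead of aborting."""
    succeeded = 0
    failed = 0
    for record in records:
        try:
            handle(record)
            succeeded += 1
        except Exception as exc:
            # Park the bad record and keep going with the rest.
            dead_letter({"input": record, "error": str(exc)})
            failed += 1
    # Summary for the final notification after the loop.
    return {"total": len(records), "succeeded": succeeded, "failed": failed}
```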

Tips

Silent failures are worse than loud ones - always notify on the failure branch for production workflows.
For idempotent operations (read, upsert by key) retries are safe. For non-idempotent operations (create without an idempotency key), add a pre-check or use the API's idempotency-key header to avoid duplicates on retry.
The Send Email node has an "If sending fails" setting - choose "Continue anyway" when the email is informational and shouldn't block the rest of the workflow, or "Fail the workflow" when delivery is mandatory.
Use the execution log's input and output panes when debugging - they show exactly what each step saw and returned.
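
For the idempotency tip above, the key has to be the same on every retry of the same record. One way to get that is to derive it from the record itself, as in this Python sketch. The idempotency_key helper is hypothetical, and whether the target API accepts an idempotency-key header (and what it calls that header) is an assumption to check in that API's documentation.

```python
import hashlib
import json


def idempotency_key(workflow: str, record: dict) -> str:
    """Derive a stable key so retries of the same record reuse it."""
    # Sort keys so field order does not change the hash.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{workflow}:{canonical}".encode("utf-8")).hexdigest()
```

A randomly generated key would also work, provided it is created once before the first attempt and reused for every retry rather than regenerated.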

Common Pitfalls

Retrying non-retryable errors - most 4xx responses, such as auth and validation errors, will keep failing on retry (429 rate-limit responses are the exception). Add a Condition to inspect the error and skip retries for permanent failures.
Double-actioning on retry - if a create call times out after the server processed it, the retry creates a duplicate. Use idempotency keys or post-create lookups to detect this.
Forgetting timezones in alerts - format timestamps with the date connector's format tool so on-call sees them in their local time.
Alert fatigue - if every transient blip pages someone, they stop reading the channel. Reserve high-severity channels for errors that the workflow couldn't recover from itself.
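
The first pitfall comes down to classifying a failure before deciding what to do with it. A minimal sketch in Python, assuming the step exposes an HTTP status code (with None standing in for a timeout or dropped connection); the is_retryable name and the exact set of retryable codes are assumptions for illustration, not Spojit's built-in rules.

```python
from typing import Optional


def is_retryable(status: Optional[int]) -> bool:
    """Return True for failures that may succeed if tried again."""
    if status is None:
        return True  # no response at all: timeout or network error
    # 429 is a rate limit; 5xx is a server-side failure.
    # Other 4xx codes (401, 403, 422, ...) will fail the same way again.
    return status == 429 or 500 <= status < 600
```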

Testing

Deliberately break the happy path to test each branch. Point the Connector node at a sandbox endpoint that returns 500, then 401, then succeeds - each should hit a different recovery path. Drop a malformed record into the input data and watch it land in the dead-letter queue. Once each failure mode behaves the way you want, point the workflow back at production with retries and fallbacks already proven.
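
If no sandbox endpoint is available, the same sequence can be rehearsed with a scripted stand-in. The Python sketch below replays a fixed list of status codes and maps each one to a path; scripted_endpoint, route, and the path names are hypothetical and exist only to show the shape of the test.

```python
from itertools import chain, repeat


def scripted_endpoint(statuses: list):
    """Return a callable that replays statuses, then repeats the last one."""
    responses = chain(statuses, repeat(statuses[-1]))
    return lambda: {"status": next(responses)}


def route(result: dict) -> str:
    """Map a response to the recovery path it should take."""
    status = result["status"]
    if 200 <= status < 300:
        return "success"
    if status == 429 or status >= 500:
        return "retry"
    return "alert"  # permanent failure: notify and park the record
```

Calling the endpoint three times with the sequence 500, 401, 200 should exercise the retry, alert, and success paths in that order.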