How to Handle Errors and Build Fallback Workflows
Build robust workflows that handle failures gracefully with retry logic, fallback paths, and error notifications.
What This Integration Does
Real-world workflows fail for boring reasons - the API was rate-limited for thirty seconds, a row had a NULL where you didn't expect one, an OAuth token expired between steps. A workflow that crashes on the first failure leaves data half-processed and your team in the dark. A well-designed Spojit workflow distinguishes between transient errors (retry and move on) and meaningful failures (alert someone and try a fallback), so the noisy stuff self-heals and the real problems surface fast.
Error handling in Spojit comes in three layers: per-node retries handle transient blips automatically, Condition nodes inspect step results and route to alternative paths, and notification steps loop a human in when nothing else works. This guide walks through wiring all three into a typical workflow.
Prerequisites
- An existing workflow you want to harden - ideally one with at least one Connector node that calls an external API.
- A notification channel - slack send-message, resend send-email, or the built-in Send Email node.
- A place to park failed records for later review - a mongodb collection or a monday board both work well.
Step 1: Configure Per-Node Retries
Click the Connector node you want to harden and open its properties. Set Max Attempts (1 to 5) - Spojit retries with exponential backoff on transient errors like timeouts, 5xx responses, and rate limits. Enable Retry on Tool Errors if you also want to retry errors the tool itself reports (useful for connectors that surface upstream 429s as tool errors rather than HTTP errors). Three attempts is a sensible default; the setting caps at five, and more attempts than that rarely help.
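If you want a mental model of what a retry with exponential backoff does, here is a minimal Python sketch. It is not Spojit's implementation: the TransientError class, the one-second base delay, and the doubling schedule are all assumptions made for illustration.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, 5xx, rate limits)."""


def call_with_retries(call, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call() up to max_attempts times, doubling the wait after each transient failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the failure path take over
            # Assumed schedule: base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing sleep in as a parameter keeps the sketch testable without real waiting. Any exception other than TransientError propagates immediately, which matches the idea that only transient errors are worth retrying.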
Step 2: Add a Condition After Risky Steps
After any Connector call that might fail in a non-retryable way (bad input, missing record, validation error), add a Condition node that inspects the step result. Branch on the result shape - for example {{ step2.success }} === true or {{ step2.status }} >= 200 && {{ step2.status }} < 300. The success path continues as normal; the failure path runs your recovery logic.
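The two example expressions can be read as a single predicate. The Python sketch below shows that logic outside Spojit; the result shape (a dict with an optional success flag and an optional status code) is an assumption, so check what your connector actually returns before copying the checks.

```python
def is_success(result: dict) -> bool:
    """Prefer an explicit success flag; otherwise treat any 2xx status as success."""
    if "success" in result:
        return result["success"] is True
    status = result.get("status")
    return isinstance(status, int) and 200 <= status < 300
```

A result with neither field is treated as a failure, so an unexpected shape lands on the recovery path instead of slipping through as a success.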
Step 3: Notify the Team on Failure
On the failure branch, add a slack send-message call to your ops channel. Include the workflow name, the failing step, the input data, and the error message so on-call can triage without digging through logs. Example payload:
{
  "channel": "#ops-alerts",
  "text": "Workflow {{ workflow.name }} failed at step {{ step2.id }}.\nInput: {{ step1 }}\nError: {{ step2.error }}"
}
For severity-tiered routing, use a Condition to pick the channel - critical failures go to #ops-pagerduty, soft failures go to #ops-monitoring.
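The routing decision is small enough to sketch in a few lines of Python. The severity value "critical" and both function names are hypothetical; in Spojit the same decision would live in a Condition node feeding the slack send-message call.

```python
def pick_channel(severity: str) -> str:
    """Critical failures page someone; everything else goes to the monitoring channel."""
    return "#ops-pagerduty" if severity == "critical" else "#ops-monitoring"


def build_alert(workflow_name, step_id, step_input, error, severity):
    """Assemble a message payload with the same fields as the example payload above."""
    return {
        "channel": pick_channel(severity),
        "text": (
            f"Workflow {workflow_name} failed at step {step_id}.\n"
            f"Input: {step_input}\n"
            f"Error: {error}"
        ),
    }
```

Defaulting unknown severities to the quieter channel is a design choice; flip it if you would rather over-alert than miss something.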
Step 4: Park Failed Records in a Dead Letter Queue
If your workflow processes records one at a time (orders, payments, inventory updates) and one of them fails, you don't want to lose it. Add a Connector node on the failure path calling mongodb insert-documents into a workflow_failures collection (or monday create-item on a triage board). Include the input data, the error, and a timestamp. A separate scheduled workflow can replay these once the underlying issue is fixed.
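As a rough guide to what a dead-letter document might hold, here is a Python sketch that builds one. The field names (workflow, step, input, error, failed_at, replayed) are assumptions rather than a Spojit or mongodb schema; the only requirements from the text are the input data, the error, and a timestamp.

```python
from datetime import datetime, timezone


def build_failure_record(workflow_name, step_id, step_input, error, now=None):
    """Return a document suitable for a workflow_failures collection."""
    failed_at = now or datetime.now(timezone.utc)
    return {
        "workflow": workflow_name,
        "step": step_id,
        "input": step_input,                  # enough to replay the record later
        "error": str(error),
        "failed_at": failed_at.isoformat(),   # UTC, ISO 8601
        "replayed": False,                    # hypothetical flag for the replay workflow
    }
```

Storing the timestamp in UTC keeps the records sortable; convert to local time only when displaying them.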
Step 5: Build a Fallback Path for Critical Steps
For business-critical steps with an alternative, route the failure branch to a fallback service. Examples: if shippit create-order fails, try shipstation create-order; if the primary email provider resend send-email fails, fall back to smtp send-email. Keep the fallback shape compatible so downstream steps don't need to know which path was taken - a Transform node right after each branch can normalise the result.
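The primary-then-fallback pattern with a normalising step on each branch can be sketched in Python as follows. All four callables are hypothetical stand-ins: primary and fallback play the role of the two Connector calls, and the normalise functions play the role of the Transform nodes.

```python
def run_with_fallback(primary, fallback, normalise_primary, normalise_fallback):
    """Try primary(); on any error try fallback(). Both results are normalised to one shape."""
    try:
        raw = primary()
    except Exception as primary_error:
        try:
            raw = fallback()
        except Exception as fallback_error:
            # Both paths failed: surface the fallback error, chained to the original one.
            raise fallback_error from primary_error
        return normalise_fallback(raw)
    return normalise_primary(raw)
```

Because each branch normalises its own result, the caller gets the same keys whichever provider answered.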
Step 6: Wrap Loops in Per-Iteration Try-Catch
When using a Loop node over a list (orders, customers, files), put the Condition + failure-path pattern inside the loop body. A single bad record then logs itself to the dead-letter queue but the loop keeps going. Without this, one rotten record kills the whole batch. After the loop, summarise total successes and failures in a final notification.
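In code terms, the loop pattern looks roughly like the Python sketch below. The process and park_failure callables are hypothetical placeholders for the loop body and the dead-letter step.

```python
def process_batch(records, process, park_failure):
    """Process every record; park failures instead of aborting, then return a summary."""
    succeeded = 0
    failed = 0
    for record in records:
        try:
            process(record)
            succeeded += 1
        except Exception as error:
            park_failure(record, error)  # dead-letter the bad record and keep going
            failed += 1
    return {"total": len(records), "succeeded": succeeded, "failed": failed}
```

The returned summary is what you would feed into the final notification after the loop.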
Tips
- Silent failures are worse than loud ones - always notify on the failure branch for production workflows.
- For idempotent operations (read, upsert by key) retries are safe. For non-idempotent operations (create without an idempotency key), add a pre-check or use the API's idempotency-key header to avoid duplicates on retry.
- The Send Email node has a built-in Continue on error setting - flip it on when the email is informational and shouldn't block the rest of the workflow.
- Use the execution log's input and output panes when debugging - they show exactly what each step saw and returned.
Common Pitfalls
- Retrying non-retryable errors - 4xx auth and validation errors will keep failing on retry. Add a Condition to inspect the error and skip retries for permanent failures.
- Double-actioning on retry - if a create call times out after the server processed it, the retry creates a duplicate. Use idempotency keys or post-create lookups to detect this.
- Forgetting timezones in alerts - format timestamps with the date connector's format tool so on-call sees them in their local time.
- Alert fatigue - if every transient blip pages someone, they stop reading the channel. Reserve high-severity channels for errors that the workflow couldn't recover from itself.
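One way to guard against double-actioning on retry is to derive the idempotency key from the record itself, so every retry of the same record sends the same key. The Python sketch below assumes the target API accepts an idempotency key at all; the header or field name varies by API, so check its documentation.

```python
import hashlib
import json


def idempotency_key(workflow_name: str, record: dict) -> str:
    """Deterministic key: the same workflow and record always produce the same value."""
    # sort_keys makes the key independent of the order fields arrive in.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{workflow_name}:{canonical}".encode("utf-8")).hexdigest()
```

A random key generated per attempt would defeat the purpose, because the server could not tell a retry from a new request.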
Testing
Deliberately break the happy path to test each branch. Point the Connector node at a sandbox endpoint that returns 500, then 401, then succeeds - each should hit a different recovery path. Drop a malformed record into the input data and watch it land in the dead-letter queue. Once each failure mode behaves the way you want, point the workflow back at production with retries and fallbacks already proven.
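If you don't have a sandbox that misbehaves on demand, a scripted stub is one way to get the 500, 401, success sequence. The Python sketch below is purely illustrative: the stub, the route function, and the path names ("retry", "alert", "continue") are hypothetical, and the mapping of status codes to paths is an assumption about how you have wired your own workflow.

```python
from itertools import chain, repeat


def scripted_endpoint(statuses):
    """Return a callable that replays statuses in order, then repeats the last one."""
    responses = chain(statuses, repeat(statuses[-1]))

    def call():
        return {"status": next(responses)}

    return call


def route(status: int) -> str:
    """Assumed mapping: 2xx continues, 5xx and 429 retry, other errors alert."""
    if 200 <= status < 300:
        return "continue"
    if status >= 500 or status == 429:
        return "retry"
    return "alert"
```

Calling the stub three times with scripted_endpoint([500, 401, 200]) exercises each path once, in the same order as the test described above.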