How to Use Different AI Models for Different Tasks in One Workflow

Optimize cost and performance by choosing the right AI model for each step.

What This Integration Does

Most workflows that "use AI" actually run several AI steps - classify, summarize, analyze, generate. Using the same flagship model for all of them is the easiest way to write a workflow, and the easiest way to overpay for it by 5-10x. The fix is to pick the cheapest model that can do each step well: a fast small model for classification, a long-context model for summarization, a strong reasoning model for the final decision.

The workflow runs each AI step with an independently chosen model and routes data between them via structured output. You can A/B different model assignments by swapping the model selector on one step without touching anything else.

Prerequisites

  • Multiple AI providers or models enabled in your workspace (Claude family, Gemini family, etc.).
  • A workflow that already has more than one AI Agent step - this pattern is about optimizing what you have, not adding new steps.

Step 1: Map Each Step to a Difficulty Tier

Walk through each AI step in the workflow and label it. A useful mental model:

  • Cheap and fast (Claude Haiku, Gemini Flash) - classification into a small enum, simple extraction, yes/no judgments.
  • Balanced (Claude Sonnet) - free-form summarization, copywriting, light reasoning, structured extraction over a complex schema.
  • Heavy reasoning (Claude Opus, Gemini Pro) - multi-step analysis, planning, code generation, anything where wrong answers are expensive.
  • Long context (Gemini Pro 1M, Claude Sonnet 1M) - documents over 50k tokens, conversation histories, large knowledge dumps.
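The tier labels above can be kept as a small lookup table so each step's model assignment lives in one place. This is a minimal sketch: the step names and model identifiers are hypothetical placeholders, not real API model names or workspace settings.

```python
from enum import Enum


class Tier(Enum):
    CHEAP = "cheap"
    BALANCED = "balanced"
    HEAVY = "heavy"
    LONG_CONTEXT = "long_context"


# Hypothetical step names; replace with the AI steps in your own workflow.
STEP_TIERS = {
    "triage": Tier.CHEAP,
    "summarize": Tier.LONG_CONTEXT,
    "draft_reply": Tier.BALANCED,
    "decide": Tier.HEAVY,
}

# Placeholder identifiers, not real model names.
TIER_MODELS = {
    Tier.CHEAP: "fast-small-model",
    Tier.BALANCED: "balanced-model",
    Tier.HEAVY: "strong-reasoning-model",
    Tier.LONG_CONTEXT: "long-context-model",
}


def model_for(step: str) -> str:
    """Return the model assigned to a step via its difficulty tier."""
    return TIER_MODELS[STEP_TIERS[step]]
```

Moving a step to a different tier is then a one-line change in STEP_TIERS, which mirrors swapping the model selector on a single step.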

Step 2: Triage Step - Use the Cheapest Model

Add a Connector node in Agent Mode with a fast model selected and Structured Output enforcing a small enum:

{
  "type":     { "type": "string", "enum": ["billing", "technical", "feature-request"] },
  "urgency":  { "type": "string", "enum": ["low", "normal", "high"] }
}

This step only decides which branch the workflow takes downstream. For a small enum like this, a model that costs 10x more will rarely pick a different branch, so don't pay for it.
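If you post-process the triage result in your own code, a check like the following confirms the output matches the two enums above before anything branches on it. This is a sketch that assumes the step returns a JSON string; parse_triage is a hypothetical helper, not a platform function.

```python
import json

# Allowed values copied from the triage schema above.
TRIAGE_ENUMS = {
    "type": {"billing", "technical", "feature-request"},
    "urgency": {"low", "normal", "high"},
}


def parse_triage(raw: str) -> dict:
    """Parse the triage output and reject anything outside the enums."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("triage output must be a JSON object")
    for field, allowed in TRIAGE_ENUMS.items():
        value = data.get(field)
        if not isinstance(value, str) or value not in allowed:
            raise ValueError(f"invalid {field!r}: {value!r}")
    # Drop any extra keys the model may have added.
    return {field: data[field] for field in TRIAGE_ENUMS}
```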

Step 3: Summarize / Compress Step - Long Context Model

If the input is a long thread, a large document, or several days of history, add a Connector node in Agent Mode with a long-context model. Have it produce a 200-300 word summary plus a list of key entities. The downstream reasoning step now sees a compact, high-signal input instead of 50k tokens of raw history.
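To decide whether an input is long enough to need this step, a rough size check is usually sufficient. The sketch below assumes about four characters per token, which is a common rule of thumb for English text rather than an exact tokenizer count, and reuses the 50k-token figure from the tier list as a default threshold.

```python
CHARS_PER_TOKEN = 4  # assumption: rough average for English text
LONG_CONTEXT_THRESHOLD = 50_000  # tokens


def estimate_tokens(text: str) -> int:
    """Cheap token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


def needs_summary_step(text: str, threshold: int = LONG_CONTEXT_THRESHOLD) -> bool:
    """True when the input is large enough to compress before reasoning."""
    return estimate_tokens(text) > threshold
```

Inputs under the threshold can skip the summarization step and go straight to the next model.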

Step 4: Branch with a Condition Node

Add a Condition node that routes on the triage label. Each branch can use a different model for the next step. Simple branches stay on cheap models; complex ones graduate to a stronger one. This is the single biggest cost lever in the workflow.
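The routing rule itself is usually a one-line predicate over the triage output. The sketch below shows one possible rule; treating technical or high-urgency items as the complex branch is an assumption made for illustration, and the branch names are hypothetical.

```python
def pick_branch(triage: dict) -> str:
    """Route on the triage label: 'complex' goes to the strong model."""
    # Assumed rule: technical issues and anything urgent get the stronger model.
    if triage["type"] == "technical" or triage["urgency"] == "high":
        return "complex"
    return "simple"
```

In the workflow, the same logic would be expressed as the Condition node's expression rather than as code.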

Step 5: Heavy Reasoning Step - Strong Model

On the complex branch (e.g. technical escalations, refund decisions, contract analysis), add a Connector node in Agent Mode with your strongest model. Feed it the triage label, the summary, and any enrichment data from a Knowledge node or storefront connector. Use structured output and ask for a recommendation plus reasoning:

{
  "recommendation": { "type": "string" },
  "reasoning":      { "type": "string" },
  "confidence":     { "type": "number" }
}

Step 6: Act and Log

End the workflow with deterministic action steps (a Connector calling shopify, netsuite, monday, etc.) and a mongodb insert-documents log that captures which model was used at each step, total token count, and cost estimate. This log is what lets you tune model selection over time based on real data.
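A log record along these lines carries everything needed to compare model assignments later. This is a sketch only: the field names are hypothetical, and the per-token prices are made-up placeholders that you would replace with your providers' current rates.

```python
from dataclasses import dataclass, asdict

# Hypothetical prices in USD per million (input, output) tokens.
PRICES_PER_MTOK = {
    "fast-small-model": (0.25, 1.25),
    "strong-reasoning-model": (15.00, 75.00),
}


@dataclass
class StepUsage:
    step: str
    model: str
    input_tokens: int
    output_tokens: int

    def cost_usd(self) -> float:
        price_in, price_out = PRICES_PER_MTOK[self.model]
        return (self.input_tokens * price_in + self.output_tokens * price_out) / 1_000_000


def build_run_log(run_id: str, steps: list[StepUsage]) -> dict:
    """Build one document per workflow run, ready to insert into the log store."""
    return {
        "run_id": run_id,
        "steps": [{**asdict(s), "cost_usd": round(s.cost_usd(), 6)} for s in steps],
        "total_tokens": sum(s.input_tokens + s.output_tokens for s in steps),
        "total_cost_usd": round(sum(s.cost_usd() for s in steps), 6),
    }
```

Storing the model name per step, rather than per run, is what makes it possible to see which step's assignment drives the cost.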

Tips

  • Default to the cheapest viable model and only escalate when you have data showing the cheaper one fails. Don't preemptively over-spec.
  • Keep prompts tight - smaller prompts let smaller models hit acceptable quality, which compounds your savings.
  • When you swap models on a step, re-run the test suite for that step - the same prompt can produce very different output on a different model.

Common Pitfalls

  • Structured output drift - smaller models occasionally violate the schema. Validate after each step and retry on a stronger model rather than failing the workflow.
  • Provider rate limits - mixing providers can quietly hit different limits. Watch for 429s in the workflow execution log and add per-provider Loop concurrency caps.
  • Inconsistent system prompts - to keep the same brand voice across multiple models, write the system prompt once and inject it into every step (a Subworkflow works well for this).
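The validate-and-retry advice for structured output drift can be sketched as a loop that tries models from cheapest to strongest and stops at the first output that validates. Here call_model and validate are hypothetical callables you would supply; neither is a platform or provider API.

```python
from typing import Callable, Sequence


def run_with_escalation(
    call_model: Callable[[str, str], str],
    validate: Callable[[str], dict],
    prompt: str,
    models: Sequence[str],
) -> tuple[str, dict]:
    """Try each model in order (cheapest first) until the output validates.

    call_model(model, prompt) returns the raw output; validate(raw) returns
    the parsed result or raises ValueError on a schema violation.
    """
    last_error = None
    for model in models:
        raw = call_model(model, prompt)
        try:
            return model, validate(raw)
        except ValueError as err:
            last_error = err  # remember why this model failed, then escalate
    raise RuntimeError(f"all models failed validation: {last_error}")
```

Returning the model that finally succeeded lets the run log show how often the cheap model needed a fallback, which is the data the escalation decision should rest on.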

Testing

Pick 20 historical inputs and run them through the workflow twice - once with every step on your strongest model, once with the tiered assignment. Compare outputs and cost. If quality is identical or near-identical and cost is half, ship the tiered version. If quality dropped on a specific step, that's where to upgrade the model assignment.
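The comparison can be reduced to two numbers: how often the tiered run agrees with the all-strongest run, and how the total costs compare. The sketch below assumes each run is a list of records with hypothetical "output" and "cost" fields, in the same input order. Exact-match agreement only makes sense for structured fields such as the triage label; free-form text needs a manual or rubric-based review instead.

```python
def compare_runs(baseline: list[dict], tiered: list[dict]) -> tuple[float, float]:
    """Return (agreement rate, cost ratio) of the tiered run vs the baseline."""
    if not baseline or len(baseline) != len(tiered):
        raise ValueError("runs must be non-empty and cover the same inputs")
    matches = sum(1 for b, t in zip(baseline, tiered) if b["output"] == t["output"])
    agreement = matches / len(baseline)
    cost_ratio = sum(t["cost"] for t in tiered) / sum(b["cost"] for b in baseline)
    return agreement, cost_ratio
```

An agreement rate near 1.0 with a cost ratio around 0.5 corresponds to the "ship the tiered version" case described above.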
