Agent Workflow Designer

Domain: Engineering - POWERFUL | Skill: agent-workflow-designer | Source: engineering/agent-workflow-designer/SKILL.md



Tier: POWERFUL
Category: Engineering
Domain: Multi-Agent Systems / AI Orchestration


Overview

Design production-grade multi-agent orchestration systems. Covers five core patterns (sequential pipeline, parallel fan-out/fan-in, hierarchical delegation, event-driven, consensus), platform-specific implementations, handoff protocols, state management, error recovery, context window budgeting, and cost optimization.


Core Capabilities

  • Pattern selection guide for any orchestration requirement
  • Handoff protocol templates (structured context passing)
  • State management patterns for multi-agent workflows
  • Error recovery and retry strategies
  • Context window budget management
  • Cost optimization strategies per platform
  • Platform-specific configs: Claude Code Agent Teams, OpenClaw, CrewAI, AutoGen

When to Use

  • Building a multi-step AI pipeline that exceeds one agent's context capacity
  • Parallelizing research, generation, or analysis tasks for speed
  • Creating specialist agents with defined roles and handoff contracts
  • Designing fault-tolerant AI workflows for production

Pattern Selection Guide

Is the task sequential (each step needs previous output)?
  YES → Sequential Pipeline
  NO  → Can tasks run in parallel?
          YES → Parallel Fan-out/Fan-in
          NO  → Is there a hierarchy of decisions?
                  YES → Hierarchical Delegation
                  NO  → Is it event-triggered?
                          YES → Event-Driven
                          NO  → Need consensus/validation?
                                  YES → Consensus Pattern
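
The decision tree above can be written down as a routing helper. A minimal sketch (the question names and return strings are this guide's own, not a library API):

```python
def select_pattern(*, sequential: bool, parallelizable: bool, hierarchical: bool,
                   event_triggered: bool, needs_consensus: bool) -> str:
    """Walk the decision tree top-down; the first YES wins."""
    if sequential:
        return "Sequential Pipeline"
    if parallelizable:
        return "Parallel Fan-out/Fan-in"
    if hierarchical:
        return "Hierarchical Delegation"
    if event_triggered:
        return "Event-Driven"
    if needs_consensus:
        return "Consensus Pattern"
    return "Single agent (no orchestration needed)"
```

If every answer is NO, the honest output is the last line: don't orchestrate at all.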

Pattern 1: Sequential Pipeline

Use when: Each step depends on the previous output. Research → Draft → Review → Polish.

# sequential_pipeline.py
from dataclasses import dataclass
from typing import Callable, Any
import anthropic

@dataclass
class PipelineStage:
    name: str
    system_prompt: str
    input_key: str      # what to take from state
    output_key: str     # what to write to state
    model: str = "claude-3-5-sonnet-20241022"
    max_tokens: int = 2048

class SequentialPipeline:
    def __init__(self, stages: list[PipelineStage]):
        self.stages = stages
        self.client = anthropic.Anthropic()

    def run(self, initial_input: str) -> dict:
        state = {"input": initial_input}

        for stage in self.stages:
            print(f"[{stage.name}] Processing...")

            stage_input = state.get(stage.input_key, "")

            response = self.client.messages.create(
                model=stage.model,
                max_tokens=stage.max_tokens,
                system=stage.system_prompt,
                messages=[{"role": "user", "content": stage_input}],
            )

            state[stage.output_key] = response.content[0].text
            state[f"{stage.name}_tokens"] = response.usage.input_tokens + response.usage.output_tokens

            print(f"[{stage.name}] Done. Tokens: {state[f'{stage.name}_tokens']}")

        return state

# Example: Blog post pipeline
pipeline = SequentialPipeline([
    PipelineStage(
        name="researcher",
        system_prompt="You are a research specialist. Given a topic, produce a structured research brief with: key facts, statistics, expert perspectives, and controversy points.",
        input_key="input",
        output_key="research",
    ),
    PipelineStage(
        name="writer",
        system_prompt="You are a senior content writer. Using the research provided, write a compelling 800-word blog post with a clear hook, 3 main sections, and a strong CTA.",
        input_key="research",
        output_key="draft",
    ),
    PipelineStage(
        name="editor",
        system_prompt="You are a copy editor. Review the draft for: clarity, flow, grammar, and SEO. Return the improved version only, no commentary.",
        input_key="draft",
        output_key="final",
    ),
])
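
The class above needs an API key to run, but the state-threading contract itself (each stage reads its input_key and writes its output_key) can be checked offline by swapping a stub in for the model call. A minimal sketch:

```python
def run_stages(stages, initial_input, call):
    """stages: list of (name, input_key, output_key); call: (name, text) -> text."""
    state = {"input": initial_input}
    for name, input_key, output_key in stages:
        # Same contract as SequentialPipeline.run: read input_key, write output_key
        state[output_key] = call(name, state.get(input_key, ""))
    return state

result = run_stages(
    [("researcher", "input", "research"), ("writer", "research", "draft")],
    "agentic workflows",
    call=lambda name, text: f"{name}({text})",  # stub in place of the API call
)
# result["draft"] == "writer(researcher(agentic workflows))"
```

The nesting in the final value makes the data flow visible: each stage's output becomes the next stage's input.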

Pattern 2: Parallel Fan-out / Fan-in

Use when: Independent tasks that can run concurrently. Research 5 competitors simultaneously.

# parallel_fanout.py
import asyncio
import anthropic
from typing import Any

async def run_agent(client, task_name: str, system: str, user: str, model: str = "claude-3-5-sonnet-20241022") -> dict:
    """Single async agent call; the blocking SDK call runs in a thread pool."""
    loop = asyncio.get_running_loop()

    def _call():
        return client.messages.create(
            model=model,
            max_tokens=2048,
            system=system,
            messages=[{"role": "user", "content": user}],
        )

    response = await loop.run_in_executor(None, _call)
    return {
        "task": task_name,
        "output": response.content[0].text,
        "tokens": response.usage.input_tokens + response.usage.output_tokens,
    }

async def parallel_research(competitors: list[str], research_type: str) -> dict:
    """Fan-out: research all competitors in parallel. Fan-in: synthesize results."""
    client = anthropic.Anthropic()

    # FAN-OUT: spawn parallel agent calls
    tasks = [
        run_agent(
            client,
            task_name=competitor,
            system=f"You are a competitive intelligence analyst. Research {competitor} and provide: pricing, key features, target market, and known weaknesses.",
            user=f"Analyze {competitor} for comparison with our product in the {research_type} market.",
        )
        for competitor in competitors
    ]

    results = await asyncio.gather(*tasks, return_exceptions=True)

    # Handle failures gracefully
    successful = [r for r in results if not isinstance(r, Exception)]
    failed = [r for r in results if isinstance(r, Exception)]

    if failed:
        print(f"Warning: {len(failed)} research tasks failed: {failed}")

    # FAN-IN: synthesize
    combined_research = "\n\n".join([
        f"## {r['task']}\n{r['output']}" for r in successful
    ])

    synthesis = await run_agent(
        client,
        task_name="synthesizer",
        system="You are a strategic analyst. Synthesize competitor research into a concise comparison matrix and strategic recommendations.",
        user=f"Synthesize these competitor analyses:\n\n{combined_research}",
        model="claude-3-5-sonnet-20241022",
    )

    return {
        "individual_analyses": successful,
        "synthesis": synthesis["output"],
        "total_tokens": sum(r["tokens"] for r in successful) + synthesis["tokens"],
    }
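
The fan-out/fan-in shape itself can be demonstrated offline with stub coroutines. Here fake_agent stands in for run_agent; no API is involved:

```python
import asyncio

async def fake_agent(name: str) -> dict:
    await asyncio.sleep(0)  # stands in for a real API call
    return {"task": name, "output": f"analysis of {name}", "tokens": 100}

async def fan_out_fan_in(competitors: list[str]) -> str:
    # FAN-OUT: all coroutines run concurrently under gather
    results = await asyncio.gather(*[fake_agent(c) for c in competitors])
    # FAN-IN: merge the per-task outputs into one document
    return "\n\n".join(f"## {r['task']}\n{r['output']}" for r in results)

combined = asyncio.run(fan_out_fan_in(["AcmeAI", "BetaCorp"]))
```

Wall-clock time for the real version is roughly the slowest single call, not the sum, which is the entire point of the pattern.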

Pattern 3: Hierarchical Delegation

Use when: Complex tasks with subtask discovery. Orchestrator breaks down work, delegates to specialists.

# hierarchical_delegation.py
import json
import anthropic

ORCHESTRATOR_SYSTEM = """You are an orchestration agent. Your job is to:
1. Analyze the user's request
2. Break it into subtasks
3. Assign each to the appropriate specialist agent
4. Collect results and synthesize

Available specialists:
- researcher: finds facts, data, and information
- writer: creates content and documents  
- coder: writes and reviews code
- analyst: analyzes data and produces insights

Respond with a JSON plan:
{
  "subtasks": [
    {"id": "1", "agent": "researcher", "task": "...", "depends_on": []},
    {"id": "2", "agent": "writer", "task": "...", "depends_on": ["1"]}
  ]
}"""

SPECIALIST_SYSTEMS = {
    "researcher": "You are a research specialist. Find accurate, relevant information and cite sources when possible.",
    "writer": "You are a professional writer. Create clear, engaging content in the requested format.",
    "coder": "You are a senior software engineer. Write clean, well-commented code with error handling.",
    "analyst": "You are a data analyst. Provide structured analysis with evidence-backed conclusions.",
}

class HierarchicalOrchestrator:
    def __init__(self):
        self.client = anthropic.Anthropic()

    def run(self, user_request: str) -> str:
        # 1. Orchestrator creates plan
        plan_response = self.client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            system=ORCHESTRATOR_SYSTEM,
            messages=[{"role": "user", "content": user_request}],
        )

        plan = json.loads(plan_response.content[0].text)
        results = {}

        # 2. Execute subtasks respecting dependencies
        for subtask in self._topological_sort(plan["subtasks"]):
            context = self._build_context(subtask, results)
            specialist = SPECIALIST_SYSTEMS[subtask["agent"]]

            result = self.client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=2048,
                system=specialist,
                messages=[{"role": "user", "content": f"{context}\n\nTask: {subtask['task']}"}],
            )
            results[subtask["id"]] = result.content[0].text

        # 3. Final synthesis
        all_results = "\n\n".join([f"### {k}\n{v}" for k, v in results.items()])
        synthesis = self.client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=2048,
            system="Synthesize the specialist outputs into a coherent final response.",
            messages=[{"role": "user", "content": f"Original request: {user_request}\n\nSpecialist outputs:\n{all_results}"}],
        )
        return synthesis.content[0].text

    def _build_context(self, subtask: dict, results: dict) -> str:
        if not subtask.get("depends_on"):
            return ""
        deps = [f"Output from task {dep}:\n{results[dep]}" for dep in subtask["depends_on"] if dep in results]
        return "Previous results:\n" + "\n\n".join(deps) if deps else ""

    def _topological_sort(self, subtasks: list) -> list:
        # Ordered execution respecting depends_on, with a cycle guard:
        # if no task becomes ready in a pass, the plan is unresolvable
        ordered, remaining = [], list(subtasks)
        completed = set()
        while remaining:
            ready = [t for t in remaining
                     if all(dep in completed for dep in t.get("depends_on", []))]
            if not ready:
                raise ValueError("Circular or missing dependency in plan")
            for task in ready:
                ordered.append(task)
                completed.add(task["id"])
                remaining.remove(task)
        return ordered
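
One fragile spot in step 1: json.loads fails if the model wraps the plan in a markdown code fence. A hypothetical parse_plan helper (not part of any SDK) that tolerates fenced output:

```python
import json

def parse_plan(text: str) -> dict:
    """Parse a JSON plan, stripping an optional markdown code fence first."""
    cleaned = text.strip()
    if cleaned.startswith("`" * 3):
        cleaned = cleaned.strip("`")
        # drop an optional language tag such as "json" on the first line
        first_newline = cleaned.find("\n")
        if first_newline != -1 and cleaned[:first_newline].strip().isalpha():
            cleaned = cleaned[first_newline + 1:]
    return json.loads(cleaned)
```

A stricter alternative is to request the plan via tool use / structured output so the API enforces the schema, but the string-cleanup fallback is cheap insurance either way.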

Handoff Protocol Template

# Standard handoff context format — use between all agents
from dataclasses import dataclass

@dataclass
class AgentHandoff:
    """Structured context passed between agents in a workflow."""
    task_id: str
    workflow_id: str
    step_number: int
    total_steps: int

    # What was done
    previous_agent: str
    previous_output: str
    artifacts: dict  # {"filename": "content"} for any files produced

    # What to do next
    current_agent: str
    current_task: str
    constraints: list[str]  # hard rules for this step

    # Metadata
    context_budget_remaining: int  # tokens left for this agent
    cost_so_far_usd: float

    def to_prompt(self) -> str:
        return f"""
# Agent Handoff — Step {self.step_number}/{self.total_steps}

## Your Task
{self.current_task}

## Constraints
{chr(10).join(f'- {c}' for c in self.constraints)}

## Context from Previous Step ({self.previous_agent})
{self.previous_output[:2000]}{"... [truncated]" if len(self.previous_output) > 2000 else ""}

## Context Budget
You have approximately {self.context_budget_remaining} tokens remaining. Be concise.
"""

Error Recovery Patterns

import time
from functools import wraps

def with_retry(max_attempts=3, backoff_seconds=2, fallback_model=None):
    """Decorator for agent calls with exponential backoff and model fallback."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception as e:
                    last_error = e
                    if attempt < max_attempts - 1:
                        wait = backoff_seconds * (2 ** attempt)
                        print(f"Attempt {attempt+1} failed: {e}. Retrying in {wait}s...")
                        time.sleep(wait)

                        # Fall back to cheaper/faster model on rate limit
                        if fallback_model and "rate_limit" in str(e).lower():
                            kwargs["model"] = fallback_model
            raise last_error
        return wrapper
    return decorator

@with_retry(max_attempts=3, fallback_model="claude-3-haiku-20240307")
def call_agent(model, system, user):
    ...
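
The decorator can be exercised without any API by retrying a deliberately flaky function. Backoff is set to zero so the demo runs instantly, and the model-fallback branch is omitted:

```python
import time
from functools import wraps

def with_retry(max_attempts=3, backoff_seconds=0):  # zero backoff for the demo
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception as e:
                    last_error = e
                    if attempt < max_attempts - 1:
                        time.sleep(backoff_seconds * (2 ** attempt))
            raise last_error
        return wrapper
    return decorator

calls = {"count": 0}

@with_retry(max_attempts=3)
def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = flaky()  # fails twice, succeeds on the third attempt
```

Note that the fallback in the full version mutates kwargs, so it only takes effect when model is passed as a keyword argument.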

Context Window Budgeting

# Budget context across a multi-step pipeline
# Rule: never let any step consume more than 60% of remaining budget

CONTEXT_LIMITS = {
    "claude-3-5-sonnet-20241022": 200_000,
    "gpt-4o": 128_000,
}

class ContextBudget:
    def __init__(self, model: str, reserve_pct: float = 0.2):
        total = CONTEXT_LIMITS.get(model, 128_000)
        self.total = total
        self.reserve = int(total * reserve_pct)  # keep 20% as buffer
        self.used = 0

    @property
    def remaining(self):
        return self.total - self.reserve - self.used

    def allocate(self, step_name: str, requested: int) -> int:
        allocated = min(requested, int(self.remaining * 0.6))  # max 60% of remaining
        print(f"[Budget] {step_name}: allocated {allocated:,} tokens (remaining: {self.remaining:,})")
        return allocated

    def consume(self, tokens_used: int):
        self.used += tokens_used

def truncate_to_budget(text: str, token_budget: int, chars_per_token: float = 4.0) -> str:
    """Rough truncation — use tiktoken for precision."""
    char_budget = int(token_budget * chars_per_token)
    if len(text) <= char_budget:
        return text
    return text[:char_budget] + "\n\n[... truncated to fit context budget ...]"
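
Worked numbers for the two rules above (20% reserve, 60%-of-remaining cap), using the Sonnet 200K limit:

```python
total = 200_000
reserve = int(total * 0.2)                      # 40,000 held back as buffer
remaining = total - reserve                     # 160,000 available to allocate
allocated = min(120_000, int(remaining * 0.6))  # a 120K request is capped at 96,000
remaining -= allocated                          # 64,000 left for later steps
```

The cap guarantees no single step can starve the rest of the pipeline: even a greedy first step leaves 40% of the working budget plus the reserve untouched.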

Cost Optimization Strategies

Strategy                                        Savings   Tradeoff
Use Haiku for routing/classification            85-90%    Slightly less nuanced judgment
Cache repeated system prompts                   50-90%    Requires prompt caching setup
Truncate intermediate outputs                   20-40%    May lose detail in handoffs
Batch similar tasks                             50%       Latency increases
Use Sonnet for most, Opus for final step only   60-70%    Final quality may improve
Short-circuit on confidence threshold           30-50%    Need confidence scoring
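
The first row can be made concrete with a back-of-envelope estimator. The prices below are illustrative placeholders, not current list prices; substitute real per-million-token rates before relying on the numbers:

```python
# Illustrative (input, output) USD per million tokens; check the pricing page
PRICES = {"sonnet": (3.00, 15.00), "haiku": (0.25, 1.25)}

def workflow_cost(steps):
    """steps: list of (model, input_tokens, output_tokens) tuples."""
    return sum(
        tin / 1e6 * PRICES[m][0] + tout / 1e6 * PRICES[m][1]
        for m, tin, tout in steps
    )

all_sonnet = workflow_cost([("sonnet", 10_000, 2_000)] * 5)
routed = workflow_cost([("haiku", 10_000, 2_000)] * 4 + [("sonnet", 10_000, 2_000)])
# routing 4 of 5 steps to the cheap model cuts this toy workflow's cost by ~73%
```

Running the estimator per workflow design, before building it, turns model selection into an explicit budget decision rather than an afterthought.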

Common Pitfalls

  • Circular dependencies — agents calling each other in loops; enforce DAG structure at design time
  • Context bleed — passing entire previous output to every step; summarize or extract only what's needed
  • No timeout — a stuck agent blocks the whole pipeline; always set max_tokens and wall-clock timeouts
  • Silent failures — agent returns plausible but wrong output; add validation steps for critical paths
  • Ignoring cost — 10 parallel Opus calls is $0.50 per workflow; model selection is a cost decision
  • Over-orchestration — if a single prompt can do it, it should; only add agents when genuinely needed
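
The first pitfall can be caught before any tokens are spent with a design-time check over the plan format used in Pattern 3. A sketch:

```python
def assert_dag(subtasks: list[dict]) -> None:
    """Raise if the depends_on graph has a cycle or references a missing task."""
    deps = {t["id"]: set(t.get("depends_on", [])) for t in subtasks}
    resolved: set[str] = set()
    while len(resolved) < len(deps):
        ready = {tid for tid, d in deps.items()
                 if tid not in resolved and d <= resolved}
        if not ready:  # no task can make progress: cycle or dangling reference
            raise ValueError("Cycle or missing dependency in workflow graph")
        resolved |= ready

assert_dag([{"id": "1", "depends_on": []}, {"id": "2", "depends_on": ["1"]}])  # passes
# assert_dag([{"id": "a", "depends_on": ["b"]}, {"id": "b", "depends_on": ["a"]}])  # raises
```

Run it immediately after the orchestrator emits its plan, so a malformed plan fails in milliseconds instead of hanging the pipeline mid-execution.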