Building AI agents that don’t hallucinate, get stuck in loops, or drain your API budget requires more than plugging an LLM into a chat interface. You need architectural patterns — proven templates for how agents think, plan, use tools, and remember context.
This guide covers the five essential patterns every developer should know when building production-ready AI agents.
:::note[TL;DR]
- ReAct Pattern: Think → Act → Observe → Repeat. Best for single-step reasoning tasks.
- Plan-and-Execute: Plan all steps first, then execute. Better for complex multi-step workflows.
- Multi-Agent Systems: Divide work between specialized agents. Scales to enterprise complexity.
- Tool Use with Reflection: Let agents call APIs, then verify results before proceeding.
- Memory Systems: Working memory (context window) + short-term (vector DB) + long-term (knowledge graph).
:::
## What Is an AI Agent?
An AI agent is an LLM-powered system that can:
- Reason through complex problems
- Plan sequences of actions
- Use tools (APIs, databases, code execution)
- Observe results and adapt
- Remember context across interactions
Unlike simple chatbots, agents can take autonomous actions to achieve goals.
The Scenario: Your company needs to automate competitor analysis. A simple prompt won’t work — you need an agent that searches websites, extracts pricing, compares features, and generates a report. That’s where architecture patterns matter.
## Pattern 1: ReAct (Reasoning + Acting)

The ReAct pattern alternates between reasoning and action. It’s the simplest effective agent architecture.

### How It Works

```text
Thought: I need to check the weather in New York
Action: weather_api(location="New York")
Observation: {"temp": 72, "condition": "sunny"}
Thought: Now I have the weather data. The user asked about outdoor activities.
Action: search_activities(weather="sunny", location="New York")
Observation: ["Central Park", "High Line", "Brooklyn Bridge Walk"]
Final Answer: Based on sunny 72°F weather, I recommend Central Park, the High Line, or a Brooklyn Bridge walk.
```
### Code Implementation

```python
from typing import Any, Callable, Dict

class ReActAgent:
    # Parsing helpers (_extract_thought, _extract_action, _parse_action) are
    # omitted for brevity; they pull the labeled "Thought:"/"Action:" lines
    # out of the LLM response.
    def __init__(self, llm_client, tools: Dict[str, Callable]):
        self.llm = llm_client
        self.tools = tools
        self.max_iterations = 10

    def run(self, query: str) -> str:
        context = f"Query: {query}\n\n"
        for _ in range(self.max_iterations):
            # Generate the next thought and action
            prompt = self._build_prompt(context)
            response = self.llm.generate(prompt)

            # Parse response
            thought = self._extract_thought(response)
            action = self._extract_action(response)

            if not action:
                # No action means the agent provided a final answer
                return response

            # Execute tool
            tool_name, tool_input = self._parse_action(action)
            if tool_name in self.tools:
                observation = self.tools[tool_name](**tool_input)
                context += f"Thought: {thought}\n"
                context += f"Action: {action}\n"
                context += f"Observation: {observation}\n\n"
            else:
                context += f"Error: Tool '{tool_name}' not found\n\n"

        return "Max iterations reached"

    def _build_prompt(self, context: str) -> str:
        return f"""You are a helpful assistant. Use the following format:

Thought: [Your reasoning about what to do next]
Action: [Tool name and JSON input, or "Final Answer"]

Available tools:
- weather_api(location: str)
- search_activities(weather: str, location: str)
- calculator(expression: str)

{context}
"""
```
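The action-parsing helper is not shown above; a minimal sketch might look like the following, assuming actions arrive in a `tool_name({"param": "value"})` format (that format is an assumption, not something the original specifies):

```python
import json
import re

def parse_action(action: str):
    """Split an action string like 'weather_api({"location": "New York"})'
    into a (tool_name, params) pair. Returns (None, {}) when the string
    does not look like a tool call (e.g. a final answer)."""
    match = re.match(r'(\w+)\((.*)\)\s*$', action.strip(), re.DOTALL)
    if not match:
        return None, {}
    name, raw = match.group(1), match.group(2).strip()
    params = json.loads(raw) if raw else {}
    return name, params
```

In practice you would also handle malformed JSON here, since LLMs occasionally emit trailing commas or unquoted keys.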
### When to Use ReAct

| Use Case | Example |
|---|---|
| Single-step decisions | "Should I bring an umbrella?" |
| Sequential tool calls | Search → Filter → Summarize |
| Interactive debugging | Fix code errors step by step |
| Customer support | Diagnose issues through questioning |

### Limitations

- No backtracking: Can’t revise earlier decisions
- Short horizon: Struggles with 10+ step tasks
- No parallel execution: Steps happen sequentially
## Pattern 2: Plan-and-Execute

For complex tasks, plan everything first, then execute. This avoids mid-task dead ends.

### How It Works

```text
Plan:
1. Search for competitor pricing data
2. Extract pricing from top 3 results
3. Compare with our pricing
4. Generate analysis report

Execution:
[Execute step 1] → [Execute step 2] → [Execute step 3] → [Execute step 4]
```
### Code Implementation

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

@dataclass
class Step:
    description: str
    tool: Optional[str]
    input_params: Dict[str, Any]
    output_var: str

class PlanAndExecuteAgent:
    # _parse_plan (turning the LLM's plan text into Step objects) and
    # _llm_reason (handling tool-free reasoning steps) are omitted for brevity.
    def __init__(self, llm_client, tools: Dict[str, Callable]):
        self.llm = llm_client
        self.tools = tools

    def run(self, query: str) -> str:
        # Phase 1: Planning
        plan = self._create_plan(query)

        # Phase 2: Execution
        results = {}
        for step in plan:
            if step.tool:
                # Substitute variables from previous steps
                params = self._substitute_vars(step.input_params, results)
                results[step.output_var] = self.tools[step.tool](**params)
            else:
                # Pure LLM reasoning step (no tool call)
                results[step.output_var] = self._llm_reason(step.description, results)
        return results.get('final_output', 'Task completed')

    def _create_plan(self, query: str) -> List[Step]:
        prompt = f"""Create a step-by-step plan for: {query}

Format each step as:
- Step: [description]
- Tool: [tool_name or "none"]
- Input: [params as JSON]
- Output: [variable_name]

Available tools: {list(self.tools.keys())}
"""
        response = self.llm.generate(prompt)
        return self._parse_plan(response)

    def _substitute_vars(self, params: Dict, results: Dict) -> Dict:
        """Replace {{var}} placeholders with actual values from previous steps."""
        resolved = {}
        for key, value in params.items():
            if isinstance(value, str) and value.startswith('{{') and value.endswith('}}'):
                var_name = value[2:-2]
                resolved[key] = results.get(var_name, value)
            else:
                resolved[key] = value
        return resolved
```
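Reusing the `Step` shape, a parsed plan for the competitor-analysis scenario from the introduction might look like this (the tool and variable names are illustrative, not part of any real API):

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass
class Step:
    description: str
    tool: Optional[str]
    input_params: Dict[str, Any]
    output_var: str

plan = [
    Step("Search for competitor pricing data", "web_search",
         {"query": "competitor SaaS pricing"}, "search_results"),
    Step("Extract pricing from top 3 results", "scrape_pricing",
         {"urls": "{{search_results}}"}, "pricing_data"),
    # A tool of None marks a pure LLM reasoning step
    Step("Compare with our pricing and write the report", None,
         {"data": "{{pricing_data}}"}, "final_output"),
]
```

The executor resolves `{{search_results}}` to the first step’s real output before the second step runs, so each step consumes its predecessors’ results by name.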
### When to Use Plan-and-Execute

| Use Case | Example |
|---|---|
| Multi-step workflows | Research → Draft → Review → Publish |
| Data pipelines | Extract → Transform → Load → Validate |
| Report generation | Gather data → Analyze → Visualize → Write |
| Code generation | Plan architecture → Generate files → Test |

### Advantages Over ReAct

- Global optimization: Plans consider all steps upfront
- Parallel execution: Independent steps run simultaneously
- Better error recovery: Can replan from any failure point
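The parallel-execution advantage can be sketched with asyncio: steps whose dependencies are all satisfied run together as one concurrent wave (step names and delays here are illustrative stand-ins for real tool calls):

```python
import asyncio

async def run_step(name: str, delay: float) -> str:
    # Stand-in for a real tool call; the sleep simulates I/O latency.
    await asyncio.sleep(delay)
    return f"{name} done"

async def execute_wave(steps):
    # Independent steps are awaited together rather than one by one,
    # so total latency approaches the slowest step instead of the sum.
    return await asyncio.gather(*(run_step(n, d) for n, d in steps))

results = asyncio.run(execute_wave([("search", 0.01), ("scrape", 0.01)]))
```

`asyncio.gather` preserves input order, so results still line up with their steps regardless of which finishes first.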
## Pattern 3: Multi-Agent Systems

Divide complex tasks between specialized agents. Each agent has a specific role and expertise.

### Architecture Overview

```text
┌─────────────────────────────────────────┐
│           Orchestrator Agent            │
│    (Routes tasks, manages workflow)     │
└─────────────────────────────────────────┘
                    │
     ┌──────────┬───┴────┬──────────┐
     │          │        │          │
     ▼          ▼        ▼          ▼
┌──────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│ Research │ │ Writer │ │ Coder  │ │ Review │
│  Agent   │ │ Agent  │ │ Agent  │ │ Agent  │
└──────────┘ └────────┘ └────────┘ └────────┘
```
### Code Implementation

```python
import asyncio
import json
from typing import Dict, List

class Agent:
    def __init__(self, name: str, system_prompt: str, tools: List[str], llm_client=None):
        self.name = name
        self.system_prompt = system_prompt
        self.tools = tools
        self.llm = llm_client  # injected LLM wrapper with an async generate()

    async def execute(self, task: str, context: Dict) -> str:
        # Each agent uses ReAct or Plan-and-Execute internally
        prompt = f"{self.system_prompt}\n\nTask: {task}\nContext: {context}"
        return await self.llm.generate(prompt)

class MultiAgentSystem:
    # _synthesize (merging agent outputs into one final answer) is omitted for brevity.
    def __init__(self, llm_client=None):
        self.llm = llm_client
        self.agents: Dict[str, Agent] = {}

    def register_agent(self, agent: Agent):
        self.agents[agent.name] = agent

    async def run(self, query: str) -> str:
        # Orchestrator decides which agents to call
        plan = await self._orchestrate(query)
        results = {}
        for step in plan:
            agent_name = step['agent']
            task = step['task']
            if agent_name in self.agents:
                agent = self.agents[agent_name]
                results[agent_name] = await agent.execute(task, results)
        # Synthesize final output
        return await self._synthesize(query, results)

    async def _orchestrate(self, query: str) -> List[Dict]:
        """Determine which agents to use and in what order."""
        agent_list = "\n".join(
            f"- {name}: {agent.system_prompt[:100]}..."
            for name, agent in self.agents.items()
        )
        orchestrator_prompt = f"""Given this task: {query}

Available agents:
{agent_list}

Create an execution plan:
1. Which agents to use
2. What task to give each
3. Dependencies between agents

Output as JSON list with 'agent', 'task', and 'depends_on' keys.
"""
        response = await self.llm.generate(orchestrator_prompt)
        return json.loads(response)

# Usage (llm_client is your async LLM wrapper, not shown here)
system = MultiAgentSystem(llm_client)
system.register_agent(Agent(
    name="researcher",
    system_prompt="You are a research specialist. Find accurate, up-to-date information.",
    tools=["web_search", "academic_search", "news_api"],
    llm_client=llm_client,
))
system.register_agent(Agent(
    name="writer",
    system_prompt="You are a technical writer. Create clear, engaging content.",
    tools=["grammar_check", "readability_score"],
    llm_client=llm_client,
))
system.register_agent(Agent(
    name="coder",
    system_prompt="You are a senior developer. Write clean, tested code.",
    tools=["code_executor", "linter", "test_runner"],
    llm_client=llm_client,
))

result = asyncio.run(system.run(
    "Create a Python script that fetches weather data and sends email alerts"
))
```
### Agent Specialization Examples

| Agent Type | Responsibility | Tools |
|---|---|---|
| Research Agent | Information gathering | Search APIs, databases, web scraping |
| Analysis Agent | Data processing | Pandas, SQL, visualization |
| Code Agent | Implementation | Code execution, linters, tests |
| Review Agent | Quality assurance | Fact-checking, style guides |
| UI Agent | Interface design | Component libraries, design systems |

### When to Use Multi-Agent

| Use Case | Why Multiple Agents? |
|---|---|
| Content platform | Research → Write → Edit → SEO optimize |
| DevOps automation | Monitor → Analyze → Plan → Execute |
| Customer support | Triage → Resolve → Escalate → Follow-up |
| Research assistant | Literature review → Analysis → Synthesis |
## Pattern 4: Tool Use with Reflection

Don’t just call tools — verify the results before proceeding.

### The Reflection Loop

```text
Plan → Act → Observe → Reflect → [Retry if needed] → Continue
```

### Code Implementation
```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

@dataclass
class ToolResult:
    success: bool
    data: Any = None  # default allows failure results to omit data
    error: Optional[str] = None

class ReflectiveToolAgent:
    def __init__(self, llm_client, tools: Dict[str, Callable]):
        self.llm = llm_client
        self.tools = tools
        self.max_retries = 3

    async def use_tool(self, tool_name: str, params: Dict) -> ToolResult:
        for attempt in range(self.max_retries):
            try:
                # Execute tool
                raw_result = await self.tools[tool_name](**params)

                # Reflect on the result before accepting it
                reflection = await self._reflect(tool_name, params, raw_result)
                if reflection.is_valid:
                    return ToolResult(success=True, data=raw_result)

                # Retry with the corrected parameters the reflection suggested
                params = reflection.corrected_params
            except Exception as e:
                if attempt == self.max_retries - 1:
                    return ToolResult(success=False, error=str(e))
        return ToolResult(success=False, error="Max retries exceeded")

    async def _reflect(self, tool_name: str, params: Dict, result: Any) -> 'Reflection':
        """Analyze whether the tool output is valid and useful."""
        prompt = f"""Tool: {tool_name}
Input: {json.dumps(params)}
Output: {json.dumps(result)}

Evaluate:
1. Is the output valid and well-formed?
2. Does it contain the expected data?
3. Are there any errors or anomalies?

Respond with JSON:
{{
  "is_valid": true/false,
  "issues": ["list of problems if any"],
  "corrected_params": {{"param": "value"}}  // if retry needed
}}
"""
        response = await self.llm.generate(prompt)
        # Reflection parses the JSON verdict into a small object (helper not shown)
        return Reflection.parse(response)
```
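The `Reflection` helper that `_reflect` returns is not shown above; a sketch matching the JSON schema the prompt asks for might be:

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Reflection:
    """Parsed verdict from the reflection prompt."""
    is_valid: bool
    issues: List[str] = field(default_factory=list)
    corrected_params: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def parse(cls, response: str) -> "Reflection":
        # Treat malformed LLM output as an invalid result rather than crashing.
        try:
            data = json.loads(response)
        except json.JSONDecodeError:
            return cls(is_valid=False, issues=["unparseable reflection response"])
        return cls(
            is_valid=bool(data.get("is_valid", False)),
            issues=data.get("issues", []),
            corrected_params=data.get("corrected_params", {}),
        )
```

Defaulting to "invalid" on parse failure is deliberate: it routes bad reflections into the retry path instead of letting a garbled verdict pass as approval.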
### Reflection Checks

| Check | Example |
|---|---|
| Format validation | JSON parsing, schema validation |
| Semantic validation | "Does this answer the user’s question?" |
| Error detection | Empty results, rate limits, timeouts |
| Quality assessment | "Is this search result relevant?" |
## Pattern 5: Memory Systems

Agents need to remember context across sessions and learn from past interactions.

### Three Types of Memory

```text
┌─────────────────────────────────────────────────────┐
│                   WORKING MEMORY                    │
│           (Current conversation context)            │
│              ~128K tokens (Claude/GPT)              │
└─────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│                  SHORT-TERM MEMORY                  │
│       (Recent conversations, session history)       │
│        Vector DB: Pinecone, Chroma, Weaviate        │
└─────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│                  LONG-TERM MEMORY                   │
│   (User preferences, learned facts, entity graph)   │
│          Knowledge Graph + Document Store           │
└─────────────────────────────────────────────────────┘
```
### Implementation

```python
import hashlib
import json
import time
from typing import Dict, List

class AgentMemory:
    # Assumes an async `embed` method and an `llm` client are injected
    # elsewhere (e.g. in __init__); both are omitted here for brevity.
    def __init__(self, vector_store, knowledge_graph):
        self.vector_store = vector_store
        self.kg = knowledge_graph
        self.session_id = None

    def start_session(self, user_id: str):
        self.session_id = hashlib.md5(f"{user_id}:{time.time()}".encode()).hexdigest()

    async def remember(self, content: str, memory_type: str = "short_term"):
        """Store information for later retrieval."""
        if memory_type == "short_term":
            # Vector embedding for semantic search
            embedding = await self.embed(content)
            await self.vector_store.upsert(
                ids=[f"{self.session_id}:{time.time()}"],
                embeddings=[embedding],
                metadatas=[{"content": content, "session": self.session_id}]
            )
        elif memory_type == "long_term":
            # Extract entities and relationships for the knowledge graph
            entities = await self._extract_entities(content)
            for entity in entities:
                await self.kg.add_entity(entity)

    async def recall(self, query: str, k: int = 5) -> List[str]:
        """Retrieve relevant past information via semantic search."""
        query_embedding = await self.embed(query)
        results = await self.vector_store.query(
            query_embeddings=[query_embedding],
            n_results=k,
            filter={"session": self.session_id}
        )
        return [r['content'] for r in results['metadatas'][0]]

    async def _extract_entities(self, text: str) -> List[Dict]:
        """Use the LLM to extract entities and relationships."""
        prompt = f"""Extract entities and relationships from:
{text}

Format: JSON list of {{"entity": "name", "type": "person/place/thing", "relationships": [{{"to": "other", "type": "works_with/located_in/etc"}}]}}"""
        response = await self.llm.generate(prompt)
        return json.loads(response)
```
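The `vector_store` above assumes a real service such as Pinecone or Chroma. The recall mechanics can be illustrated with a toy in-memory store ranked by cosine similarity (names and interface are illustrative, not any real client library):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class TinyVectorStore:
    """Toy stand-in for a vector DB: upsert embeddings, query by similarity."""
    def __init__(self):
        self.items = []  # list of (embedding, content) pairs

    def upsert(self, embedding, content):
        self.items.append((embedding, content))

    def query(self, embedding, k=5):
        # Rank stored items by similarity to the query embedding
        ranked = sorted(self.items, key=lambda item: cosine(embedding, item[0]),
                        reverse=True)
        return [content for _, content in ranked[:k]]

store = TinyVectorStore()
store.upsert([1.0, 0.0], "User asked about weather tools")
store.upsert([0.0, 1.0], "User prefers email alerts")
top = store.query([0.9, 0.1], k=1)
```

A production store adds metadata filtering (e.g. by session, as in `recall` above), approximate-nearest-neighbor indexing, and persistence, but the retrieval principle is the same.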
### Memory Retrieval Strategies

| Strategy | Use Case |
|---|---|
| Semantic search | "Find similar past conversations" |
| Entity lookup | "What’s the user’s company?" |
| Temporal recall | "What did we discuss last week?" |
| Structured query | "List all API integrations mentioned" |
## Choosing the Right Pattern

| Task Complexity | Recommended Pattern |
|---|---|
| Simple Q&A with tool use | ReAct |
| Multi-step workflow | Plan-and-Execute |
| Cross-functional automation | Multi-Agent |
| Critical operations (finance, health) | Tool Use + Reflection |
| Persistent user relationships | Any pattern + Memory |
## Common Pitfalls

### The Infinite Loop

```python
# BAD: No iteration limit
while not task_complete:
    agent.step()

# GOOD: Bounded execution
for i in range(max_iterations):
    if task_complete:
        break
    agent.step()
```

### The Context Explosion

```python
# BAD: Unlimited context growth
context += f"Step {i}: {result}\n"

# GOOD: Summarize old context
if len(context) > 100000:
    context = await agent.summarize(context)
```

### The Tool Overload

```python
# BAD: 50 tools confuses the agent
tools = [tool1, tool2, ..., tool50]

# GOOD: Group tools by function
research_tools = [search, scrape, summarize]
code_tools = [execute, lint, test]
```
## Production Checklist
- Set maximum iteration limits
- Implement timeout handling
- Add cost tracking per request
- Log all tool calls for debugging
- Cache frequent tool results
- Implement graceful degradation
- Add human-in-the-loop for critical decisions
- Monitor hallucination rates
- A/B test different prompts
- Version control your agent configurations
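Two of these items, logging tool calls and tracking per-call cost, can be covered with one small wrapper around every tool. A sketch with illustrative names (the `calculator` tool and the call-log shape are assumptions):

```python
import functools
import time

def logged_tool(fn):
    """Record each call's name, duration, and success for debugging and cost review."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            result = fn(*args, **kwargs)
            wrapper.calls.append((fn.__name__, time.monotonic() - start, True))
            return result
        except Exception:
            # Failures are logged too, then re-raised for the agent to handle
            wrapper.calls.append((fn.__name__, time.monotonic() - start, False))
            raise
    wrapper.calls = []
    return wrapper

@logged_tool
def calculator(expression: str) -> float:
    # Toy tool for illustration; eval is restricted to bare arithmetic here.
    return float(eval(expression, {"__builtins__": {}}, {}))

value = calculator("2 + 3")
```

In a real deployment you would push these records to your observability stack and attach token counts for cost tracking, rather than keeping them on the function object.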
## Summary
- ReAct: Simple, effective for single-step reasoning. Think → Act → Observe.
- Plan-and-Execute: Complex workflows need upfront planning.
- Multi-Agent: Scale to enterprise by specializing agents.
- Tool Use + Reflection: Verify results, don’t blindly trust.
- Memory: Context across sessions separates toys from tools.
The best agents combine these patterns. Start with ReAct, add planning for complexity, specialize into multi-agent for scale, and always verify critical operations.
## What to Read Next
- Building MCP Servers for Claude — Connect agents to any API
- Claude API Cheat Sheet — LLM parameters and tool use
- How to Add Claude to Your App — Integration patterns