Introduction
The landscape of AI development has shifted dramatically. We've moved beyond simple chatbots and text generation to a new paradigm: agentic AI—systems that can reason, plan, and autonomously use tools to accomplish complex tasks.
Imagine an AI that doesn't just answer questions about your calendar, but actually schedules meetings, sends emails, and updates your task list. Or a customer service agent that can check order status, process refunds, and update CRM records—all without human intervention.
This isn't science fiction. It's the reality of modern agentic AI systems, and in this guide, you'll learn how to build them from scratch.
We'll cover:
- What makes an AI "agentic" and why it matters
- The ReAct (Reasoning + Acting) pattern that powers modern agents
- Building your first tool-using agent
- Production-ready patterns and best practices
- Common pitfalls and how to avoid them
By the end, you'll have the knowledge to build AI agents that can autonomously solve real-world problems.
What is Agentic AI?
Agentic AI refers to AI systems that can:
- Reason about problems and break them into steps
- Make decisions about which actions to take
- Use tools to interact with external systems
- Observe the results and adapt their approach
- Persist until a goal is achieved or determined impossible
Unlike traditional chatbots that simply respond to prompts, agents actively pursue goals through multiple reasoning and action cycles.
The Key Difference: Passive vs. Active AI
Passive AI (Traditional LLM):
```
User: "What's the weather like?"
AI: "I don't have access to real-time weather data."
```
Agentic AI:
```
User: "What's the weather like?"
AI: [Thinks] I need weather data. I'll use the weather API tool.
    [Acts] Calls get_weather(location="user_location")
    [Observes] Receives: 72°F, sunny
    [Responds] "It's currently 72°F and sunny in your area."
```
The agent doesn't just know it lacks information—it knows how to get it.
The ReAct Pattern: The Brain of an Agent
The ReAct (Reasoning and Acting) pattern is the foundation of modern agentic AI. It's a simple but powerful loop:
1. REASON: Think about the current state and what to do next
2. ACT: Execute a tool or provide a final answer
3. OBSERVE: See the results of the action
4. REPEAT: Continue until the goal is achieved
This pattern was introduced in the paper "ReAct: Synergizing Reasoning and Acting in Language Models" (Yao et al., 2022) and has become the de facto standard for agent architectures.
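Stripped of the LLM specifics, the loop can be sketched in a few lines of Python (a schematic only: `fake_policy` and the lambda below stand in for a real model call and a real tool):

```python
def react_loop(goal, policy, execute, max_steps=10):
    """Schematic ReAct loop: reason about the goal, act, observe, repeat."""
    observations = []
    for _ in range(max_steps):
        # REASON + ACT: the policy decides the next step given past observations
        action = policy(goal, observations)
        if action["type"] == "final_answer":
            return action["content"]
        # OBSERVE: run the chosen tool and record the result
        observations.append(execute(action))
    return None  # gave up after max_steps

# Toy policy: look something up once, then answer from the observation
def fake_policy(goal, observations):
    if not observations:
        return {"type": "tool", "name": "lookup", "input": goal}
    return {"type": "final_answer", "content": f"Answer based on: {observations[0]}"}

result = react_loop("weather", fake_policy, lambda action: "72°F and sunny")
print(result)  # Answer based on: 72°F and sunny
```

The real implementation later in this guide replaces `policy` with an LLM call and `execute` with actual tool functions, but the control flow is exactly this.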
Why ReAct Works
LLMs are naturally good at reasoning through text. By making the agent's "thoughts" explicit, we get:
- Transparency: You can see why the agent made each decision
- Debuggability: When things go wrong, you know where
- Better performance: Explicit reasoning improves decision quality
- Controllability: You can guide the agent's thinking process
Building Your First AI Agent
Let's build a practical AI agent that can search the web and perform calculations. We'll use Python with the Anthropic Claude API, but the patterns apply to any modern LLM.
Step 1: Define Your Tools
Tools are functions your agent can call. Each tool needs:
- A clear name
- A description (this is crucial—the LLM reads this!)
- Input parameters with types
- The actual implementation
```python
import requests

def search_web(query: str) -> str:
    """
    Search the web for information.

    Args:
        query: The search query to look up

    Returns:
        Search results as a formatted string
    """
    # In production, use a real search API (Serper, Tavily, etc.)
    response = requests.get(
        "https://api.search-service.com/search",
        params={"q": query}
    )
    return response.json()["results"]

def calculate(expression: str) -> float:
    """
    Safely evaluate a mathematical expression.

    Args:
        expression: A mathematical expression like "25 * 4 + 10"

    Returns:
        The calculated result
    """
    # Walk the AST instead of calling eval(), which would execute arbitrary code
    import ast
    import operator

    operators = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
    }

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in operators:
            return operators[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError(f"Unsupported expression: {expression}")

    return _eval(ast.parse(expression, mode="eval").body)

# Tool definitions for the LLM
tools = [
    {
        "name": "search_web",
        "description": "Search the web for current information. Use this when you need facts, news, or real-time data.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "calculate",
        "description": "Perform mathematical calculations. Use this for any arithmetic operations.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Mathematical expression to evaluate"
                }
            },
            "required": ["expression"]
        }
    }
]
```
Step 2: Implement the Agent Loop
This is where the magic happens. The agent loop orchestrates the ReAct cycle:
```python
import anthropic

class AIAgent:
    def __init__(self, api_key: str):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.tools = tools
        self.tool_functions = {
            "search_web": search_web,
            "calculate": calculate
        }

    def run(self, user_message: str, max_iterations: int = 10):
        """
        Run the agent loop until completion or max iterations reached.
        """
        messages = [{"role": "user", "content": user_message}]

        for iteration in range(max_iterations):
            # Get response from Claude
            response = self.client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=4096,
                tools=self.tools,
                messages=messages
            )

            # Check if we're done
            if response.stop_reason == "end_turn":
                # Extract final answer
                final_response = next(
                    (block.text for block in response.content
                     if hasattr(block, "text")),
                    None
                )
                return final_response

            # Claude wants to use a tool
            if response.stop_reason == "tool_use":
                # Add Claude's response to the conversation
                messages.append({
                    "role": "assistant",
                    "content": response.content
                })

                # Execute all requested tools
                tool_results = []
                for block in response.content:
                    if block.type == "tool_use":
                        tool_name = block.name
                        tool_input = block.input

                        print(f"🔧 Using tool: {tool_name}")
                        print(f"📥 Input: {tool_input}")

                        # Execute the tool
                        result = self.tool_functions[tool_name](**tool_input)
                        print(f"📤 Result: {result}\n")

                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": str(result)
                        })

                # Add tool results to the conversation
                messages.append({
                    "role": "user",
                    "content": tool_results
                })

        return "Max iterations reached without completion."

# Usage
agent = AIAgent(api_key="your-api-key")
response = agent.run("What's 15% of the current price of Bitcoin?")
print(response)
```
Step 3: Watch It Work
When you run this agent with a query like "What's 15% of the current price of Bitcoin?", here's what happens:
Iteration 1:

```
🔧 Using tool: search_web
📥 Input: {'query': 'current bitcoin price'}
📤 Result: Bitcoin is currently trading at $67,234
```

Iteration 2:

```
🔧 Using tool: calculate
📥 Input: {'expression': '67234 * 0.15'}
📤 Result: 10085.1
```

Final Response: "15% of the current Bitcoin price ($67,234) is $10,085.10"
The agent:
- Reasoned it needed current Bitcoin price
- Used the search tool
- Reasoned it needed to calculate 15%
- Used the calculator tool
- Provided the final answer
Advanced Patterns for Production Systems
Building a demo agent is one thing. Building a production-ready system is another. Here are the patterns you need.
1. Tool Selection Strategies
Not all tools should always be available. Smart agents dynamically enable tools based on context:
```python
class ContextAwareAgent(AIAgent):
    def get_tools_for_context(self, user_message: str) -> list:
        """
        Dynamically select relevant tools based on the query.
        """
        all_tools = {
            "search": ["search_web", "search_knowledge_base"],
            "data": ["query_database", "calculate"],
            "communication": ["send_email", "create_calendar_event"],
            "files": ["read_file", "write_file", "list_directory"]
        }

        # Use an LLM or keyword matching to categorize the query
        categories = self.categorize_query(user_message)

        # Only include relevant tools
        enabled_tools = []
        for category in categories:
            enabled_tools.extend(all_tools.get(category, []))

        return [t for t in self.tools if t["name"] in enabled_tools]
```
Why this matters: Giving an agent too many tools confuses it and wastes tokens. A focused tool set improves accuracy and reduces costs.
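The `categorize_query` helper above is left undefined; a minimal keyword-based version (an illustrative sketch with made-up keyword lists; in practice you might use an LLM classifier) could look like:

```python
def categorize_query(user_message: str) -> list:
    """Naive keyword-based query categorizer (hypothetical cue words)."""
    keywords = {
        "search": ["find", "look up", "what is", "news"],
        "data": ["calculate", "how many", "average", "database"],
        "communication": ["email", "schedule", "meeting", "invite"],
        "files": ["file", "directory", "save", "open"],
    }
    message = user_message.lower()
    matches = [category for category, words in keywords.items()
               if any(word in message for word in words)]
    return matches or ["search"]  # default when nothing matches

print(categorize_query("Email the team and schedule a meeting"))  # ['communication']
```

Keyword matching is cheap and deterministic; the trade-off is brittleness, so many production systems route this decision through a small, fast LLM instead.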
2. Error Handling and Retries
Tools fail. Networks timeout. APIs return errors. Production agents must handle this gracefully:
```python
import time

import requests

def execute_tool_with_retry(
    self,
    tool_name: str,
    tool_input: dict,
    max_retries: int = 3
) -> dict:
    """
    Execute a tool with exponential backoff retry logic.
    """
    for attempt in range(max_retries):
        try:
            result = self.tool_functions[tool_name](**tool_input)
            return {
                "success": True,
                "result": result
            }
        except requests.exceptions.Timeout:
            if attempt == max_retries - 1:
                return {
                    "success": False,
                    "error": f"Tool execution timed out after {max_retries} attempts"
                }
            time.sleep(2 ** attempt)  # Exponential backoff
        except Exception as e:
            return {
                "success": False,
                "error": f"Tool execution failed: {str(e)}"
            }
```
3. Memory and State Management
Agents often need to remember things across turns. Implement a simple memory system:
```python
from datetime import datetime
from typing import Any

class StatefulAgent(AIAgent):
    def __init__(self, api_key: str):
        super().__init__(api_key)
        self.memory = {
            "facts": [],    # Things learned
            "history": [],  # Past actions
            "context": {}   # Current context
        }

    def add_to_memory(self, key: str, value: Any):
        """Store information in agent memory."""
        self.memory["facts"].append({
            "key": key,
            "value": value,
            "timestamp": datetime.now()
        })

    def build_system_prompt(self) -> str:
        """Include memory in the system prompt."""
        prompt = "You are a helpful AI assistant.\n\n"
        if self.memory["facts"]:
            prompt += "Things you know:\n"
            for fact in self.memory["facts"][-5:]:  # Last 5 facts
                prompt += f"- {fact['key']}: {fact['value']}\n"
        return prompt
```
4. Guardrails and Safety
Never let an agent run wild. Implement safety checks:
```python
class SafeAgent(AIAgent):
    DANGEROUS_ACTIONS = ["delete_database", "send_all_emails", "charge_card"]

    def requires_confirmation(self, tool_name: str, tool_input: dict) -> bool:
        """Check if an action requires human confirmation."""
        if tool_name in self.DANGEROUS_ACTIONS:
            return True

        # Check for high-impact parameters
        if tool_name == "send_email" and len(tool_input.get("recipients", [])) > 10:
            return True
        if tool_name == "transfer_funds" and tool_input.get("amount", 0) > 1000:
            return True

        return False

    def execute_tool(self, tool_name: str, tool_input: dict):
        """Execute tool with safety checks."""
        if self.requires_confirmation(tool_name, tool_input):
            print(f"⚠️ Confirmation required for: {tool_name}")
            print(f"Parameters: {tool_input}")
            confirmation = input("Proceed? (yes/no): ")
            if confirmation.lower() != "yes":
                return {"error": "Action cancelled by user"}

        # Assumes the base agent routes all tool calls through an execute_tool() hook
        return super().execute_tool(tool_name, tool_input)
```
5. Observability and Logging
You can't fix what you can't see. Log everything:
```python
import logging
from datetime import datetime

class ObservableAgent(AIAgent):
    def __init__(self, api_key: str):
        super().__init__(api_key)
        self.setup_logging()

    def setup_logging(self):
        logging.basicConfig(
            level=logging.INFO,
            format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
            handlers=[
                logging.FileHandler(f'agent_{datetime.now().strftime("%Y%m%d")}.log'),
                logging.StreamHandler()
            ]
        )
        self.logger = logging.getLogger('AIAgent')

    def run(self, user_message: str, max_iterations: int = 10):
        self.logger.info(f"Starting agent run: {user_message}")
        try:
            result = super().run(user_message, max_iterations)
            self.logger.info("Agent completed successfully")
            return result
        except Exception as e:
            self.logger.error(f"Agent failed: {str(e)}", exc_info=True)
            raise
```
Real-World Use Cases
Here are proven patterns for common agent applications:
Customer Support Agent
```python
tools = [
    {
        "name": "check_order_status",
        "description": "Look up the current status of a customer order",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"}
            }
        }
    },
    {
        "name": "process_refund",
        "description": "Initiate a refund for an order",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "reason": {"type": "string"}
            }
        }
    },
    {
        "name": "update_ticket",
        "description": "Update the customer support ticket",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticket_id": {"type": "string"},
                "status": {"type": "string"},
                "notes": {"type": "string"}
            }
        }
    }
]
```
Research Assistant
```python
tools = [
    {
        "name": "search_papers",
        "description": "Search academic papers on arXiv or Google Scholar",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "max_results": {"type": "integer"}
            }
        }
    },
    {
        "name": "read_paper",
        "description": "Extract and summarize content from a research paper",
        "input_schema": {
            "type": "object",
            "properties": {
                "paper_url": {"type": "string"}
            }
        }
    },
    {
        "name": "save_note",
        "description": "Save research notes to the knowledge base",
        "input_schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "content": {"type": "string"},
                "tags": {"type": "array", "items": {"type": "string"}}
            }
        }
    }
]
```
Data Analysis Agent
```python
tools = [
    {
        "name": "query_database",
        "description": "Execute a SQL query on the analytics database",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            }
        }
    },
    {
        "name": "create_visualization",
        "description": "Generate a chart or graph from data",
        "input_schema": {
            "type": "object",
            "properties": {
                "data": {"type": "array"},
                "chart_type": {"type": "string"},
                "title": {"type": "string"}
            }
        }
    },
    {
        "name": "run_statistical_test",
        "description": "Perform statistical analysis on datasets",
        "input_schema": {
            "type": "object",
            "properties": {
                "test_type": {"type": "string"},
                "data": {"type": "array"}
            }
        }
    }
]
```
Common Mistakes and How to Avoid Them
❌ Mistake 1: Vague Tool Descriptions
Bad:

```json
{
  "name": "get_data",
  "description": "Gets data"
}
```

Good:

```json
{
  "name": "get_user_profile",
  "description": "Retrieves a user's profile information including name, email, preferences, and account status. Use this when you need to look up details about a specific user."
}
```
Why: The LLM decides which tool to use based solely on descriptions. Be specific about what the tool does and when to use it.
❌ Mistake 2: No Maximum Iterations
Agents can get stuck in loops. Always set a max iteration count:
```python
# Bad
while not done:
    response = agent.step()

# Good
MAX_ITERATIONS = 15
for i in range(MAX_ITERATIONS):
    response = agent.step()
    if response.is_complete:
        break
```
❌ Mistake 3: Ignoring Tool Execution Failures
Bad:

```python
result = tool_function(**params)
# Assume it worked
```

Good:

```python
try:
    result = tool_function(**params)
    return {"success": True, "data": result}
except Exception as e:
    return {"success": False, "error": str(e)}
```
Pass errors back to the agent so it can try alternative approaches.
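With the Anthropic Messages API specifically, a failed call can be flagged with `is_error` on the `tool_result` block so the model knows the action failed and can recover. A small helper (a sketch; the outcome dict shape follows the retry example earlier, and the `toolu_*` IDs are placeholders):

```python
def to_tool_result(tool_use_id: str, outcome: dict) -> dict:
    """Convert a {"success": ..., ...} outcome into a tool_result content block."""
    if outcome["success"]:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": str(outcome["result"]),
        }
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": f"Error: {outcome['error']}",
        "is_error": True,  # tells the model this tool call failed
    }

ok = to_tool_result("toolu_01", {"success": True, "result": 10085.1})
failed = to_tool_result("toolu_02", {"success": False, "error": "request timed out"})
```

Seeing the error as an observation, the model will often retry with corrected input or fall back to a different tool rather than silently producing a wrong answer.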
❌ Mistake 4: Unlimited Tool Access
Don't give your agent every tool for every task:
```python
# Bad: 50 tools available for a simple calendar query
agent.run("What meetings do I have today?", tools=ALL_TOOLS)

# Good: Only relevant tools
relevant_tools = ["get_calendar_events", "get_current_time"]
agent.run("What meetings do I have today?", tools=relevant_tools)
```
❌ Mistake 5: Not Testing Edge Cases
Always test:
- Tool failures and network errors
- Ambiguous user requests
- Multi-step workflows
- Concurrent tool calls
- Maximum iteration limits
- Missing required parameters
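Two of those cases, missing required parameters and an unknown tool name, can be exercised without any LLM in the loop by testing the dispatch layer directly (a sketch with a stand-in tool implementation):

```python
def call_tool(tool_impls: dict, name: str, params: dict) -> dict:
    """Dispatch a tool call, surfacing unknown tools and bad parameters as errors."""
    try:
        return {"success": True, "result": tool_impls[name](**params)}
    except KeyError:
        return {"success": False, "error": f"Unknown tool: {name}"}
    except TypeError as e:
        return {"success": False, "error": f"Bad parameters: {e}"}

tool_impls = {"calculate": lambda expression: len(expression)}  # stand-in

ok = call_tool(tool_impls, "calculate", {"expression": "2+2"})
missing = call_tool(tool_impls, "calculate", {})       # required param omitted
unknown = call_tool(tool_impls, "no_such_tool", {})    # tool name typo
```

Testing the dispatcher in isolation keeps these checks fast and deterministic; the full agent loop then only needs end-to-end tests for the reasoning behavior itself.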
🚀 Pro Tips
Tip 1: Use Structured Output for Tool Results
Don't return raw text from tools. Use JSON or structured data:
```python
# Instead of: "User John Doe, email john@example.com, subscribed"
# Return:
{
    "user_id": "12345",
    "name": "John Doe",
    "email": "john@example.com",
    "status": "subscribed",
    "created_at": "2026-04-14T10:30:00Z"
}
```
This makes it easier for the agent to extract and use specific information.
Tip 2: Implement Tool Caching
Avoid redundant API calls:
```python
import json
import time

class CachingAgent(AIAgent):
    def __init__(self, api_key: str):
        super().__init__(api_key)
        self.cache = {}
        self.cache_ttl = 300  # 5 minutes

    def execute_tool(self, tool_name: str, tool_input: dict):
        cache_key = f"{tool_name}:{json.dumps(tool_input, sort_keys=True)}"

        if cache_key in self.cache:
            cached_result, timestamp = self.cache[cache_key]
            if time.time() - timestamp < self.cache_ttl:
                return cached_result

        result = super().execute_tool(tool_name, tool_input)
        self.cache[cache_key] = (result, time.time())
        return result
```
Tip 3: Add Thinking Time Budgets
Prevent agents from overthinking simple tasks:
```python
def run(self, user_message: str, max_tokens: int = 4096):
    """
    Limit the total thinking tokens to prevent excessive reasoning.
    """
    # For simple queries, use fewer tokens
    if self.is_simple_query(user_message):
        max_tokens = 1024

    # ... inside the agent loop, where `messages` holds the conversation:
    response = self.client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=max_tokens,
        tools=self.tools,
        messages=messages
    )
```
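The `is_simple_query` helper above is assumed; one crude heuristic (a sketch, with cue words you would tune for your domain) is to gate on message length and the presence of multi-step phrasing:

```python
def is_simple_query(user_message: str) -> bool:
    """Heuristic: short messages without multi-step cues get the smaller budget."""
    multi_step_cues = ("then", "compare", "for each", "step by step")
    message = user_message.lower()
    return len(message.split()) <= 12 and not any(cue in message for cue in multi_step_cues)

print(is_simple_query("What's the capital of France?"))  # True
print(is_simple_query("Search for the top five results, then compare each one"))  # False
```

A misclassification here is cheap: a simple query given a large budget just wastes a few tokens, and a complex query given a small budget can be retried with the default.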
Tip 4: Use Parallel Tool Calls
Modern LLMs can call multiple tools simultaneously:
```python
import concurrent.futures

# Claude might return multiple tool_use blocks
tool_blocks = [b for b in response.content if b.type == "tool_use"]

# Execute them in parallel for speed
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [
        executor.submit(self.execute_tool, block.name, block.input)
        for block in tool_blocks
    ]
    results = [f.result() for f in futures]
```
Tip 5: Monitor Token Usage
Track costs in production:
```python
class BudgetAwareAgent(AIAgent):
    def __init__(self, api_key: str, max_cost_per_run: float = 0.50):
        super().__init__(api_key)
        self.max_cost_per_run = max_cost_per_run
        self.current_run_cost = 0

    def run(self, user_message: str, max_iterations: int = 10):
        self.current_run_cost = 0

        for iteration in range(max_iterations):
            response = self.client.messages.create(...)

            # Calculate cost (example rates)
            input_cost = response.usage.input_tokens * 0.000003
            output_cost = response.usage.output_tokens * 0.000015
            iteration_cost = input_cost + output_cost

            self.current_run_cost += iteration_cost
            if self.current_run_cost > self.max_cost_per_run:
                return "Cost limit exceeded for this request."

            # Continue agent loop...
```
Choosing the Right LLM for Your Agent
Not all LLMs are equally suited to agentic tasks. Here's a comparison of popular options at the time of writing:
GPT-4 Turbo
- Strengths: Excellent reasoning, large context window
- Weaknesses: Higher cost, slower response times
- Best for: Complex multi-step reasoning, research tasks
Claude 3.5 Sonnet
- Strengths: Best-in-class tool use, fast, cost-effective
- Weaknesses: Smaller context window than Gemini 1.5 Pro
- Best for: Production agents, customer support, data analysis
Claude 3 Opus
- Strengths: Top-tier reasoning, handles ambiguity well
- Weaknesses: Higher cost, slower
- Best for: Critical decision-making, complex problem-solving
Gemini 1.5 Pro
- Strengths: Massive context window (1M+ tokens), multimodal
- Weaknesses: Tool use less mature
- Best for: Document analysis, long conversation agents
Pro tip: Start with Claude 3.5 Sonnet for most production use cases. It offers the best balance of performance, cost, and tool-use capability.
Testing Your Agent
Rigorous testing is essential. Here's a testing framework:
```python
class AgentTester:
    def __init__(self, agent: AIAgent):
        self.agent = agent
        self.test_results = []

    def test_case(self, name: str, query: str, expected_tools: list,
                  should_succeed: bool = True):
        """
        Run a single test case.
        """
        print(f"\n🧪 Testing: {name}")

        result = self.agent.run(query)
        tools_used = self.agent.get_tools_used()  # Track this in your agent

        completed = result is not None
        success = (
            completed == should_succeed and
            all(tool in tools_used for tool in expected_tools)
        )

        self.test_results.append({
            "name": name,
            "success": success,
            "tools_used": tools_used,
            "result": result
        })

        status = "✅ PASS" if success else "❌ FAIL"
        print(f"{status} - Used tools: {tools_used}")
        return success

    def run_test_suite(self):
        """
        Run a comprehensive test suite.
        """
        # Basic functionality
        self.test_case(
            "Simple search query",
            "What is the capital of France?",
            expected_tools=["search_web"]
        )

        # Multi-step reasoning
        self.test_case(
            "Calculation with search",
            "What's 20% of the population of Tokyo?",
            expected_tools=["search_web", "calculate"]
        )

        # Error handling
        self.test_case(
            "Invalid tool parameters",
            "Calculate the square root of negative one",
            expected_tools=["calculate"],
            should_succeed=False
        )

        # Edge cases
        self.test_case(
            "Ambiguous request",
            "Tell me about Apple",
            expected_tools=["search_web"]  # Should clarify: fruit or company?
        )

        # Report results
        passed = sum(1 for r in self.test_results if r["success"])
        total = len(self.test_results)
        print(f"\n📊 Test Results: {passed}/{total} passed")
        return self.test_results

# Usage
tester = AgentTester(agent)
results = tester.run_test_suite()
```
Deployment Considerations
Scaling Your Agent
Stateless Design:
```python
# Don't store state in the agent instance
class BadAgent:
    def __init__(self):
        self.conversation_history = []  # ❌ Not thread-safe

# Do pass state explicitly
class GoodAgent:
    def run(self, user_message: str, conversation_history: list):
        # ✅ State passed in, agent is stateless
        pass
```
Async Execution:
```python
import asyncio
from anthropic import AsyncAnthropic

class AsyncAgent:
    def __init__(self, api_key: str):
        self.client = AsyncAnthropic(api_key=api_key)
        self.tools = tools  # tool definitions from earlier

    async def run(self, user_message: str):
        response = await self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            tools=self.tools,
            messages=[{"role": "user", "content": user_message}]
        )
        return response

# Handle multiple requests concurrently
async def handle_batch(queries: list):
    agent = AsyncAgent(api_key="...")
    results = await asyncio.gather(*[agent.run(q) for q in queries])
    return results
```
Monitoring in Production
Essential metrics to track:
- Success Rate: % of tasks completed successfully
- Average Tool Calls: Number of tools used per request
- Latency: Time from request to response (p50, p95, p99)
- Cost per Request: Track token usage and API costs
- Error Rate: % of requests that fail
- Tool-Specific Metrics: Success rate per tool
```python
import time

import datadog  # or your monitoring service

class MonitoredAgent(AIAgent):
    def run(self, user_message: str):
        start_time = time.time()

        try:
            result = super().run(user_message)

            # Record success
            datadog.statsd.increment('agent.requests.success')
            datadog.statsd.histogram(
                'agent.latency',
                time.time() - start_time
            )
            return result
        except Exception as e:
            # Record failure
            datadog.statsd.increment('agent.requests.failure')
            datadog.statsd.increment(f'agent.errors.{type(e).__name__}')
            raise
```
📌 Key Takeaways
- Agentic AI = Reasoning + Acting: Agents don't just respond—they think, plan, and execute actions to achieve goals.
- The ReAct Pattern is Your Foundation: Reason → Act → Observe → Repeat is the core loop that powers modern agents.
- Tool Design Matters: Clear, specific tool descriptions are critical. The agent only knows what you tell it.
- Safety First: Implement guardrails, confirmation flows, and monitoring before deploying to production.
- Start Simple, Then Scale: Build a basic agent first. Add complexity (memory, multi-step planning, parallel execution) only when needed.
- Test Relentlessly: Edge cases, failures, and ambiguous inputs will happen. Test for them.
- Monitor Everything: You can't improve what you don't measure. Track success rates, costs, and latency.
- Choose Your LLM Wisely: Claude 3.5 Sonnet offers the best balance for most production agents, but evaluate based on your specific needs.
- Async = Scalability: Use async patterns for production deployments to handle concurrent requests efficiently.
- The Future is Agentic: We're moving from chatbots to autonomous AI systems. Learning to build agents now puts you ahead of the curve.
Conclusion
Agentic AI represents a fundamental shift in how we build AI systems. Instead of passive question-answering, we now have AI that can autonomously plan, reason, and execute complex tasks using tools.
The patterns and practices in this guide give you everything you need to build production-ready AI agents. Start with a simple agent, test thoroughly, add safety guardrails, and deploy with monitoring.
The future of AI isn't just about smarter models—it's about AI that can take action in the real world. And now you know how to build it.
What will you build?
Further Reading
- ReAct Paper: Synergizing Reasoning and Acting in Language Models
- Anthropic Tool Use Documentation
- OpenAI Function Calling Guide
- LangChain Agents Documentation
- AutoGPT: An Autonomous AI Agent Framework
Tools and Libraries
- LangChain: Framework for building LLM applications
- LlamaIndex: Data framework for LLM applications
- Anthropic Python SDK: Official Claude API client
- OpenAI Python SDK: Official GPT API client
- Haystack: Open-source NLP framework with agent support