Choosing LLMs for CrewAI Agents — GPT, Claude, Gemini, Open-Source
Why LLM Choice Matters
The LLM you choose directly determines your agent's capabilities — reasoning quality, speed, cost, and supported features like tool use and structured output. Choosing the wrong model leads to poor results or excessive costs.
Why this matters for your career:
- LLM selection is a key skill for building effective AI agents
- Cost optimization (choosing the right model for each task) can save 50-90% on API costs, depending on the workload
- Understanding model strengths helps you design better agent architectures
- Multi-model strategies (using different models for different agents) maximize quality and efficiency
LLM Comparison
| Model | Reasoning | Coding | Creative Writing | Structured Output | Speed | Cost per 1M tokens (input / output) |
|-------|-----------|--------|------------------|-------------------|-------|-------------------------------------|
| GPT-4o | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Fast | $2.50 / $10.00 |
| GPT-4o-mini | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Very Fast | $0.15 / $0.60 |
| Claude 3.5 Sonnet | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Fast | $3.00 / $15.00 |
| Claude 3 Haiku | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Very Fast | $0.25 / $1.25 |
| Gemini 1.5 Pro | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | Fast | $1.25 / $5.00 |
| Gemini 1.5 Flash | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | Very Fast | $0.075 / $0.30 |
| Llama 3.1 70B | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | Medium | No API fee (self-hosted) |
| Llama 3.1 8B | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | Very Fast | No API fee (self-hosted) |
| Mistral Large | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | Fast | $2.00 / $6.00 |
| Mistral Small | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | Very Fast | $0.20 / $0.60 |
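To make the price column concrete, here is a minimal sketch that estimates the cost of a single call from its token counts. It assumes each price pair in the table is the input price followed by the output price, in USD per 1M tokens. The `PRICES` dictionary and the `call_cost` helper are illustrative names, not part of CrewAI, and the figures are copied from the table, so they may be out of date.

```python
# Prices in USD per 1M tokens as (input, output), copied from the table above.
# Assumption: the two figures in each cell are the input and output prices.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-5-sonnet": (3.00, 15.00),
    "claude-3-haiku": (0.25, 1.25),
    "gemini-1.5-flash": (0.075, 0.30),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one call from its token counts."""
    input_price, output_price = PRICES[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


# Hypothetical workload: 2,000 input tokens and 500 output tokens per call.
for name in PRICES:
    print(f"{name}: ${call_cost(name, 2_000, 500):.6f} per call")
```

Because output tokens are priced several times higher than input tokens, agents that generate long responses tend to dominate the bill.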
Matching Models to Agent Roles
🧠 Reasoning & Analysis Agent
Best model: Claude 3.5 Sonnet or GPT-4o
Use for agents that need deep reasoning, analysis, and decision-making:
- Financial analysis agent
- Data interpretation agent
- Strategy formulation agent
- Scientific research agent
from crewai import Agent

analyst = Agent(
    role='Senior Data Analyst',
    goal='Analyze complex datasets and provide actionable insights.',
    backstory='You are a world-class data analyst with expertise in statistical analysis.',
    llm='gpt-4o',  # Strong reasoning for complex analysis
    verbose=True
)
💻 Coding Agent
Best model: GPT-4o or Claude 3.5 Sonnet
Use for agents that write, review, or debug code:
- Code generation agent
- Code review agent
- Test writing agent
- Bug fixing agent
coder = Agent(
    role='Senior Software Engineer',
    goal='Write clean, efficient, well-tested code.',
    backstory='You are a senior engineer with expertise in full-stack development.',
    llm='gpt-4o',  # Best for code generation
    verbose=True
)
✍️ Creative Writing Agent
Best model: Claude 3.5 Sonnet
Use for agents that create content, markdown, or documentation:
- Documentation writer
- Content creator
- Marketing copywriter
- Tutorial generator
writer = Agent(
    role='Technical Writer',
    goal='Create clear, engaging documentation and tutorials.',
    backstory='You are an experienced technical writer specializing in developer documentation.',
    llm='claude-3-5-sonnet-20241022',  # Best for writing quality
    verbose=True
)
📋 Structured Output Agent
Best model: GPT-4o or Mistral Large
Use for agents that need consistent JSON, schema, or formatted output:
- JSON formatter
- Data extraction agent
- API response generator
- Report generator
formatter = Agent(
    role='Data Formatter',
    goal='Extract and structure data into JSON format.',
    backstory='You are a data processing expert. You always output valid JSON.',
    llm='gpt-4o',  # Excellent JSON mode
    verbose=True
)
Multi-Model Strategies
Use different models for different agents in the same crew:
# Use expensive model for critical reasoning
planner = Agent(llm='gpt-4o', ...)

# Use cheap model for simple tasks
researcher = Agent(llm='gpt-4o-mini', ...)

# Use open-source for sensitive data
local_processor = Agent(
    llm='ollama/llama3.1:70b',  # Runs locally, data never leaves your server
    ...
)
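One way to keep these choices in a single place is a small lookup from task category to model string. The sketch below is illustrative only: `MODEL_BY_TASK`, `DEFAULT_MODEL`, and `pick_model` are hypothetical names, the categories follow the agent roles described earlier, and the model strings are the ones used in this chapter.

```python
# Hypothetical mapping from task category to model string.
# Categories mirror the agent roles above; model names come from this chapter.
MODEL_BY_TASK = {
    "reasoning": "gpt-4o",
    "coding": "gpt-4o",
    "writing": "claude-3-5-sonnet-20241022",
    "structured_output": "gpt-4o",
    "sensitive_data": "ollama/llama3.1:70b",
}

DEFAULT_MODEL = "gpt-4o-mini"  # cheap fallback for routine tasks


def pick_model(task_type: str) -> str:
    """Return the model string for a task category, or the cheap default."""
    return MODEL_BY_TASK.get(task_type, DEFAULT_MODEL)
```

The returned string can then be passed as the `llm` argument, for example `Agent(llm=pick_model('writing'), ...)`, so that switching a whole category to another model is a one-line change.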
Cost Optimization Example
| Agent | Task Complexity | Model | Cost per 1,000 Tasks |
|-------|-----------------|-------|----------------------|
| Planner | High | GPT-4o | $2.50 |
| Researcher | Medium | GPT-4o-mini | $0.15 |
| Writer | Medium | Claude 3 Haiku | $0.25 |
| Reviewer | High | GPT-4o | $2.50 |
| Total | | | $5.40 |
Using GPT-4o for all four agents would cost $2.50 × 4 = $10.00 per 1,000 tasks, so the mixed setup saves 46%.
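The comparison can be recomputed from the table's own per-agent figures. This sketch assumes an all-GPT-4o crew pays the table's GPT-4o figure ($2.50 per 1,000 tasks) for each of the four agents; `savings_percent` is an illustrative helper name.

```python
# Cost per 1,000 tasks in USD, copied from the table above.
mixed_costs = {
    "Planner": 2.50,     # GPT-4o
    "Researcher": 0.15,  # GPT-4o-mini
    "Writer": 0.25,      # Claude 3 Haiku
    "Reviewer": 2.50,    # GPT-4o
}

# Assumption: in the baseline, every agent pays the table's GPT-4o figure.
GPT4O_COST = 2.50


def savings_percent(mixed_total: float, baseline_total: float) -> float:
    """Percentage saved by the mixed setup relative to the baseline."""
    return 100 * (baseline_total - mixed_total) / baseline_total


mixed_total = sum(mixed_costs.values())
baseline_total = GPT4O_COST * len(mixed_costs)

print(f"Mixed models:  ${mixed_total:.2f}")
print(f"All GPT-4o:    ${baseline_total:.2f}")
print(f"Savings:       {savings_percent(mixed_total, baseline_total):.0f}%")
```

Swapping in your own measured per-agent costs shows quickly whether a cheaper model on a given agent is worth the quality trade-off.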
Self-Hosted Models
For data privacy or cost control, run open-source models locally:
# Using Ollama (shell)
ollama pull llama3.1:70b
ollama pull mistral:7b
ollama pull qwen2.5:32b

# In CrewAI (Python)
local_agent = Agent(
    llm='ollama/llama3.1:70b',
    ...
)
| Model | RAM Required | Speed | Quality |
|-------|--------------|-------|---------|
| Llama 3.1 8B | 8 GB | Very fast | Good for simple tasks |
| Llama 3.1 70B | 48 GB | Medium | Excellent, close to GPT-4 |
| Mistral 7B | 8 GB | Very fast | Good for structured output |
| Qwen 2.5 32B | 24 GB | Fast | Very good for reasoning |
| DeepSeek Coder V2 | 16 GB | Fast | Excellent for code tasks |
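The RAM column can be turned into a quick feasibility check. The sketch below lists which of the table's models fit in a given amount of memory and picks the largest one. It treats memory footprint as a rough proxy for capability, which is an assumption, and the RAM figures are the table's, not measured values. `LOCAL_MODELS`, `models_that_fit`, and `largest_model_that_fits` are illustrative names.

```python
from typing import List, Optional, Tuple

# (model name, RAM required in GB), copied from the table above.
LOCAL_MODELS: List[Tuple[str, int]] = [
    ("Llama 3.1 8B", 8),
    ("Mistral 7B", 8),
    ("DeepSeek Coder V2", 16),
    ("Qwen 2.5 32B", 24),
    ("Llama 3.1 70B", 48),
]


def models_that_fit(available_gb: int) -> List[str]:
    """Names of all models whose RAM requirement fits in available_gb."""
    return [name for name, ram in LOCAL_MODELS if ram <= available_gb]


def largest_model_that_fits(available_gb: int) -> Optional[str]:
    """The model with the highest RAM requirement that still fits, or None."""
    candidates = [(name, ram) for name, ram in LOCAL_MODELS if ram <= available_gb]
    if not candidates:
        return None
    # Rough proxy: a larger footprint is assumed to mean a more capable model.
    return max(candidates, key=lambda item: item[1])[0]
```

For example, `largest_model_that_fits(32)` selects Qwen 2.5 32B, while a machine with 4 GB gets `None` and should fall back to a hosted model.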
Summary
Choose your LLM based on the agent's task type. Use GPT-4o or Claude 3.5 Sonnet for complex reasoning and coding. Use GPT-4o-mini or Claude 3 Haiku for simple tasks to save costs. Use self-hosted models for data privacy. A multi-model strategy usually gives the best quality-to-cost ratio.
Key takeaways:
- GPT-4o and Claude 3.5 Sonnet are the strongest choices here for reasoning, coding, and complex tasks
- GPT-4o-mini and Claude 3 Haiku are cost-effective for simple tasks
- Gemini 1.5 Flash is the cheapest hosted option in the comparison and still offers decent quality
- Llama 3.1 70B is the strongest open-source model in the comparison (close to GPT-4 quality)
- Use a multi-model strategy: expensive models for critical agents, cheap for routine
- Self-host models for data privacy or to avoid API costs
- Consider latency and throughput requirements (cheaper models are usually faster)
- Always benchmark with your specific use case before committing
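The benchmarking advice can be sketched as a small harness that runs the same test cases against each candidate and records accuracy and latency. Everything here is a hypothetical illustration: `benchmark` is not a CrewAI function, the "models" are offline stand-in functions so the sketch runs without API keys, and substring matching is a deliberately crude scoring rule that you would replace with a check suited to your task.

```python
import time
from typing import Callable, Dict, List, Tuple


def benchmark(
    candidates: Dict[str, Callable[[str], str]],
    cases: List[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Run every (prompt, expected) case against every candidate."""
    results = {}
    for name, ask in candidates.items():
        correct = 0
        start = time.perf_counter()
        for prompt, expected in cases:
            # Crude scoring: the expected answer appears in the response.
            if expected.lower() in ask(prompt).lower():
                correct += 1
        elapsed = time.perf_counter() - start
        results[name] = {
            "accuracy": correct / len(cases),
            "seconds_per_case": elapsed / len(cases),
        }
    return results


# Stand-in "models" so the sketch runs offline; replace with real model calls.
def stub_strong(prompt: str) -> str:
    return "The capital of France is Paris." if "France" in prompt else "4"


def stub_weak(prompt: str) -> str:
    return "I am not sure."


cases = [("What is the capital of France?", "Paris"), ("What is 2 + 2?", "4")]
report = benchmark({"stub-strong": stub_strong, "stub-weak": stub_weak}, cases)
for name, stats in report.items():
    print(name, stats)
```

With real models, a few dozen cases drawn from your own workload are usually more informative than any general ranking.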
What's Next: Advanced Prompting
The next chapter covers advanced prompting techniques for CrewAI agents — chain-of-thought, few-shot prompting, role prompting, and structured output formatting.
Setting the LLM in CrewAI
# Option 1: Use a string identifier
agent = Agent(
    llm='gpt-4o',
    ...
)

# Option 2: Use a ChatOpenAI instance from LangChain
# (requires the langchain-openai package; support for LangChain objects
# may depend on your CrewAI version)
from langchain_openai import ChatOpenAI

agent = Agent(
    llm=ChatOpenAI(
        model='gpt-4o-mini',
        temperature=0.3,
        max_tokens=4096
    ),
    ...
)

# Option 3: Use Ollama for local models
agent = Agent(
    llm='ollama/llama3.1:70b',
    ...
)

# Option 4: Use Anthropic Claude
agent = Agent(
    llm='claude-3-5-sonnet-20241022',
    ...
)
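Hosted models need credentials before any of these options will work. The sketch below checks for missing API keys up front, assuming the conventional environment variable names `OPENAI_API_KEY` and `ANTHROPIC_API_KEY`. The prefix-based provider detection is a simplification for illustration, not how CrewAI resolves providers, and `required_key` and `missing_keys` are hypothetical helper names.

```python
import os
from typing import Iterable, List, Mapping, Optional


def required_key(model: str) -> Optional[str]:
    """Guess which environment variable a model string needs (simplified)."""
    if model.startswith("ollama/"):
        return None  # local model, no API key
    if model.startswith("gpt-"):
        return "OPENAI_API_KEY"
    if model.startswith("claude-"):
        return "ANTHROPIC_API_KEY"
    return None  # unknown provider: not checked by this sketch


def missing_keys(
    models: Iterable[str],
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Sorted names of required environment variables that are unset or empty."""
    needed = {required_key(m) for m in models} - {None}
    return sorted(key for key in needed if not env.get(key))
```

Calling `missing_keys(['gpt-4o', 'claude-3-5-sonnet-20241022', 'ollama/llama3.1:70b'])` before building a crew surfaces a missing credential immediately rather than partway through a run.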
Recommendation Matrix
| Your Priority | Recommended Model | Runner Up |
|---------------|-------------------|-----------|
| Best quality (no budget limit) | Claude 3.5 Sonnet | GPT-4o |
| Best value (quality per dollar) | GPT-4o-mini | Claude 3 Haiku |
| Cheapest | Gemini 1.5 Flash | GPT-4o-mini |
| Data privacy (self-hosted) | Llama 3.1 70B | Qwen 2.5 32B |
| Fastest execution | GPT-4o-mini | Claude 3 Haiku |
| Best for coding | GPT-4o | Claude 3.5 Sonnet |
| Best for creative writing | Claude 3.5 Sonnet | GPT-4o |
| Best JSON/structured output | GPT-4o | Mistral Large |
This matrix helps you quickly choose the right model based on your priorities.
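The matrix also works as a lookup with a fallback: try the recommended model first, then the runner-up if the first is not available to you. The sketch below encodes the matrix rows as they appear in this chapter; the priority keys and the `recommend` function are illustrative names.

```python
from typing import Dict, Optional, Set, Tuple

# priority -> (recommended model, runner-up), copied from the matrix above.
RECOMMENDATIONS: Dict[str, Tuple[str, str]] = {
    "best_quality": ("Claude 3.5 Sonnet", "GPT-4o"),
    "best_value": ("GPT-4o-mini", "Claude 3 Haiku"),
    "cheapest": ("Gemini 1.5 Flash", "GPT-4o-mini"),
    "data_privacy": ("Llama 3.1 70B", "Qwen 2.5 32B"),
    "fastest": ("GPT-4o-mini", "Claude 3 Haiku"),
    "coding": ("GPT-4o", "Claude 3.5 Sonnet"),
    "creative_writing": ("Claude 3.5 Sonnet", "GPT-4o"),
    "structured_output": ("GPT-4o", "Mistral Large"),
}


def recommend(priority: str, available: Set[str]) -> Optional[str]:
    """Return the first matrix entry for this priority that is available."""
    for model in RECOMMENDATIONS[priority]:
        if model in available:
            return model
    return None  # neither the recommendation nor the runner-up is available
```

For a team with access only to Anthropic models, `recommend('coding', {'Claude 3.5 Sonnet', 'Claude 3 Haiku'})` falls through to the runner-up, Claude 3.5 Sonnet.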