Choosing LLMs for CrewAI Agents — GPT, Claude, Gemini, Open-Source

Why LLM Choice Matters

The LLM you choose directly determines your agent's capabilities — reasoning quality, speed, cost, and supported features like tool use and structured output. Choosing the wrong model leads to poor results or excessive costs.

Why this matters for your career:

LLM selection is a key skill for building effective AI agents
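Cost optimization (choosing the right model for each task) can cut API costs substantially, because the smaller models in the table below cost a small fraction of the flagship ones per token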
Understanding model strengths helps you design better agent architectures
Multi-model strategies (using different models for different agents) maximize quality and efficiency

LLM Comparison

| Model | Reasoning | Coding | Creative Writing | Structured Output | Speed | Cost per 1M tokens (input / output) |
|-------|-----------|--------|-----------------|-------------------|-------|---------------------|
| GPT-4o | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Fast | $2.50 / $10.00 |
| GPT-4o-mini | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Very Fast | $0.15 / $0.60 |
| Claude 3.5 Sonnet | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Fast | $3.00 / $15.00 |
| Claude 3 Haiku | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Very Fast | $0.25 / $1.25 |
| Gemini 1.5 Pro | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | Fast | $1.25 / $5.00 |
| Gemini 1.5 Flash | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | Very Fast | $0.075 / $0.30 |
| Llama 3.1 70B | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | Medium | Free (self-hosted) |
| Llama 3.1 8B | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | Very Fast | Free (self-hosted) |
| Mistral Large | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | Fast | $2.00 / $6.00 |
| Mistral Small | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | Very Fast | $0.20 / $0.60 |
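
To see how the two prices in the cost column combine, here is a minimal sketch that estimates the cost of a single call from its input and output token counts. The prices are copied from the table above; the dictionary keys are hypothetical labels for this example (not API model identifiers), and the token counts in the demo are an assumed workload.

```python
# Prices in USD per 1M tokens as (input, output), copied from the comparison table.
# The keys are illustrative labels, not identifiers to pass to an API.
PRICES_PER_MILLION = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3-haiku": (0.25, 1.25),
    "gemini-1.5-flash": (0.075, 0.30),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one call to `model`."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Assumed workload: 2,000 prompt tokens and 500 completion tokens per call.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${estimate_cost(name, 2_000, 500):.6f} per call")
```

Because output tokens are priced several times higher than input tokens, agents that produce long responses can cost noticeably more than the input price alone suggests.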

Matching Models to Agent Roles

🧠 Reasoning & Analysis Agent

Best model: Claude 3.5 Sonnet or GPT-4o

Use for agents that need deep reasoning, analysis, and decision-making:

Financial analysis agent
Data interpretation agent
Strategy formulation agent
Scientific research agent

from crewai import Agent

analyst = Agent(
    role='Senior Data Analyst',
    goal='Analyze complex datasets and provide actionable insights.',
    backstory='You are a world-class data analyst with expertise in statistical analysis.',
    llm='gpt-4o',  # Strong reasoning for complex analysis
    verbose=True
)

💻 Coding Agent

Best model: GPT-4o or Claude 3.5 Sonnet

Use for agents that write, review, or debug code:

Code generation agent
Code review agent
Test writing agent
Bug fixing agent

coder = Agent(
    role='Senior Software Engineer',
    goal='Write clean, efficient, well-tested code.',
    backstory='You are a senior engineer with expertise in full-stack development.',
    llm='gpt-4o',  # Best for code generation
    verbose=True
)

✍️ Creative Writing Agent

Best model: Claude 3.5 Sonnet

Use for agents that create content, markdown, or documentation:

Documentation writer
Content creator
Marketing copywriter
Tutorial generator

writer = Agent(
    role='Technical Writer',
    goal='Create clear, engaging documentation and tutorials.',
    backstory='You are an experienced technical writer specializing in developer documentation.',
    llm='claude-3-5-sonnet-20241022',  # Best for writing quality
    verbose=True
)

📋 Structured Output Agent

Best model: GPT-4o or Mistral Large

Use for agents that need consistent JSON, schema, or formatted output:

JSON formatter
Data extraction agent
API response generator
Report generator

formatter = Agent(
    role='Data Formatter',
    goal='Extract and structure data into JSON format.',
    backstory='You are a data processing expert. You always output valid JSON.',
    llm='gpt-4o',  # Excellent JSON mode
    verbose=True
)

Multi-Model Strategies

Use different models for different agents in the same crew:

# Use an expensive model for critical reasoning
planner = Agent(
    llm='gpt-4o',
    # role, goal, backstory omitted
)

# Use a cheap model for simple tasks
researcher = Agent(
    llm='gpt-4o-mini',
    # role, goal, backstory omitted
)

# Use an open-source model for sensitive data
local_processor = Agent(
    llm='ollama/llama3.1:70b',  # Runs locally, data never leaves your server
    # role, goal, backstory omitted
)
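
One way to keep these choices in a single place is a small tier-to-model mapping, sketched below. The model strings come from the snippet above; the tier names (`critical`, `routine`, `sensitive`) and the `llm_for` helper are hypothetical and exist only for this illustration.

```python
# Model identifiers taken from the snippet above; the tier names are hypothetical.
MODEL_BY_TIER = {
    "critical": "gpt-4o",                # complex reasoning, worth the cost
    "routine": "gpt-4o-mini",            # simple, high-volume tasks
    "sensitive": "ollama/llama3.1:70b",  # data must stay on your own server
}


def llm_for(tier: str) -> str:
    """Return the model string for a tier, failing loudly on unknown tiers."""
    try:
        return MODEL_BY_TIER[tier]
    except KeyError:
        expected = sorted(MODEL_BY_TIER)
        raise ValueError(f"unknown tier {tier!r}; expected one of {expected}") from None


# Hypothetical crew layout: agent name -> tier.
CREW_TIERS = {
    "planner": "critical",
    "researcher": "routine",
    "local_processor": "sensitive",
}
CREW_MODELS = {name: llm_for(tier) for name, tier in CREW_TIERS.items()}
```

Each agent can then be built with `llm=CREW_MODELS["planner"]` and so on, so swapping the model for a whole tier is a one-line change.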

Cost Optimization Example

| Agent | Task Complexity | Model | Cost per 1000 Tasks |
|-------|----------------|-------|-------------------|
| Planner | High | GPT-4o | $2.50 |
| Researcher | Medium | GPT-4o-mini | $0.15 |
| Writer | Medium | Claude 3 Haiku | $0.25 |
| Reviewer | High | GPT-4o | $2.50 |
| Total | | | $5.40 |

Using GPT-4o for everything at the table's rate: $2.50 × 4 = $10.00, so the mixed setup saves about 46%.
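
The comparison can be recomputed from the table's own per-agent figures, with every agent priced at the table's GPT-4o rate for the baseline. The per-agent figures match each model's input price per 1M tokens, which suggests about 1,000 input tokens per task with output tokens not counted; that reading is an inference, not something the table states.

```python
# Cost per 1,000 tasks in USD, copied from the table above.
MIXED_SETUP = {
    "Planner": 2.50,     # GPT-4o
    "Researcher": 0.15,  # GPT-4o-mini
    "Writer": 0.25,      # Claude 3 Haiku
    "Reviewer": 2.50,    # GPT-4o
}
GPT_4O_COST = 2.50  # the table's GPT-4o figure per 1,000 tasks


def savings_percent(mixed_total: float, baseline_total: float) -> float:
    """Return how much cheaper the mixed setup is, as a percentage of the baseline."""
    return (1 - mixed_total / baseline_total) * 100


mixed_total = round(sum(MIXED_SETUP.values()), 2)
baseline_total = GPT_4O_COST * len(MIXED_SETUP)
saving = savings_percent(mixed_total, baseline_total)

print(f"mixed: ${mixed_total:.2f}, all GPT-4o: ${baseline_total:.2f}, saving: {saving:.0f}%")
# prints: mixed: $5.40, all GPT-4o: $10.00, saving: 46%
```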

Self-Hosted Models

For data privacy or cost control, run open-source models locally:

# Shell: download the models with Ollama
ollama pull llama3.1:70b
ollama pull mistral:7b
ollama pull qwen2.5:32b

# Python: point a CrewAI agent at the local model
local_agent = Agent(
    llm='ollama/llama3.1:70b',
    # role, goal, backstory omitted
)

| Model | RAM Required | Speed | Quality |
|-------|-------------|-------|--------|
| Llama 3.1 8B | 8 GB | Very fast | Good for simple tasks |
| Llama 3.1 70B | 48 GB | Medium | Excellent, close to GPT-4 |
| Mistral 7B | 8 GB | Very fast | Good for structured output |
| Qwen 2.5 32B | 24 GB | Fast | Very good for reasoning |
| DeepSeek Coder V2 | 16 GB | Fast | Excellent for code tasks |
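
As a rough aid for picking a local model, the sketch below filters the table by the memory you have available. The RAM figures are copied from the table above and should be treated as approximate; the `models_that_fit` helper and its `headroom_gb` parameter are hypothetical names introduced for this example.

```python
# Approximate RAM requirements in GB, copied from the table above.
RAM_REQUIRED_GB = {
    "Llama 3.1 8B": 8,
    "Llama 3.1 70B": 48,
    "Mistral 7B": 8,
    "Qwen 2.5 32B": 24,
    "DeepSeek Coder V2": 16,
}


def models_that_fit(available_gb: float, headroom_gb: float = 0.0) -> list[str]:
    """Return the models that fit in memory, largest requirement first.

    `headroom_gb` reserves memory for the OS and other processes.
    """
    budget = available_gb - headroom_gb
    fitting = [name for name, need in RAM_REQUIRED_GB.items() if need <= budget]
    return sorted(fitting, key=lambda name: RAM_REQUIRED_GB[name], reverse=True)


if __name__ == "__main__":
    # Assumed machine: 32 GB of RAM with 4 GB kept free for the system.
    print(models_that_fit(32, headroom_gb=4))
```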

Summary

Choose your LLM based on the agent's task type. Use GPT-4o or Claude 3.5 Sonnet for complex reasoning and coding. Use GPT-4o-mini or Claude 3 Haiku for simple tasks to save costs. Use self-hosted models for data privacy. A multi-model strategy gives the best quality-to-cost ratio.

Key takeaways:

GPT-4o and Claude 3.5 Sonnet are best for reasoning, coding, and complex tasks
GPT-4o-mini and Claude 3 Haiku are cost-effective for simple tasks
Gemini 1.5 Flash is the cheapest hosted option with decent quality
Llama 3.1 70B is the strongest open-source model in this comparison (close to GPT-4 quality)
Use a multi-model strategy: expensive models for critical agents, cheap for routine
Self-host models for data privacy or to avoid API costs
Consider latency and throughput requirements (cheaper models are usually faster)
Always benchmark with your specific use case before committing

What's Next: Advanced Prompting

The next chapter covers advanced prompting techniques for CrewAI agents — chain-of-thought, few-shot prompting, role prompting, and structured output formatting.

Setting the LLM in CrewAI

# Option 1: Use a string identifier
agent = Agent(
    llm='gpt-4o',
    # role, goal, backstory omitted
)

# Option 2: Use a ChatOpenAI instance from LangChain
# (needs the langchain-openai package; newer CrewAI releases also offer crewai.LLM)
from langchain_openai import ChatOpenAI

agent = Agent(
    llm=ChatOpenAI(
        model='gpt-4o-mini',
        temperature=0.3,
        max_tokens=4096
    ),
    # role, goal, backstory omitted
)

# Option 3: Use Ollama for local models
agent = Agent(
    llm='ollama/llama3.1:70b',
    # role, goal, backstory omitted
)

# Option 4: Use Anthropic Claude
agent = Agent(
    llm='claude-3-5-sonnet-20241022',
    # role, goal, backstory omitted
)

Recommendation Matrix

| Your Priority | Recommended Model | Runner Up |
|--------------|-------------------|-----------|
| Best quality (no budget limit) | Claude 3.5 Sonnet | GPT-4o |
| Best value (quality per dollar) | GPT-4o-mini | Claude 3 Haiku |
| Cheapest | Gemini 1.5 Flash | GPT-4o-mini |
| Data privacy (self-hosted) | Llama 3.1 70B | Qwen 2.5 32B |
| Fastest execution | GPT-4o-mini | Claude 3 Haiku |
| Best for coding | GPT-4o | Claude 3.5 Sonnet |
| Best for creative writing | Claude 3.5 Sonnet | GPT-4o |
| Best JSON/structured output | GPT-4o | Mistral Large |

This matrix helps you quickly choose the right model based on your priorities.
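
If you want the matrix in code, a minimal sketch is a lookup that returns the recommended model and falls back to the runner-up when the first choice is unavailable (for example, because you have no account with that provider). The model pairs are copied from the matrix; the short priority keys and the `recommend` helper are hypothetical names for this illustration.

```python
from typing import Optional

# (recommended, runner-up) pairs copied from the matrix above.
# The short priority keys are illustrative labels chosen for this sketch.
RECOMMENDATIONS = {
    "quality": ("Claude 3.5 Sonnet", "GPT-4o"),
    "value": ("GPT-4o-mini", "Claude 3 Haiku"),
    "cheapest": ("Gemini 1.5 Flash", "GPT-4o-mini"),
    "privacy": ("Llama 3.1 70B", "Qwen 2.5 32B"),
    "speed": ("GPT-4o-mini", "Claude 3 Haiku"),
    "coding": ("GPT-4o", "Claude 3.5 Sonnet"),
    "writing": ("Claude 3.5 Sonnet", "GPT-4o"),
    "structured": ("GPT-4o", "Mistral Large"),
}


def recommend(priority: str, unavailable: frozenset = frozenset()) -> Optional[str]:
    """Return the first model for `priority` that is not in `unavailable`."""
    for model in RECOMMENDATIONS[priority]:
        if model not in unavailable:
            return model
    return None  # neither the recommendation nor the runner-up can be used


if __name__ == "__main__":
    print(recommend("coding"))
    print(recommend("coding", unavailable=frozenset({"GPT-4o"})))
```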