🤖 Chapter 11: CrewAI Multi-Agent Crawler: Fully Automated Collection of Taiwan Campgrounds

In the previous basic chapters, we learned how to place a map on the screen and mark coordinates. But if you are an entrepreneur planning to build a "most complete Taiwan RV camping map," you can't possibly sit in front of your computer every day, manually Google "campgrounds," copy-paste addresses, latitudes/longitudes, and prices one by one, and manually create records. If you do that, by the time you finish entering 500 records, your competitors will have already saturated the market!

In the AI era, the most expensive resource is no longer "programming skills," but "clean, structured, high-quality data." This lesson takes you into the deepest part of Vibe Coding: automated data collection. We won't write traditional Python crawlers (BeautifulSoup / Scrapy), because they break too easily when websites change. Instead, we will introduce one of the hottest multi-agent frameworks right now: CrewAI.

This is a deep-dive chapter of over 6000 words. Prepare your coffee; we will go step by step from principles, Prompt design, fail-safe mechanisms, to final structured output, teaching you how to build an "indefatigable AI robot army."


🏭 What is CrewAI? Why Are Traditional Crawlers Dead?

In the past, the logic of writing a crawler was: "Go to this URL -> Find the HTML tag with ID 'title' -> Extract the text inside." But nowadays, many websites are dynamic pages rendered with React/Vue, and some sit behind anti-bot protection such as Cloudflare. If the website changes a single ID or class name, your crawler breaks immediately, and you can lose a whole day fixing it.
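To make that fragility concrete, here is a minimal sketch using only Python's standard library. The class TitleGrabber, the function grab_title, and both HTML snippets are made up for illustration; they are not taken from any real site. The "crawler" looks for an element with id="title", and a harmless rename on the site's side is enough to make it come back empty-handed.

```python
from html.parser import HTMLParser


class TitleGrabber(HTMLParser):
    """Collects the text inside the first element whose id is 'title'."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # The selector is hard-coded: only id="title" counts.
        if dict(attrs).get("id") == "title" and self.title is None:
            self._inside = True

    def handle_data(self, data):
        if self._inside:
            self.title = data.strip()
            self._inside = False


def grab_title(html: str):
    parser = TitleGrabber()
    parser.feed(html)
    return parser.title


# Hypothetical pages: same content, but the site renamed the id.
old_page = '<h1 id="title">Example Campground</h1>'
new_page = '<h1 id="page-title">Example Campground</h1>'

print(grab_title(old_page))  # the crawler finds the name
print(grab_title(new_page))  # the crawler finds nothing
```

Nothing about the campground changed between the two pages; only the markup did. That is exactly the kind of breakage an LLM-based reader is meant to tolerate.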

CrewAI's Revolutionary Thinking: We no longer rely on rigid HTML tags. We build a team of "AI employees (Crew)."

  • Agent A (Search Expert): Responsible for searching Google for "Nantou recommended campgrounds" and clicking on the top ten articles to read.
  • Agent B (Data Extraction Expert): Responsible for taking the lengthy articles read by Agent A, using an LLM to understand them, and accurately extracting "campground name, address, price range."
  • Agent C (Coordinate Conversion Expert): Responsible for sending the address to the Google Maps API and converting it into precise latitude/longitude.

You only need to issue high-level "human instructions." This group of AI will meet on their own, assign tasks, handle exceptions, and finally hand you a perfect JSON file!


🛠️ Practice 1: Create Your First AI Employee (Agent)

Before using CrewAI, you must clearly define each AI employee's Role, Goal, and Backstory. This is like interviewing an employee: the more detailed your settings, the more accurate their output.

💡 Vibe Prompt Practice 1: Define a "Data Search Expert"

If you don't know how to write Agent settings, just ask a large language model to write it for you!

[!IMPORTANT] Copy the following Prompt and send it to AI (e.g., Claude 3.5 or ChatGPT-4o):

I am developing an automated campground data collection system using Python's CrewAI framework. Please help me define the first Agent (AI employee). His task is: to search the internet for campground recommendation articles on major Taiwan forums (such as PTT, Dcard, Mobile01) and blogs. Please give me the Python code for this Agent, which must include: 1. role (role name) 2. goal (ultimate goal) 3. backstory (background story, used to give the AI personality and expertise) 4. verbose=True (enable detailed logging) 5. allow_delegation=False (do not allow delegating work to others) Write the backstory in a professional and vivid tone.

🤖 AI-Generated Perfect Agent Setup:

from crewai import Agent

# Create a data search expert
search_expert = Agent(
    role='Senior Camping Secret Spot Intelligence Officer',
    goal='To dig out the latest, most popular, and even the most hidden campground information across Taiwan from the vast internet, especially hidden spots suitable for "car camping."',
    backstory=(
        'You are an outdoor enthusiast with 15 years of camping experience, a senior player, and also a top-notch internet intelligence analyst. '
        'You are familiar with the jargon of major outdoor forums, knowing how to spot which articles on PTT\'s Camping board, Dcard\'s Travel board, and countless camping bloggers are sponsored posts and which are truly high-quality campgrounds. '
        'Your intuition is sharp; you never miss a camping paradise with a stunning starry sky or clean bathrooms.'
    ),
    verbose=True,
    allow_delegation=False,
    # Here we assume you have configured a tool that can search the internet (e.g., SerperDevTool)
    # tools=[search_tool] 
)

🔍 Deep Analysis: See? This is where CrewAI shines. You are not writing "if/else" logic; you are "giving a soul." The backstory becomes part of the prompt the large language model sees, so when search_expert searches and reads web pages, the model leans toward the perspective of a "senior camping enthusiast" and is more inclined to discard content that looks like scams or low-quality sponsored posts. It is a nudge rather than a guarantee, but it is very hard to reproduce with a traditional hard-coded Python crawler!


🎯 Practice 2: Assign Specific Tasks to AI (Task)

Once you have employees, you need to assign specific work tasks (Tasks) to them. Task definitions also need to be extremely precise, telling them "what to do" and "what the expected output format is."

💡 Vibe Prompt Practice 2: Define a Precise Crawling Task

[!IMPORTANT] Copy the following Prompt and send it to AI:

I already have an Agent named search_expert. Now please help me write a specific task for him using CrewAI's Task class. Task requirements: 1. description: Ask him to search for the top 5 car camping campgrounds in "Hsinchu County." Must include detailed address, price, and one biggest feature (e.g., sea of clouds, fireflies). 2. expected_output: Require him to output a structured list. 3. agent: Bind to search_expert. Please give me this Python code, and explain the importance of expected_output in comments.

🤖 AI's Advanced Task Planning:

from crewai import Task

# Assign task: Hsinchu car camping carpet search
hsinchu_camping_task = Task(
    description=(
        'Please use the search tool to find the most popular and highest-rated 5 car camping/campground locations in "Hsinchu County." '
        'You must read relevant blog articles or forum reviews and extract the following information: '
        '1. Full campground name '
        '2. Complete address (at least township/district) '
        '3. Price range (e.g., NT$1000~1200 per tent) '
        '4. Biggest feature of the campground (within 20 characters, e.g., limited autumn/winter sea of clouds)'
    ),
    expected_output=(
        'A Markdown bulleted list containing 5 Hsinchu campgrounds. '
        'Each campground must strictly follow this format: '
        '- Name: [campground name] '
        '- Address: [full address] '
        '- Price: [price] '
        '- Feature: [feature]'
    ),
    agent=search_expert
)

# 【AI Deep Analysis of the Importance of expected_output】:
# In large language models (LLMs), if you only provide a description, the AI's final answer might be a lengthy, verbose "essay" filled with fluff.
# `expected_output` is the "tightening spell" that the CrewAI framework uses to constrain the AI's output.
# When you strictly define the format (such as a Markdown list or JSON format), the AI is pushed to condense and format its long-winded thinking to meet this condition.
# This is a life-or-death step for us to later insert the data into a database!
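A strict output format also pays off on the Python side. The sketch below assumes the agent really does answer in the "- Name: / - Address: / - Price: / - Feature:" layout requested in expected_output; the function parse_campground_list and the sample text are hypothetical, and the two campgrounds are placeholders rather than real search results.

```python
import re

# Matches lines such as "- Name: Example Camp" (half-width or full-width colon).
FIELD_PATTERN = re.compile(
    r"^\s*[-*]\s*(Name|Address|Price|Feature)\s*[:：]\s*(.+?)\s*$"
)


def parse_campground_list(markdown: str) -> list[dict]:
    """Turn the bulleted list into one dictionary per campground."""
    records = []
    current = None
    for line in markdown.splitlines():
        match = FIELD_PATTERN.match(line)
        if not match:
            continue  # skip blank lines and any extra chatter
        key, value = match.group(1).lower(), match.group(2)
        if key == "name":
            # A "Name" line starts a new campground record.
            current = {"name": value}
            records.append(current)
        elif current is not None:
            current[key] = value
    return records


# Placeholder text in the format the task asks for (not real data).
sample = """\
- Name: Example Camp A
- Address: Example Township, Hsinchu County
- Price: NT$1000~1200 per tent
- Feature: sea of clouds

- Name: Example Camp B
- Address: Another Township, Hsinchu County
- Price: NT$900 per tent
- Feature: fireflies
"""

campgrounds = parse_campground_list(sample)
print(campgrounds)
```

If the model drifts from the format, the parser simply returns fewer fields, which is an easy signal to check before moving on.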

🔄 Practice 3: The Hardest Step – Geocoding

In a map system, having only an "address" is useless. The Leaflet.js map only understands latitude and longitude. If we throw "Hsinchu County, Wufeng Township, Taoshan Village..." at the map, it will error out.

Therefore, we need to create a second AI employee: Geographic Coordinate Conversion Expert. But he cannot rely solely on the LLM's brain to guess coordinates (this is called hallucination; AI-guessed coordinates often land you in the middle of the Pacific Ocean). He must learn to use real tools (Tools) to call the Google Maps API.

💡 Vibe Prompt Practice 3: Teach AI How to Use External API Tools

This is the core of advanced Vibe Coding: we don't write the API call ourselves; we write a toolbox and let the AI call it itself!

[!IMPORTANT] Copy the following Prompt and send it to AI:

In CrewAI, I need to create a custom tool that allows my Agent to convert an address into latitude/longitude. Please help me write a function geocode_address(address: str) using Python's @tool decorator. Logic: 1. Use the requests library to call the Google Maps Geocoding API. 2. The API Key will be stored in the environment variable GOOGLE_MAPS_API_KEY. 3. If conversion succeeds, return a dictionary string containing lat and lng. 4. If it fails or the address is not found, return an error prompt string. Please give me the complete code with comments, and emphasize the importance of the docstring.

🤖 AI-Generated Invincible Tool Function:

import os
import requests
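# Note: depending on your CrewAI version, you may need CrewAI's own decorator instead
# (from crewai.tools import tool); older releases accepted LangChain's decorator as shown here.
from langchain.tools import tool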

# 🔥 This is the soul! The docstring (comment below the function) is written for the AI!
# The AI will read this docstring to decide when to use this tool and what parameters to pass.
@tool("Address to Coordinate Conversion Tool")
def geocode_address(address: str) -> str:
    """
    This is a powerful geocoding tool.
    Use this tool when you need to know the precise latitude and longitude of a location or address.
    The input parameter must be a clear string address, e.g., "No.1, Shifu Rd., Xinyi Dist., Taipei City" or "Wuling Farm."
    """
    api_key = os.environ.get("GOOGLE_MAPS_API_KEY")
    if not api_key:
        return "Error: GOOGLE_MAPS_API_KEY environment variable not found."

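    # Google Maps Geocoding API endpoint
    url = "https://maps.googleapis.com/maps/api/geocode/json"
    # Pass the query via params so requests URL-encodes spaces and Chinese characters
    params = {"address": address, "key": api_key, "language": "zh-TW"}

    try:
        # A timeout keeps the agent from hanging forever on a stalled connection
        response = requests.get(url, params=params, timeout=10)
        data = response.json()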
        
        # Check if the API returned data successfully
        if data['status'] == 'OK':
            location = data['results'][0]['geometry']['location']
            lat = location['lat']
            lng = location['lng']
            # Return a clear string result to the AI employee
            return f"The latitude and longitude of address '{address}' are: Latitude {lat}, Longitude {lng}"
        else:
            return f"Cannot find latitude/longitude for '{address}'. API response status: {data['status']}"
            
    except Exception as e:
        return f"Network error occurred while calling API: {str(e)}"

🔍 Deep Analysis: A common beginner mistake with LangChain or CrewAI's @tool is forgetting to write the """docstring""". Remember, in the world of AI Agents, the docstring is not only documentation for humans; the framework passes it to the LLM as the tool's instruction manual! If you don't tell the AI that this tool is used for "converting coordinates," then when the AI gets an address it will be dumbfounded, not knowing it actually has this powerful tool at its disposal. Now we can assign this @tool to the second Agent.
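One design choice worth considering: the prompt asked for "a dictionary string containing lat and lng," but the generated tool returns a full sentence. The sketch below is an assumption-laden variant, not the tutorial's tool. It keeps the network call out of the picture and isolates two pure helpers, build_geocode_url and parse_geocode_response (both hypothetical names), so they can be checked without an API key. The fake responses only mimic the status / results[0].geometry.location shape that the tool above already reads.

```python
import json
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address: str, api_key: str) -> str:
    """Build the request URL with the address safely URL-encoded."""
    query = urlencode({"address": address, "key": api_key, "language": "zh-TW"})
    return f"{GEOCODE_ENDPOINT}?{query}"


def parse_geocode_response(data: dict) -> str:
    """Return a JSON string with lat/lng, or a JSON string with an error code."""
    if data.get("status") != "OK" or not data.get("results"):
        return json.dumps({"error": data.get("status", "UNKNOWN")})
    location = data["results"][0]["geometry"]["location"]
    return json.dumps({"lat": location["lat"], "lng": location["lng"]})


# Hand-written stand-ins for API responses (the coordinates are placeholders).
fake_ok = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 24.5, "lng": 121.25}}}],
}
fake_missing = {"status": "ZERO_RESULTS", "results": []}

url = build_geocode_url("Wuling Farm", "DUMMY_KEY")
print(url)
print(parse_geocode_response(fake_ok))
print(parse_geocode_response(fake_missing))
```

Returning machine-readable JSON from the tool gives the agent less room to misread a number out of a sentence, at the cost of a slightly less "chatty" tool result.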


🧩 Practice 4: Chain Upstream and Downstream: Let AI Employees Work in Relay

We now have a "Search Expert (gathers data)" and a "Coordinate Conversion Expert (uses tools to convert coordinates)." In a real company, this is called an assembly line. Agent A finishes the report and must hand it over to Agent B for processing.

💡 Vibe Prompt Practice 4: Create a Crew Team and JSON Structure Output

What we ultimately need is not a Markdown article; we need a JSON format that can be directly inserted into a Supabase database (e.g., [{"name": "...", "lat": 24.1, "lng": 121.2}]).

[!IMPORTANT] Copy the following Prompt and send it to AI:

Please help me integrate the above processes and create a complete CrewAI execution script. 1. Create geocode_expert (Coordinate Conversion Expert) and assign the geocode_address tool to him. 2. Create a second task (format_to_json_task): geocode_expert receives the list of 5 campgrounds organized by the first task (hsinchu_camping_task). 3. He must use the tool to query coordinates one by one, and finally integrate all data into a standard JSON Array format. 4. Create a Crew and execute these two tasks in order. 5. Please give me the complete Python code and show how to launch this team.

🤖 AI Automated Team Assembly:

from crewai import Agent, Task, Crew, Process
import json

# 1. Create the second employee: Coordinate Conversion and Data Cleaning Expert
geocode_expert = Agent(
    role='Senior Geographic Data Engineer',
    goal='To accurately convert human-readable addresses into latitude/longitude needed by map systems, and output perfect JSON structures.',
    backstory='You have severe OCD; you cannot tolerate any bracket or quote errors. You are a top master of JSON data formatting.',
    verbose=True,
    tools=[geocode_address] # Give him the tool we just wrote!
)

# 2. Create the second task: Conversion and Formatting
format_to_json_task = Task(
    description=(
        'Please receive the list of 5 campgrounds organized by the previous task. '
        'For each campground, use your [Address to Coordinate Conversion Tool] to query the precise lat and lng. '
        'If the tool returns "not found," fill in 0 for latitude and longitude. '
        'Finally, convert all information of these 5 campgrounds into a JSON Array format.'
    ),
    expected_output=(
        'Must be a completely valid JSON string, without any Markdown markers (like ```json). Example format:\n'
        '[\n'
        '  {\n'
        '    "name": "Campground Name",\n'
        '    "address": "Address",\n'
        '    "price": "Price Range",\n'
        '    "feature": "Feature",\n'
        '    "lat": 24.123,\n'
        '    "lng": 121.456\n'
        '  }\n'
        ']'
    ),
    agent=geocode_expert
)

# 3. Establish the project team (Crew)
camping_crew = Crew(
    agents=[search_expert, geocode_expert],
    tasks=[hsinchu_camping_task, format_to_json_task],
    process=Process.sequential, # Sequential: A finishes then B does
    verbose=True
)

# 4. 🔥 The boss presses the start button!
print("🚀 Starting fully automated campground collection plan...")
result = camping_crew.kickoff()

print("==================================")
print("🎉 Final perfect JSON output:")
print(result)

# kickoff() returns a plain string in older CrewAI releases and a result object in newer ones; str() covers both cases.
# If your expected_output is well-written, json.loads() turns the result straight into a Python list of dictionaries, ready to write to the database!
try:
    final_data = json.loads(str(result))
    print(f"Successfully parsed {len(final_data)} campground records! Ready to write to Supabase!")
except json.JSONDecodeError as e:
    print("AI output format error, cannot parse as JSON:", e)

🔍 Deep Analysis: This is the million-dollar automated crawler architecture! In the past, you had to write a bunch of try/except blocks to crawl HTML, parse strings, call APIs, and finally convert everything to JSON by hand. Now you just write a few "human words" and set the responsibilities of two AI employees. After pressing kickoff(), you will see an amazing scene in the terminal. The AI will think on its own: "I now have 5 campgrounds. The first one is Sakura Tribe. I need to use the address tool..." Then it will call the API itself, get the coordinates, and move on to the next one. If the API errors, it might even self-reflect: "Oops, the API reported an error. Let me try shortening the address and query again."

This is what is called an Agentic Workflow. It has self-correction and logical reasoning capabilities!
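Self-correction does not remove the need to check the result before it reaches the database. The following is a minimal sketch of such a check. The function check_record is hypothetical, the required keys mirror the JSON example in expected_output, and the bounding box is a deliberately rough assumption meant to cover Taiwan and its outlying islands, not an official boundary. It also flags the (0, 0) placeholder that the task tells the agent to use when geocoding fails.

```python
REQUIRED_KEYS = ("name", "address", "price", "feature", "lat", "lng")

# Rough bounding box (an assumption): generous enough for outlying islands.
TAIWAN_LAT = (21.5, 26.5)
TAIWAN_LNG = (118.0, 122.5)


def check_record(record: dict) -> list[str]:
    """Return a list of problems found in one campground record."""
    problems = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in record]
    lat, lng = record.get("lat"), record.get("lng")
    if not isinstance(lat, (int, float)) or not isinstance(lng, (int, float)):
        problems.append("lat/lng must be numbers")
    elif lat == 0 and lng == 0:
        # The task description asks for 0 when the tool finds nothing.
        problems.append("geocoding failed (0, 0 placeholder)")
    elif not (
        TAIWAN_LAT[0] <= lat <= TAIWAN_LAT[1]
        and TAIWAN_LNG[0] <= lng <= TAIWAN_LNG[1]
    ):
        problems.append("coordinates fall outside Taiwan")
    return problems


# Placeholder records in the shape of the expected_output example.
base = {"name": "Campground Name", "address": "Address",
        "price": "Price Range", "feature": "Feature"}
good = {**base, "lat": 24.123, "lng": 121.456}
placeholder = {**base, "lat": 0, "lng": 0}
swapped = {**base, "lat": 121.456, "lng": 24.123}  # lat and lng mixed up

for record in (good, placeholder, swapped):
    print(check_record(record))
```

Records with a non-empty problem list can be held back for a retry or a manual look instead of being plotted in the wrong place on the map.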


🚫 Ultimate Pitfall Avoidance Guide: AI Refuses to Output Clean JSON

Even if you repeat the instruction, current LLMs (especially cheaper models like GPT-3.5 or Claude 3 Haiku) have an "obsession": they love to add a sentence like "Okay boss, here is the JSON I organized for you:" before the JSON you requested, and to wrap the JSON itself in a ```json Markdown code fence.

If your code directly uses json.loads(result) to receive this string, the program will immediately crash (JSONDecodeError)!
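The failure is easy to reproduce without any AI involved. Below is a minimal sketch, not the cleaning tool this chapter goes on to build. The function extract_json is a hypothetical helper that scans for the first "[" or "{" and lets the standard library's JSONDecoder.raw_decode parse from there, ignoring whatever surrounds the JSON. The chatty string is hand-written to imitate the behavior described above.

```python
import json

FENCE = "`" * 3  # three backticks, built this way to keep the listing tidy


def extract_json(text: str):
    """Return the first JSON array or object embedded in text."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char in "[{":
            try:
                value, _end = decoder.raw_decode(text, index)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON from here, keep scanning
    raise ValueError("no JSON value found in model output")


# Hand-written imitation of a chatty model reply (placeholder data).
chatty = (
    "Okay boss, here is the JSON I organized for you:\n"
    + FENCE + "json\n"
    + '[{"name": "Example Camp", "lat": 24.1, "lng": 121.2}]\n'
    + FENCE
)

try:
    json.loads(chatty)
    direct_parse_failed = False
except json.JSONDecodeError:
    direct_parse_failed = True

cleaned = extract_json(chatty)
print(direct_parse_failed)
print(cleaned)
```

The direct json.loads call fails on the greeting, while the scanning approach recovers the array. A reply with no JSON at all still raises an error, which is the behavior you want before writing to a database.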

💡 Vibe Prompt Practice 5: Write an Invincible JSON Cleaning Tool

In Vibe Coding, instead of wasting time arguing with the AI to stop adding Markdown, it's better to
