The Despair When Requests Meet "Loading..."
In Chapter 3, we taught how to write a simple web scraper using requests + BeautifulSoup.
This worked perfectly for websites from 10 years ago. But if you try scraping modern sites like PChome, Shopee, or real-time stock quote platforms today, you'll encounter a crushing reality:
The HTML you scrape contains no data—just a single line <div id="loading">Loading...</div>.
This happens because modern websites use React or Vue for "client-side rendering (CSR)." Data is secretly fetched from background APIs only after the browser executes JavaScript. Worse, many sites now add Cloudflare's "bot verification" (that checkbox asking you to prove you're human).
At this point, traditional scrapers fail. We need to bring out the big guns: Playwright.
🎭 What is Playwright?
Playwright is Microsoft's open-source ultimate "browser automation testing tool."
Imagine it as a "ghost engineer." When you run a Playwright Python script, it secretly opens a real Chrome browser in the background, simulating human actions—clicking buttons, scrolling, waiting for spinners to disappear—before finally extracting the visible text.
Since it's a real browser, all dynamically rendered data appears perfectly, and it can bypass many basic anti-scraping measures.
🛠️ Vibe Coding in Action: Scraping Dynamic Stock Data with Playwright
Learning Playwright's syntax is notoriously hard, as it involves complex asynchronous (async/await) operations and DOM element targeting (XPath/CSS Selectors).
But with Cursor, we don't need to memorize syntax.
【Playwright Dynamic Scraping Vibe Prompt】
I'm developing a web scraper in Python.
Target URL:https://example-stock-site.com/(a hypothetical stock site where tables load dynamically after 3 seconds).Please use the
playwrightpackage to:
- Use
async/awaitarchitecture.- Launch Chromium (set
headless=Falseso I can see the process).- Navigate to the URL, then wait for the
#stock-tableelement to appear (ensuring dynamic data loads).- Extract all
.stock-priceelements, convert prices to floats, and store them in a Python List.- Save the List as
prices.csv.- Include complete Chinese comments and error handling.
AI-generated god-tier script:
import asyncio
import csv
from playwright.async_api import async_playwright
async def scrape_dynamic_stocks():
# Launch Playwright
async with async_playwright() as p:
# Open browser (headless=False shows the actual window)
browser = await p.chromium.launch(headless=False)
page = await browser.new_page()
try:
print("🌐 Navigating to stock site...")
await page.goto('https://example-stock-site.com/')
# Critical step: Wait for dynamic table to load! (avoid empty data)
print("⏳ Waiting for data to load...")
await page.wait_for_selector('#stock-table', timeout=10000)
# Extract all elements with class 'stock-price'
price_elements = await page.query_selector_all('.stock-price')
prices = []
for element in price_elements:
text = await element.inner_text()
# Remove currency symbols/commas, convert to float
clean_price = float(text.replace('$', '').replace(',', ''))
prices.append([clean_price])
# Save to CSV
with open('prices.csv', 'w', newline='', encoding='utf-8') as f:
writer = csv.writer(f)
writer.writerow(['Price'])
writer.writerows(prices)
print(f"✅ Successfully scraped {len(prices)} stock records!")
except Exception as e:
print(f"🚨 Scraping error: {e}")
finally:
await browser.close()
# Run async function
asyncio.run(scrape_dynamic_stocks())
👁️ AI Vision Targeting: The Ultimate Evolution of Scraping
Historically, scraper developers' biggest pain was website redesigns.
If a site engineer changed class="stock-price" to class="price-text-v2", your scraper would instantly crash.
But with Vibe Coding and AI integration, this pain point is disappearing.
If you connect OpenAI's gpt-4o (with vision capabilities) to Playwright, the workflow becomes:
- Playwright opens the webpage.
- Playwright takes a full-page screenshot.
- Your Python script sends the screenshot to OpenAI with the query:
"What is TSMC's stock price in this image?" - OpenAI analyzes the image and returns
1050.
This is "selector-less scraping."
You completely ignore HTML structure or class name changes. If a human can see the number, AI can extract it. This is cutting-edge black magic in data science—and your ultimate weapon for future freelance projects!