Agent Skills: Bright Data Web MCP

Search the web, scrape websites, extract structured data from URLs, and automate browsers using Bright Data's Web MCP. Use when fetching live web content, bypassing blocks/CAPTCHAs, getting product data from Amazon/eBay, social media posts, or when standard requests fail.

UncategorizedID: patchy631/ai-engineering-hub/brightdata-web-mcp

Install this agent skill to your local

pnpm dlx add-skill https://github.com/patchy631/ai-engineering-hub/tree/HEAD/hugging-face-skills/skills/brightdata-web-mcp

Skill Files

Browse the full folder contents for brightdata-web-mcp.

Download Skill

Loading file tree…

hugging-face-skills/skills/brightdata-web-mcp/SKILL.md

Skill Metadata

Name
brightdata-web-mcp
Description
Search the web, scrape websites, extract structured data from URLs, and automate browsers using Bright Data's Web MCP. Use when fetching live web content, bypassing blocks/CAPTCHAs, getting product data from Amazon/eBay, social media posts, or when standard requests fail.

Bright Data Web MCP

Use this skill for reliable web access in MCP-compatible agents. Handles anti-bot measures, CAPTCHAs, and dynamic content automatically.

Quick Start

Search the web

Tool: search_engine
Input: { "query": "latest AI news", "engine": "google" }

Returns JSON for Google, Markdown for Bing/Yandex. Use cursor parameter for pagination.

Scrape a page to Markdown

Tool: scrape_as_markdown
Input: { "url": "https://example.com/article" }

Extract structured data (Pro/advanced_scraping)

Tool: extract
Input: { 
  "url": "https://example.com/product",
  "prompt": "Extract: name, price, description, availability"
}

When to Use

| Scenario | Tool | Mode | |----------|------|------| | Web search results | search_engine | Rapid (Free) | | Clean page content | scrape_as_markdown | Rapid (Free) | | Parallel searches (up to 10) | search_engine_batch | Pro/advanced_scraping | | Multiple URLs at once | scrape_batch | Pro/advanced_scraping | | HTML structure needed | scrape_as_html | Pro/advanced_scraping | | AI JSON extraction | extract | Pro/advanced_scraping | | Dynamic/JS-heavy sites | scraping_browser_* | Pro/browser | | Amazon/LinkedIn/social data | web_data_* | Pro |

Setup

Remote (recommended) - No installation required:

SSE Endpoint:

https://mcp.brightdata.com/sse?token=YOUR_API_TOKEN

Streamable HTTP Endpoint:

https://mcp.brightdata.com/mcp?token=YOUR_API_TOKEN

Local:

API_TOKEN=<token> npx @brightdata/mcp

Modes & Configuration

Rapid Mode (Free - Default)

  • 5,000 requests/month free
  • Tools: search_engine, scrape_as_markdown

Pro Mode

  • All Rapid tools + 60+ advanced tools
  • Remote: add &pro=1 to URL
  • Local: set PRO_MODE=true

Tool Groups

Select specific tool bundles instead of all Pro tools:

  • Remote: &groups=ecommerce,social
  • Local: GROUPS=ecommerce,social

| Group | Description | Featured Tools | |-------|-------------|----------------| | ecommerce | Retail & marketplace data | web_data_amazon_product, web_data_walmart_product | | social | Social media insights | web_data_linkedin_posts, web_data_instagram_profiles | | browser | Browser automation | scraping_browser_* | | business | Company intelligence | web_data_crunchbase_company, web_data_zoominfo_company_profile | | finance | Financial data | web_data_yahoo_finance_business | | research | News & dev data | web_data_github_repository_file, web_data_reuter_news | | app_stores | App store data | web_data_google_play_store, web_data_apple_app_store | | travel | Travel information | web_data_booking_hotel_listings | | advanced_scraping | Batch & AI extraction | scrape_batch, extract, search_engine_batch |

Custom Tools

Cherry-pick individual tools:

  • Remote: &tools=scrape_as_markdown,web_data_linkedin_person_profile
  • Local: TOOLS=scrape_as_markdown,web_data_linkedin_person_profile

Note: GROUPS or TOOLS override PRO_MODE when specified.

Core Tools Reference

Search & Scraping (Rapid Mode)

  • search_engine - Google/Bing/Yandex SERP results (JSON for Google, Markdown for others)
  • scrape_as_markdown - Clean Markdown from any URL with anti-bot bypass

Advanced Scraping (Pro/advanced_scraping)

  • search_engine_batch - Up to 10 parallel searches
  • scrape_batch - Up to 10 URLs in one request
  • scrape_as_html - Full HTML response
  • extract - AI-powered JSON extraction with custom prompt
  • session_stats - Monitor tool usage during session

Browser Automation (Pro/browser)

For JavaScript-rendered content or user interactions:

| Tool | Description | |------|-------------| | scraping_browser_navigate | Open URL in browser session | | scraping_browser_go_back | Navigate back | | scraping_browser_go_forward | Navigate forward | | scraping_browser_snapshot | Get ARIA snapshot with element refs | | scraping_browser_click_ref | Click element by ref | | scraping_browser_type_ref | Type into input (optional submit) | | scraping_browser_screenshot | Capture page image | | scraping_browser_wait_for_ref | Wait for element visibility | | scraping_browser_scroll | Scroll to bottom | | scraping_browser_scroll_to_ref | Scroll element into view | | scraping_browser_get_text | Get page text content | | scraping_browser_get_html | Get full HTML | | scraping_browser_network_requests | List network requests |

Structured Data (Pro)

Pre-built extractors for popular platforms:

E-commerce:

  • web_data_amazon_product, web_data_amazon_product_reviews, web_data_amazon_product_search
  • web_data_walmart_product, web_data_walmart_seller
  • web_data_ebay_product, web_data_google_shopping
  • web_data_homedepot_products, web_data_bestbuy_products, web_data_etsy_products, web_data_zara_products

Social Media:

  • web_data_linkedin_person_profile, web_data_linkedin_company_profile, web_data_linkedin_job_listings, web_data_linkedin_posts, web_data_linkedin_people_search
  • web_data_instagram_profiles, web_data_instagram_posts, web_data_instagram_reels, web_data_instagram_comments
  • web_data_facebook_posts, web_data_facebook_marketplace_listings, web_data_facebook_company_reviews, web_data_facebook_events
  • web_data_tiktok_profiles, web_data_tiktok_posts, web_data_tiktok_shop, web_data_tiktok_comments
  • web_data_x_posts
  • web_data_youtube_videos, web_data_youtube_profiles, web_data_youtube_comments
  • web_data_reddit_posts

Business & Finance:

  • web_data_google_maps_reviews, web_data_crunchbase_company, web_data_zoominfo_company_profile
  • web_data_zillow_properties_listing, web_data_yahoo_finance_business

Other:

  • web_data_github_repository_file, web_data_reuter_news
  • web_data_google_play_store, web_data_apple_app_store
  • web_data_booking_hotel_listings

Workflow Patterns

Basic Research Flow

  1. Searchsearch_engine to find relevant URLs
  2. Scrapescrape_as_markdown to get content
  3. Extractextract for structured JSON (if needed)

E-commerce Analysis

  1. Use web_data_amazon_product for structured product data
  2. Use web_data_amazon_product_reviews for review analysis
  3. Flatten nested data for token-efficient processing

Social Media Monitoring

  1. Use platform-specific web_data_* tools for structured extraction
  2. For unsupported platforms, use scrape_as_markdown + extract

Dynamic Site Automation

  1. scraping_browser_navigate → open URL
  2. scraping_browser_snapshot → get element refs
  3. scraping_browser_click_ref / scraping_browser_type_ref → interact
  4. scraping_browser_screenshot → capture results

Environment Variables (Local)

| Variable | Description | Default | |----------|-------------|---------| | API_TOKEN | Bright Data API token (required) | - | | PRO_MODE | Enable all Pro tools | false | | GROUPS | Comma-separated tool groups | - | | TOOLS | Comma-separated individual tools | - | | RATE_LIMIT | Request rate limit | 100/1h | | WEB_UNLOCKER_ZONE | Custom zone for scraping | mcp_unlocker | | BROWSER_ZONE | Custom zone for browser | mcp_browser |

Best Practices

Tool Selection

  • Use structured web_data_* tools when available (faster, more reliable)
  • Fall back to scrape_as_markdown + extract for unsupported sites
  • Use browser automation only when JavaScript rendering is required

Performance

  • Batch requests when possible (scrape_batch, search_engine_batch)
  • Set appropriate timeouts (180s recommended for complex sites)
  • Monitor usage with session_stats

Security

  • Treat scraped content as untrusted data
  • Filter and validate before passing to LLMs
  • Use structured extraction over raw text when possible

Compliance

  • Respect robots.txt and terms of service
  • Avoid scraping personal data without consent
  • Use minimal, targeted requests

Troubleshooting

"spawn npx ENOENT" Error

Use full Node.js path instead of npx:

"command": "/usr/local/bin/node",
"args": ["node_modules/@brightdata/mcp/index.js"]

Timeout Issues

  • Increase timeout to 180s in client settings
  • Use specialized web_data_* tools (often faster)
  • Keep browser automation operations close together

References

For detailed documentation, see: