Good Bot List
Reference list of legitimate bots recognized by WebDecoy.
Overview
WebDecoy maintains a list of 60+ known legitimate bots. When enabled, these bots are allowed through with minimal detection impact.
Search Engine Bots
Primary search engine crawlers that index your content.
| Bot | User Agent Pattern | Purpose |
|---|---|---|
| Googlebot | Googlebot | Google Search indexing |
| Googlebot Images | Googlebot-Image | Google Images indexing |
| Googlebot Video | Googlebot-Video | Google Video indexing |
| Googlebot News | Googlebot-News | Google News indexing |
| Google AdsBot | AdsBot-Google | Google Ads landing page check |
| Bingbot | bingbot | Bing Search indexing |
| Yahoo! Slurp | Slurp | Yahoo Search indexing |
| DuckDuckBot | DuckDuckBot | DuckDuckGo indexing |
| Baiduspider | Baiduspider | Baidu (China) indexing |
| YandexBot | YandexBot | Yandex (Russia) indexing |
| Sogou | Sogou | Sogou (China) indexing |
| Exabot | Exabot | Exalead indexing |
| Qwantify | Qwantify | Qwant indexing |
Verification
Google and Bing bots can be verified by reverse DNS:
- Googlebot: `*.googlebot.com` or `*.google.com`
- Bingbot: `*.search.msn.com`
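
The reverse DNS check above can be sketched in Python as a forward-confirmed lookup: resolve the IP to a hostname, compare the hostname against the expected suffixes, then resolve the hostname back and confirm it maps to the same IP. This is a minimal sketch, not WebDecoy's implementation. The names `VERIFIED_SUFFIXES`, `hostname_matches`, and `verify_bot_ip` are hypothetical, and `socket.gethostbyname_ex` only covers IPv4.

```python
import socket

# Hostname suffixes taken from the list above.
VERIFIED_SUFFIXES = {
    "Googlebot": (".googlebot.com", ".google.com"),
    "Bingbot": (".search.msn.com",),
}


def hostname_matches(hostname: str, bot: str) -> bool:
    """Return True if hostname ends with one of the bot's expected suffixes."""
    host = hostname.rstrip(".").lower()
    return host.endswith(VERIFIED_SUFFIXES.get(bot, ()))


def verify_bot_ip(ip: str, bot: str,
                  reverse=socket.gethostbyaddr,
                  forward=socket.gethostbyname_ex) -> bool:
    """Forward-confirmed reverse DNS: PTR lookup, suffix check, forward lookup."""
    try:
        hostname = reverse(ip)[0]
        if not hostname_matches(hostname, bot):
            return False
        # The forward lookup must map back to the original IP,
        # otherwise the PTR record alone could be spoofed.
        return ip in forward(hostname)[2]
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False
```

The resolvers are passed in as parameters so the logic can be exercised without network access; in normal use, `verify_bot_ip(ip, "Googlebot")` relies on the system resolver.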
Social Media Bots
Crawlers that generate link previews and share content.
| Bot | User Agent Pattern | Purpose |
|---|---|---|
| Facebook | facebookexternalhit | Link preview generation |
| Facebook Catalog | Facebot | Product catalog scraping |
| Twitter | Twitterbot | Tweet card generation |
| LinkedIn | LinkedInBot | Post preview generation |
| Pinterest | Pinterest | Pin image fetching |
| WhatsApp | WhatsApp | Link preview generation |
| Slack | Slackbot | Link unfurling |
| Telegram | TelegramBot | Link preview |
| Discord | Discordbot | Embed generation |
| Skype | SkypeUriPreview | Link preview |
AI/LLM Crawlers
Crawlers for AI training and LLM services.
| Bot | User Agent Pattern | Purpose | Default |
|---|---|---|---|
| GPTBot | GPTBot | OpenAI training | Blocked |
| ChatGPT-User | ChatGPT-User | ChatGPT browsing | Blocked |
| ClaudeBot | ClaudeBot, Claude-Web | Anthropic training | Blocked |
| Google-Extended | Google-Extended | Google AI training | Blocked |
| PerplexityBot | PerplexityBot | Perplexity AI | Blocked |
| CCBot | CCBot | Common Crawl | Blocked |
| Cohere | cohere-ai | Cohere training | Blocked |
| Applebot Extended | Applebot-Extended | Apple AI features | Blocked |
Note: AI crawlers are blocked by default. Enable them in Settings if you want your content included in AI training.
Monitoring & Uptime Bots
Services that check your site availability.
| Bot | User Agent Pattern | Purpose |
|---|---|---|
| Pingdom | Pingdom | Uptime monitoring |
| UptimeRobot | UptimeRobot | Uptime monitoring |
| StatusCake | StatusCake | Uptime monitoring |
| Site24x7 | Site24x7 | Performance monitoring |
| Datadog | Datadog | APM & monitoring |
| New Relic | NewRelicPinger | Performance monitoring |
| GTmetrix | GTmetrix | Performance testing |
| WebPageTest | WebPageTest | Performance testing |
| Catchpoint | Catchpoint | Synthetic monitoring |
SEO & Marketing Bots
Tools used for SEO analysis and marketing.
| Bot | User Agent Pattern | Purpose |
|---|---|---|
| Semrush | SemrushBot | SEO analysis |
| Ahrefs | AhrefsBot | Backlink analysis |
| Moz | MozBot, rogerbot | SEO analysis |
| Majestic | MJ12bot | Backlink analysis |
| Screaming Frog | Screaming Frog | Site crawling |
| Sistrix | SISTRIX | SEO analysis |
| Serpstat | serpstatbot | SEO analysis |
| SpyFu | SpyFu | Competitor analysis |
Feed & Content Aggregators
Services that fetch RSS/Atom feeds and aggregate content.
| Bot | User Agent Pattern | Purpose |
|---|---|---|
| Feedly | Feedly | RSS aggregation |
| Feedbin | Feedbin | RSS reader |
| Inoreader | Inoreader | RSS reader |
| NewsBlur | NewsBlur | RSS reader |
| Apple News | AppleNewsBot | Apple News aggregation |
| Flipboard | Flipboard | Content aggregation |
Developer & Testing Tools
Tools used for website testing and development.
| Bot | User Agent Pattern | Purpose |
|---|---|---|
| W3C Validator | W3C_Validator | HTML validation |
| W3C Link Checker | W3C-checklink | Link validation |
| Google PageSpeed | Google Page Speed | Performance testing |
| Lighthouse | Chrome-Lighthouse | Web auditing |
| Archive.org | ia_archiver | Web archiving |
Infrastructure Bots
Cloud provider and CDN health checks.
| Bot | User Agent Pattern | Purpose |
|---|---|---|
| AWS ELB | ELB-HealthChecker | Load balancer health |
| Cloudflare | CloudFlare-AlwaysOnline | Always Online feature |
| Fastly | Fastly | CDN health check |
| Akamai | AkamaiGHost | CDN monitoring |
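
One way to read the User Agent Pattern columns in the tables above is as case-insensitive substrings of the request's `User-Agent` header. The sketch below works under that assumption; the matching rules WebDecoy actually applies are not specified here, and `GOOD_BOT_PATTERNS` and `match_good_bot` are hypothetical names covering only a small subset of the list.

```python
from typing import Optional

# A small subset of the patterns from the tables above, mapped to bot names.
GOOD_BOT_PATTERNS = {
    "Googlebot-Image": "Googlebot Images",
    "Googlebot": "Googlebot",
    "bingbot": "Bingbot",
    "Slackbot": "Slack",
    "UptimeRobot": "UptimeRobot",
    "ELB-HealthChecker": "AWS ELB",
}


def match_good_bot(user_agent: str) -> Optional[str]:
    """Return the bot name for the longest pattern found in the header, or None."""
    ua = user_agent.lower()
    hits = [p for p in GOOD_BOT_PATTERNS if p.lower() in ua]
    if not hits:
        return None
    # Prefer the longest pattern so "Googlebot-Image" wins over "Googlebot".
    return GOOD_BOT_PATTERNS[max(hits, key=len)]
```

A User-Agent string is trivially spoofed, so a pattern match on its own only identifies what a client claims to be.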
Configuration
Allowing Good Bots
In Settings → Good Bots (or plugin settings):
| Option | Description |
|---|---|
| Allow Search Engines | Googlebot, Bingbot, etc. |
| Allow Social Media | Facebook, Twitter, LinkedIn, etc. |
| Allow Monitoring | Pingdom, UptimeRobot, etc. |
| Block AI Crawlers | GPTBot, ClaudeBot, etc. |
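
To illustrate how the four options above could combine into an allow or block decision per bot category, here is a rough sketch. The `GoodBotSettings` class, its field names, and the `True` defaults for the three allow options are assumptions made for illustration. Only the AI crawler default (blocked) comes from this page, and the category strings follow the Bot Categories table.

```python
from dataclasses import dataclass


@dataclass
class GoodBotSettings:
    # Hypothetical field names mirroring the options in the table above.
    allow_search_engines: bool = True   # assumed default
    allow_social_media: bool = True     # assumed default
    allow_monitoring: bool = True       # assumed default
    block_ai_crawlers: bool = True      # AI crawlers are blocked by default


def is_category_allowed(category: str, settings: GoodBotSettings) -> bool:
    """Decide whether a bot category passes, given the four toggles."""
    if category == "ai_crawler":
        return not settings.block_ai_crawlers
    toggles = {
        "search_engine": settings.allow_search_engines,
        "social_media": settings.allow_social_media,
        "monitoring": settings.allow_monitoring,
    }
    # Categories without a toggle are not allowlisted in this sketch.
    return toggles.get(category, False)
```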
Custom Allowlist
Add your own bots:

```
MyInternalBot
PartnerCrawler/1.0
CustomMonitor
```

Verification Methods
For critical applications, verify bot identity:
- Reverse DNS (recommended for Google/Bing)
  ```bash
  host <ip_address>
  # Should resolve to *.googlebot.com or *.search.msn.com
  ```
- IP Range Verification
  - Google publishes Googlebot IP ranges
  - Bing publishes bingbot IP ranges
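
IP range verification comes down to checking whether the client address falls inside one of the published CIDR blocks. A minimal sketch using Python's standard `ipaddress` module follows. The blocks in `EXAMPLE_RANGES` are placeholders from the reserved documentation ranges, not real crawler ranges, and `ip_in_ranges` is a hypothetical helper.

```python
import ipaddress

# Placeholder CIDR blocks from the reserved documentation ranges.
# Replace them with the ranges the search engine currently publishes.
EXAMPLE_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def ip_in_ranges(ip: str, cidrs: list[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr)
        # An IPv4 address can never sit in an IPv6 block, and vice versa.
        if addr.version == net.version and addr in net:
            return True
    return False
```

Published ranges change over time, so a check like this would need the list refreshed periodically.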
Bot Identification in Detections
When a good bot is detected:

```json
{
  "is_good_bot": true,
  "bot_name": "Googlebot",
  "bot_category": "search_engine",
  "bot_verified": true,
  "threat_score": 5
}
```

Bot Categories
| Category | Examples |
|---|---|
| search_engine | Googlebot, Bingbot |
| social_media | Facebook, Twitter |
| ai_crawler | GPTBot, ClaudeBot |
| monitoring | Pingdom, UptimeRobot |
| seo_tool | Semrush, Ahrefs |
| feed_reader | Feedly, Feedbin |
| unknown | Unrecognized bot |
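
A consumer of these detection records might trust a request only when the bot is both recognized and verified. The sketch below parses the sample record from this section and applies that rule. The `trust_detection` helper and the `max_threat_score` threshold of 10 are hypothetical, and treating a missing score as 100 assumes a 0 to 100 scale that this page does not state.

```python
import json


def trust_detection(detection: dict, max_threat_score: int = 10) -> bool:
    """Trust a detection only if it is a verified good bot with a low score."""
    return (
        detection.get("is_good_bot") is True
        and detection.get("bot_verified") is True
        # A missing score is treated as high (assumed 0-100 scale).
        and detection.get("threat_score", 100) <= max_threat_score
    )


raw = (
    '{"is_good_bot": true, "bot_name": "Googlebot", '
    '"bot_category": "search_engine", "bot_verified": true, "threat_score": 5}'
)
detection = json.loads(raw)
print(detection["bot_name"], trust_detection(detection))  # prints "Googlebot True"
```

Requiring `bot_verified` alongside `is_good_bot` keeps a spoofed User-Agent from being trusted on the name match alone.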