Bot Database

Major Search Services

Most sites want the major search engines to crawl them effectively, but fake bots impersonating those search engines are very common. Why? Cybercriminals know that you won’t want to block the major search engines, and that you probably won’t bother to verify that a bot is legitimate. Worse still, you may end up whitelisting a malicious impersonator bot.

Search crawlers will often crawl your entire site, and Bing in particular has a very heavy footprint as it crawls, so widespread crawling is ‘normal’ behaviour from search engines. Once whitelisted, however, a malicious bot can hide in this otherwise legitimate bot traffic, clone your entire site, and set up a sophisticated bait-and-switch credential attack on your customer base. It is also free to crawl and steal content, intellectual property, images, prices, or other commercially sensitive data.

Verifying search bots is also far from straightforward. Bing helpfully publishes a list of IPs that can be used to authenticate its origins, but this list needs to be constantly updated as ranges change over time. Just recently, we also tracked IPs that are not in the list but are nevertheless legitimate. It is very frustrating that Bing can’t sort out the basics of its own validation.

VerifiedVisitors authenticates all the major search engine bots and ensures that each bot is valid. We look at the digital provenance, and also at the actual bot behaviour, to ensure that only the verified search engine bots you want are crawling your site. Each of the search engines does have specific guides on how to verify the user agent and bot origination. Although to date only Bing has failed its own verification, the verification data changes frequently, and it is all too easy to whitelist what looks like a legitimate bot unless automated tools are checking and verifying each crawler for you.

Using VerifiedVisitors also gives you detailed information on each search engine and its crawling activity. You can use the search engine panel to see the last crawled dates, requests made and crawl volume, which can be helpful to see how often your site is indexed.
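To make the IP-list check described above concrete, here is a minimal Python sketch of testing a visitor's address against a set of published ranges. The ranges shown are reserved documentation addresses used purely as placeholders, and the function names are hypothetical; a real check would load the crawler's currently published list and refresh it regularly. As noted above, a miss against the list does not always mean the bot is fake, so a result like this is best treated as one signal among several.

```python
import ipaddress

# Placeholder ranges (reserved documentation addresses), NOT real crawler ranges.
# A real deployment would load the search engine's published list and refresh it.
PUBLISHED_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def load_networks(cidrs):
    """Parse CIDR strings into ipaddress network objects."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def ip_in_published_ranges(ip, networks):
    """Return True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    # Compare only against networks of the same IP version (v4 vs v6).
    return any(addr.version == net.version and addr in net for net in networks)


networks = load_networks(PUBLISHED_RANGES)
claimed_crawler_ip = "192.0.2.10"  # hypothetical address taken from a request log
print(ip_in_published_ranges(claimed_crawler_ip, networks))
```

Because published ranges change over time, the list itself is the fragile part of this approach: the lookup is trivial, but a stale list will reject legitimate crawlers or accept addresses that are no longer in use.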

Vendor

Bot Service

Recommendation

Description

Yahoo

Yahoo! Slurp

Recommended

Not recommended

Slurp is the Yahoo Search robot that crawls and indexes web page information for Yahoo Search, and it is also used for Yahoo Mobile Search results. Additionally, Slurp collects content from partner sites for inclusion within sites like Yahoo News, Yahoo Finance and Yahoo Sports, and accesses pages from sites across the Web to confirm accuracy and improve Yahoo's personalized content for its users. Slurp is designed to make reasonable requests that don't overburden websites. However, webmasters can use the Yahoo webmaster tools to restrict the pages that Slurp crawls by disallowing crawling of certain sub-directories, or slow the rate at which Slurp crawls by using a crawl-delay.

Google

Google StoreBot

Recommended

Not recommended

This is the mobile agent for the Google Shopping crawler. If you sell products online, you will want to allow this crawler.

Google

GoogleBot

Recommended

Not recommended

For most sites, Googlebot shouldn't access your site more than once every few seconds on average. However, due to delays, it's possible that the rate will appear to be slightly higher over short periods. Googlebot was designed to run simultaneously across thousands of machines for scalability. Also, to cut down on bandwidth usage, Google runs many crawlers on machines located near the sites that they might crawl. Therefore, your logs may show visits from several machines at google.com, all with the user-agent Googlebot. VerifiedVisitors uses multi-factor authentication to ensure that only genuine Googlebot agents are crawling your site.

Google

Google-InspectionTool

Recommended

Not recommended

This is the desktop version of Google-InspectionTool, used by Search testing tools such as the Rich Results Test and URL Inspection in Search Console. Apart from the user agent and user agent token, it mimics Googlebot. The URL Inspection tool provides information about Google's indexed version of a specific page, and also allows you to test whether a URL might be indexable. Information includes details about structured data, video, linked AMP, and indexing/indexability.

Google

Favicon

Recommended

Not recommended

This bot from Google creates shortcut icons for various Google services.

Bing

MSNbot

Recommended

Not recommended

Bingbot is the standard Bing crawler and handles most crawling needs of the service. Bingbot uses a couple of different user agent strings; this one, MSNbot, is an older crawler that VerifiedWhiteList has seen active and visiting sites.

Bing

Bingbot

Recommended

Not recommended

Bingbot is Bing's standard crawler and handles most of its crawling needs each day. Bingbot uses a couple of different user agent strings, including several mobile variants for crawling the mobile web.

Bing

Bing Preview

Recommended

Not recommended

The BingPreview bot is used to generate page snapshots. Note that BingPreview has "desktop" and "mobile" variants.

Apple

Applebot

Recommended

Not recommended

Applebot is the web crawler used by Apple products like Siri and Spotlight Suggestions. It respects robots.txt.
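Several entries above mention crawler controls: disallowing sub-directories and setting a crawl-delay for Slurp, and robots.txt support in Applebot. As an illustration only, the following Python sketch parses a sample robots.txt with the standard library and checks how those rules would apply. The paths and the delay value are hypothetical, and how closely any given crawler honours each directive is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Slurp out of one sub-directory and slow it down,
# while leaving every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""


def parse_robots(text):
    """Build a parser from robots.txt content held in a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = parse_robots(ROBOTS_TXT)

# Slurp is blocked from the disallowed sub-directory but allowed elsewhere.
print(rules.can_fetch("Slurp", "/private/report.html"))
print(rules.can_fetch("Slurp", "/products/"))

# The crawl-delay applies only to the Slurp group in this sample file.
print(rules.crawl_delay("Slurp"))
print(rules.crawl_delay("Applebot"))
```

Note that robots.txt only guides well-behaved crawlers. An impersonator that sends a search engine's user agent is free to ignore it, which is why these rules complement bot verification rather than replace it.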