Block Bad Bots Globally In Apache
There are plenty of guides to blocking bots with mod_rewrite. On a shared server hosting many domains, however, a spoofed bot that keeps changing IP addresses is better blocked once, globally, by its user-agent in Apache's httpd.conf.
In this example, Majestic-12's crawler (MJ12bot) has been spoofed for the past several months, but the spoofed user-agent differs from the one the genuine, current version uses: the spoofers are sending an older version string, 1.0.8.
mod_rewrite .htaccess code:
RewriteEngine On
# Return 403 Forbidden to the spoofed 1.0.8 user-agent
RewriteCond %{HTTP_USER_AGENT} ^MJ12bot/v1\.0\.8
RewriteRule .* - [F]
httpd.conf code:
# Tag any request whose User-Agent matches the spoofed 1.0.8 string
SetEnvIfNoCase User-Agent "^MJ12bot(.*)1\.0\.8" stay_out
<Location />
    Order Allow,Deny
    Deny from env=stay_out
    Allow from all
</Location>
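The Order, Allow and Deny directives above are Apache 2.2 syntax; on Apache 2.4 they only work if mod_access_compat is loaded. A minimal sketch of the same global block using the 2.4 Require directives, assuming mod_setenvif and mod_authz_core are loaded and reusing the stay_out variable name, might look like this:

```apache
# Apache 2.4 sketch: tag the spoofed user-agent, then deny it site-wide
SetEnvIfNoCase User-Agent "^MJ12bot(.*)1\.0\.8" stay_out
<Location "/">
    <RequireAll>
        # Allow everyone...
        Require all granted
        # ...except requests tagged stay_out above
        Require not env stay_out
    </RequireAll>
</Location>
```

With the default AuthMerging Off, the authorization rules in this Location section replace those coming from Directory and Files sections, so check that "Require all granted" does not re-open anything those sections deny, such as the usual rule protecting .ht* files.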
Note that this only blocks version 1.0.8 of the bot, which is the version being spoofed, as in this log entry:
89.130.142.68 - - [15/Jan/2008:10:55:21 +0000] "GET /somepage HTTP/1.1" 200 48167 "-" "MJ12bot/v1.0.8 (http://majestic12.co.uk/bot.php?+)"
All other versions of MJ12bot will still be allowed to crawl your site.
The exact httpd.conf code was obtained from this blog entry.
