CIRR.COM Herald

Web Hosting Support

robots.txt


What is robots.txt, and why is one needed?

Answering the second question first: robots.txt gives you some control over which portions of your web site a web robot (spider) can crawl and index. By using a robots.txt file you can, for example, keep a web robot from trying to index your entire dynamically generated online store.

Overview

robots.txt is a plain text file in the top directory of your document space (/docroot on CIRR servers). It contains one or more records, each naming a robot (by its "user agent" string) and listing the directories or paths that robot should not access.

An example looks like this:

User-agent: *
Disallow: /private/
Disallow: /images/
Disallow: /Old/
Disallow: /cgi-bin/
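
One way to see what a rule set like this blocks is to feed it to a parser. The following is a minimal Python sketch, assuming the standard library's urllib.robotparser module is an acceptable stand-in for a well-behaved robot; the robot name examplebot and the tested paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, as they would appear in robots.txt.
RULES = """\
User-agent: *
Disallow: /private/
Disallow: /images/
Disallow: /Old/
Disallow: /cgi-bin/
"""


def allowed(path, agent="examplebot"):
    """Return True if a compliant robot named `agent` may fetch `path`.

    `agent` is a hypothetical robot name used only for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths, chosen to exercise the rules.
    for path in ("/index.html", "/private/notes.html", "/old/page.html"):
        print(path, "allowed" if allowed(path) else "disallowed")
```

Note that this parser compares paths case-sensitively, so /old/page.html is not covered by the rule for /Old/.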

Refusing Access to One Bad Robot

To refuse access by one poorly behaved robot, you could use the following text in the robots.txt file:
User-agent: poorlybehavedbot-1.0
Disallow: /

This disallows all access by the robot that identifies itself as poorlybehavedbot-1.0 in your web server's access logs. (This information appears in the "user agent" field. If you do not have that information in your access logs, or in a separate user agent log, we recommend that you complain to your web space provider. It is useful information!)

The path / matches every URL on the site, so this rule tells the robot not to access any of the files on the server.
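
As a rough check of this record, the sketch below parses it with Python's standard urllib.robotparser and asks whether two robots may fetch a page. The second robot name, someotherbot, and the tested paths are hypothetical, and the result reflects how this particular parser matches user agent strings, not necessarily how every robot does.

```python
from urllib.robotparser import RobotFileParser

# The single-robot record from the text above.
RULES = """\
User-agent: poorlybehavedbot-1.0
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def may_fetch(agent, path):
    """Return True if the parsed rules let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # "someotherbot" is a made-up name for a robot with no record of its own.
    for agent in ("poorlybehavedbot-1.0", "someotherbot"):
        print(agent, may_fetch(agent, "/index.html"))
```

Because no other record is present, robots that are not named remain free to fetch everything.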

Refusing Access by all Robots

You can apply a rule to all robots, rather than to individual robots, by using the special "user agent" string *. For example:
User-agent: *
Disallow: /cgi-bin

In this example, access to the /cgi-bin directory is refused to all "robots" that adhere to the standard.
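
A Disallow value is matched as a prefix of the requested path, so /cgi-bin (without a trailing slash) also covers paths such as /cgi-bin-old/. The sketch below illustrates the difference using Python's standard urllib.robotparser; the helper blocked_paths, the robot name examplebot, and the sample paths are assumptions made for this example only.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(disallow, paths, agent="examplebot"):
    """Return the entries of `paths` that one Disallow rule blocks for all robots."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", "Disallow: " + disallow])
    return [p for p in paths if not parser.can_fetch(agent, p)]


# Hypothetical paths used to compare the two spellings of the rule.
PATHS = ["/cgi-bin/search.pl", "/cgi-bin-old/search.pl", "/index.html"]

if __name__ == "__main__":
    print(blocked_paths("/cgi-bin", PATHS))   # rule without trailing slash
    print(blocked_paths("/cgi-bin/", PATHS))  # rule with trailing slash
```

If you only want to block the directory itself, writing the rule with the trailing slash is the safer choice.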

More details

A deeper tutorial on robots.txt can be found at Search Engine World. Another good resource appears to be on SearchTools.com. The Standard for Robot Exclusion can be found at robotstxt.org.

If you have any questions about our site, please send us mail.
Copyright 2000, 2001 CIRR.COM
$Id: robots.txt.html,v 1.1 2001/08/10 16:56:12 cirr Stable $