Web Hosting Support


What is robots.txt, and why is one needed?

Answering the second question first: robots.txt gives you some control over which portions of your web site a web robot (also called a spider or crawler) may visit and index. By using a robots.txt file you can, for example, keep web robots from trying to index your entire dynamically generated online store.


robots.txt is a plain text file in the top directory of your document space (/docroot on CIRR servers). It contains a list of directories that robots should not search, along with the user agents (robots) to which those rules apply.

An example looks like this:

User-agent: *
Disallow: /private/
Disallow: /images/
Disallow: /Old/
Disallow: /cgi-bin/
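One way to sanity-check rules like the ones above is with Python's standard urllib.robotparser module (the agent name "ExampleBot" here is just an illustration):

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt above, as it would be served from /robots.txt
rules = """\
User-agent: *
Disallow: /private/
Disallow: /images/
Disallow: /Old/
Disallow: /cgi-bin/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Well-behaved robots may fetch pages outside the disallowed directories...
print(rp.can_fetch("ExampleBot", "/index.html"))      # True
# ...but nothing under /private/ or /cgi-bin/
print(rp.can_fetch("ExampleBot", "/private/a.html"))  # False
print(rp.can_fetch("ExampleBot", "/cgi-bin/run.pl"))  # False
```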

Refusing Access to One Bad Robot

To refuse access by one poorly behaved robot, you could use the following text in the robots.txt file:
User-agent: poorlybehavedbot-1.0
Disallow: /

This disallows all access by the robot that identifies itself as poorlybehavedbot-1.0 in your web server's access logs. (This information appears in the "user agent" field. If you do not have that information in your access logs, or in a separate user agent log, we recommend that you complain to your web space provider. It is useful information!)

The path / matches every URL on the server, so a Disallow value of / is understood to mean that the robot should not access any of the files on the server.
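A quick sketch with Python's urllib.robotparser, using the poorlybehavedbot-1.0 name from above, shows that the rule shuts out only that robot (the agent name "FriendlyBot" is hypothetical):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: poorlybehavedbot-1.0
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The named robot is refused everything, because / matches every URL...
print(rp.can_fetch("poorlybehavedbot-1.0", "/index.html"))  # False
# ...while other robots are unaffected by this entry ("FriendlyBot" is made up)
print(rp.can_fetch("FriendlyBot", "/index.html"))           # True
```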

Refusing Access by All Robots

You can refuse access to all robots, rather than to individual robots, by using the special "user agent" string *. For example:
User-agent: *
Disallow: /cgi-bin

In this example, access to the /cgi-bin directory is refused to all "robots" that adhere to the standard.
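Note that Disallow values are prefix matches: /cgi-bin (with no trailing slash) blocks /cgi-bin itself and every path that begins with that string. This can be verified with Python's urllib.robotparser (the agent name "AnyBot" is just an illustration):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /cgi-bin
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Any path starting with /cgi-bin is refused to every robot...
print(rp.can_fetch("AnyBot", "/cgi-bin/search.pl"))  # False
print(rp.can_fetch("AnyBot", "/cgi-bin"))            # False
# ...other paths remain allowed
print(rp.can_fetch("AnyBot", "/about.html"))         # True
```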

More details

A full description of the robots.txt format can be found in The Standard for Robot Exclusion.

If you have any questions about our site, please send us mail.
Copyright 2000,2001 CIRR.COM