Web Hosting Support
What is robots.txt, and why is one needed?
Answering the second question first: robots.txt gives you some control over which portions of your web site a web robot (spider) may crawl and index. By using a robots.txt file you can, for example, keep a web robot from trying to index your entire dynamically generated online store.
Overview

robots.txt is a plain text file in the top directory of your document space (/docroot on CIRR servers). It contains one or more records, each naming a robot (a "user agent") and listing the directories that robot is not allowed to crawl.
An example looks like this:
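A representative robots.txt might look like the following (an illustrative sketch; the /store/ path is an assumption for the example, not a CIRR default):

```
# Keep all robots out of the scripts directory and the
# dynamically generated store; everything else stays crawlable.
User-agent: *
Disallow: /cgi-bin/
Disallow: /store/
```

Each record begins with a User-agent line and is followed by one or more Disallow lines; a blank line separates records.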
Refusing Access to One Bad Robot

To refuse access by one poorly behaved robot, you could use the following text in the robots.txt file:

User-agent: poorlybehavedbot-1.0
Disallow: /
This disallows all access by the robot that identifies itself as poorlybehavedbot-1.0 in your web server's access logs. (This information appears in the "user agent" field. If you do not have that information in your access logs, or in a separate user agent log, we recommend that you complain to your web space provider. It is useful information!)

A Disallow value of / is understood to mean that the robot should not access any of the files on the server.
Refusing Access by All Robots

You can refuse access to all robots, rather than to individual robots, by using the special "user agent" string *. For example:

User-agent: *
Disallow: /cgi-bin/
In this example, access to the /cgi-bin directory is refused to all "robots" that adhere to the standard.
More Details

A deeper tutorial on robots.txt can be found at Search Engine World. Another good resource is SearchTools.com. The Standard for Robot Exclusion can be found at robotstxt.org.
Copyright 2000, 2001 CIRR.COM