I want to know what a robots.txt file is and what it is used for.
What is Robots.txt
#2
Posted 21 September 2009 - 02:54 AM
A robots.txt file tells search engine spiders (or bots) which content they may crawl and which they should stay out of. For example, say your website has two directories, example.com/public and example.com/private, and you only want search engines to crawl and index the pages and content in the public folder. That is what you would use the robots.txt file for. Personally, I would also block the private folder from being accessed by anyone, using server-side security such as an .htaccess file, because robots.txt is only a request that well-behaved bots follow and does not actually restrict access.
Here is the coded example.
# robots.txt for http://www.example.com/
User-agent: *
Disallow: /private/
Learn more at The Web Robots Pages and follow me on twitter @montanaflynn
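If you want to check how a crawler would read the rules above, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules are pasted in as a string so nothing is fetched over the network, and the crawler name `ExampleBot`, the helper `is_allowed`, and the page URLs are made up for illustration. Real search engines use their own parsers, so treat this as an approximation.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, kept in a string so no network access is needed.
ROBOTS_TXT = """\
# robots.txt for http://www.example.com/
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and both page URLs are hypothetical.
    for page in (
        "http://www.example.com/public/index.html",
        "http://www.example.com/private/secret.html",
    ):
        print(page, is_allowed(ROBOTS_TXT, "ExampleBot", page))
```

The public page should come back as allowed and the private one as disallowed, since `Disallow: /private/` matches any path that starts with `/private/`.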
Quote
WWW Robots (also called wanderers or spiders) are programs that traverse many pages in the World Wide Web by recursively retrieving linked pages. For more information see the robots page.
In 1993 and 1994 there have been occasions where robots have visited WWW servers where they weren't welcome for various reasons. Sometimes these reasons were robot specific, e.g. certain robots swamped servers with rapid-fire requests, or retrieved the same files repeatedly. In other situations robots traversed parts of WWW servers that weren't suitable, e.g. very deep virtual trees, duplicated information, temporary information, or cgi-scripts with side-effects (such as voting).
These incidents indicated the need for established mechanisms for WWW servers to indicate to robots which parts of their server should not be accessed. This standard addresses this need with an operational solution.
#3
Posted 23 September 2009 - 12:48 AM
robots.txt is a plain text file that lives on the server; you can write it in Notepad. It goes in the root directory of the site (not the CGI bin), so that crawlers can find it at /robots.txt. It lets you allow or disallow search engines from visiting your site.
#6
Posted 27 September 2009 - 12:54 PM
You can get a detailed explanation of robots.txt at http://www.robotstxt...robotstxt.html. I had the same question a few weeks ago and was looking for answers, and I found this website very useful.
#8
Posted 16 October 2009 - 05:30 AM
"robots.txt" is a regular text file.
Here's a basic "robots.txt":
# Googlebot (web search) may crawl everything
User-agent: Googlebot
Allow: /
# Googlebot-Image (image search) may crawl nothing
User-agent: Googlebot-Image
Disallow: /
#9
Posted 23 October 2009 - 02:53 AM
A robots.txt file is a text file, and it is helpful for your site.
You can control which search engine crawlers may visit your site and which may not:
User-agent: search engine crawler name
Allow: /
or, to keep that crawler out:
Disallow: /
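To make the pattern above concrete, here is a small sketch that lets one crawler in and keeps every other crawler out, then checks the result with Python's standard `urllib.robotparser`. The crawler names `FriendlyBot` and `OtherBot`, the helper `can_crawl`, and the page URL are invented for illustration. Real crawlers may match names differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: FriendlyBot may crawl everything, all other crawlers nothing.
RULES = """\
User-agent: FriendlyBot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(rules: str, crawler: str, url: str) -> bool:
    """Return True if the rules permit the named crawler to fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(crawler, url)


if __name__ == "__main__":
    page = "http://www.example.com/page.html"  # hypothetical page
    for crawler in ("FriendlyBot", "OtherBot"):
        print(crawler, can_crawl(RULES, crawler, page))
```

The `User-agent: *` group acts as the fallback for any crawler that has no group of its own, which is why `OtherBot` ends up blocked here.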
#10
Posted 08 November 2009 - 04:00 AM
robots.txt is a text file that lets you allow or disallow search engines from visiting the site.