Robots.txt
What is the robots.txt file used for?
The robots.txt file implements the Robots Exclusion Protocol. Its primary purpose is to give crawling instructions to web robots and search engine crawlers, telling them which parts of your site they should not visit. Well-behaved crawlers request this file before they start crawling your site. The file must be placed in the top-level (root) directory of your web server, because respectable web robots and crawlers strip off any other path information from a link and look for robots.txt at the root of that host before crawling the page or site. Below is an example.
Suppose our web robot comes across the following URL while crawling another site:
- http://www.abc.com/letters/list.html
Before crawling that specific URL, the robot strips off the path components and checks for a robots.txt file. Below is where a web robot would look for the robots.txt file of the website above:
- http://www.abc.com/robots.txt
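The lookup described above can be sketched in Python with the standard library's urllib.parse. This is a minimal illustration, not the code of any particular crawler, and the function name robots_txt_url is hypothetical. It keeps the scheme and host (plus any port) and replaces the path, query string and fragment with /robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would check before fetching page_url.

    Hypothetical helper: keeps the scheme and network location (host and any
    port) and discards the path, query string and fragment.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # The example URL from the text above.
    print(robots_txt_url("http://www.abc.com/letters/list.html"))
```

Running it prints http://www.abc.com/robots.txt, the same location given in the example.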
It should be noted that web robots can ignore a robots.txt file, especially robots designed to harvest e-mail addresses, content scrapers, and the like. You should never use a robots.txt file to "hide" information or content; remember, anyone can view your robots.txt file. The main purpose of the robots.txt file is to tell web robots what not to crawl. If you want your entire site crawled and indexed, you can simply create a robots.txt file in a plain-text editor such as Notepad and insert the code below into it.
User-agent: *
Disallow:
Sitemap: <full-url-path-to-your-sitemap>
If you do not have an XML sitemap, you can remove the Sitemap line from your robots.txt file. The asterisk in "User-agent: *" applies the rules to all web robots, and the empty "Disallow:" line means that nothing is disallowed. If you want different rules for different web robots, add another group of lines to your robots.txt file, starting with a User-agent line that contains that robot's name.
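To check how a compliant robot would read such a file, Python's standard-library urllib.robotparser module can parse robots.txt text directly, without fetching anything. The sketch below rests on assumptions: the robot name ExampleBot and the sitemap URL http://www.abc.com/sitemap.xml are hypothetical, and site_maps() requires Python 3.8 or later. It combines the allow-all group shown above with a second group that blocks ExampleBot from the entire site.

```python
from urllib.robotparser import RobotFileParser

# The allow-all group from the text, plus a second group for a
# hypothetical robot named "ExampleBot" that is blocked site-wide.
# The sitemap URL is a hypothetical placeholder.
ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: ExampleBot
Disallow: /

Sitemap: http://www.abc.com/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    page = "http://www.abc.com/letters/list.html"
    # A robot not named in any group falls back to the "*" group: allowed.
    print(rp.can_fetch("Googlebot", page))
    # ExampleBot matches its own group, whose "Disallow: /" blocks everything.
    print(rp.can_fetch("ExampleBot", page))
    # Sitemap lines are collected independently of the User-agent groups.
    print(rp.site_maps())
```

This shows how a well-behaved robot uses the file; individual search engines may interpret edge cases differently, and robots that ignore robots.txt are unaffected.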
See Also: Where can I find a robots.txt generator?