How To Write Proper Robots.txt Files For Your Website And Control Indexing
The robots.txt file is a plain text file, uploaded to the root directory of your website, that contains a set of rules for search engine spiders. It is mainly used to tell web spiders not to crawl the links (paths) listed in it. It is very important to write the rules in your robots.txt file correctly: if a rule blocks spiders from accessing your whole website, your pages can drop out of the search engine index. So always be careful when creating robots.txt for your website.
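Because the file must sit in the root directory, a spider finds it by discarding everything after the host name of any page URL. The sketch below, using Python's standard `urllib.parse` module, shows that lookup; the helper name `robots_txt_url` and the `example.com` URLs are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    robots.txt always lives at the root of the host, so the path,
    query string and fragment of the page URL are dropped. The scheme
    and host (including any port) are kept.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/blog/post.html?id=7"))
# https://www.example.com/robots.txt
```

A file placed in a subfolder such as `/blog/robots.txt` would therefore not be found by a spider looking in this location.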
The default form of the robots.txt file allows web spiders to access your entire website, except the parts where you need to log in with a user name and password: spiders cannot log in, so they cannot access any section protected by a user name and password. The default form of the robots.txt file is written below.
User-agent: *
Disallow:
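One way to confirm what these two lines mean is to feed them to a robots.txt parser. The sketch below uses Python's standard-library `urllib.robotparser`, which is only one crawler implementation; the `example.com` URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The default "allow everything" robots.txt from the text above.
DEFAULT_RULES = """\
User-agent: *
Disallow:
"""

default_parser = RobotFileParser()
default_parser.parse(DEFAULT_RULES.splitlines())

# An empty Disallow value blocks nothing, so every URL is allowed.
for url in ["https://www.example.com/", "https://www.example.com/blog/post.html"]:
    print(url, default_parser.can_fetch("Googlebot", url))
```

Both URLs print `True`: the `*` group applies to any spider, including one that identifies itself as `Googlebot`.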
If you create your robots.txt file with Google Webmaster Tools, you will get different code for allowing web spiders across your entire website. The code generated by Google Webmaster Tools for the default "allow all" is:
User-agent: *
allow: /
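To check that the two "allow all" versions behave the same, the sketch below parses both with Python's standard-library `urllib.robotparser` and compares their decisions for a few hypothetical `example.com` URLs. This shows how this particular parser reads them; it is not a statement about how Google's own crawler is implemented.

```python
from urllib.robotparser import RobotFileParser


def build_parser(lines):
    """Parse a list of robots.txt lines into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


# The two "allow everything" variants shown above.
empty_disallow = build_parser(["User-agent: *", "Disallow:"])
allow_root = build_parser(["User-agent: *", "allow: /"])  # this parser lowercases field names

# Hypothetical URLs used only for the comparison.
sample_urls = [
    "https://www.example.com/",
    "https://www.example.com/about.html",
    "https://www.example.com/images/logo.png",
]

for url in sample_urls:
    print(url, empty_disallow.can_fetch("*", url), allow_root.can_fetch("*", url))
```

Every line prints `True True`: an empty `Disallow:` blocks nothing, and `Allow: /` explicitly permits every path, since all paths start with `/`.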
Looking closely, we can see that both versions do the same thing: they allow spiders to crawl everything. SEO consultants and webmasters disagree about whether a robots.txt file needs to be added to the root directory at all. Most well-designed websites do not need to block Google or any other search engine spider from accessing certain pages or files, so many webmasters think adding a robots.txt file to their websites is a waste of time. SEO consultants, on the other hand, push them to add one, because they consider it an important part of promoting a website in the major search engines.
Webmasters are willing to add robots.txt to a website when they need to block search engine spiders from accessing certain files. The SEO consultants' answer on this issue is: "We need to avoid any chance of 404 error pages on our website." Search engine spiders request the robots.txt file from your website, and if the file is not present, they receive a 404 (Not Found) error. According to the consultants, spiders may then report it as a broken link, which could affect the visibility of your website. This is the reason SEO consultants give for adding a simple allow-only robots.txt file to your website.
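The consultants' concern is about what a crawler does when robots.txt is missing. As one concrete data point, the sketch below starts a throwaway local web server (a hypothetical site with no robots.txt, so every request gets a 404) and asks Python's standard-library `urllib.robotparser` whether a page may be crawled. It shows only how this one parser handles the 404; each search engine applies its own policy. The handler class and function names are hypothetical.

```python
import http.server
import threading
import urllib.request
import urllib.robotparser


class NoRobotsHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local site that has no robots.txt: every GET returns 404."""

    def do_GET(self):
        self.send_error(404, "Not Found")

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def check_site_without_robots_txt() -> bool:
    # Bypass any system proxy so the request really reaches the local server.
    urllib.request.install_opener(
        urllib.request.build_opener(urllib.request.ProxyHandler({}))
    )
    server = http.server.HTTPServer(("127.0.0.1", 0), NoRobotsHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        parser = urllib.robotparser.RobotFileParser(f"http://127.0.0.1:{port}/robots.txt")
        parser.read()  # the server answers 404 Not Found
        # In this parser, a 4xx answer other than 401/403 means "allow every URL".
        return parser.can_fetch("*", f"http://127.0.0.1:{port}/any/page.html")
    finally:
        server.shutdown()
        server.server_close()


print(check_site_without_robots_txt())  # True
```

For this parser, a missing robots.txt behaves like the allow-all file: the 404 itself does not block anything.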
If you need to block your entire site from web spiders, write your robots.txt file as follows.
User-agent: *
Disallow: /
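Parsing this file shows why it is so drastic: every URL path begins with `/`, so the single rule matches every page. The sketch below again uses Python's standard-library `urllib.robotparser` with hypothetical `example.com` URLs.

```python
from urllib.robotparser import RobotFileParser

# The "block everything" robots.txt from the text above.
block_all = RobotFileParser()
block_all.parse(["User-agent: *", "Disallow: /"])

# Hypothetical URLs: every path starts with "/", so every one is blocked.
for url in ["https://www.example.com/", "https://www.example.com/contact.html"]:
    print(url, block_all.can_fetch("Googlebot", url))
```

Both URLs print `False`, including the home page.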
This format blocks web spiders from crawling the entire website. As a result, search engines will not crawl your pages, and your website will generally drop out of search engine results. To block spiders from accessing only certain files or folders, keep the User-agent: * line and, instead of Disallow: /, add one Disallow: line for each path you want to block.
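The major advantage of this format is that spiders can still crawl and index the rest of your website in the major search engines, while the blocked files are not crawled and their content does not appear in search engine results. Note that a blocked URL can still be listed, without a description, if other sites link to it.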
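Here is a sketch of a partial block, checked with Python's standard-library `urllib.robotparser`. The paths `/admin/` and `/drafts/old-page.html` are hypothetical examples, not recommendations; substitute the files or folders you actually want to hide.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical paths chosen for illustration only.
PARTIAL_BLOCK = """\
User-agent: *
Disallow: /admin/
Disallow: /drafts/old-page.html
"""

partial_parser = RobotFileParser()
partial_parser.parse(PARTIAL_BLOCK.splitlines())

for url in [
    "https://www.example.com/",                      # allowed
    "https://www.example.com/blog/post.html",        # allowed
    "https://www.example.com/admin/login.html",      # blocked by /admin/
    "https://www.example.com/drafts/old-page.html",  # blocked explicitly
    "https://www.example.com/drafts/new-page.html",  # allowed: no rule matches
]:
    print(url, partial_parser.can_fetch("*", url))
```

Disallow values match by prefix, so `/admin/` blocks everything inside that folder, while a single file such as `/drafts/old-page.html` leaves its neighbours crawlable.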