Once we have a website up and running, we need to make sure that all visiting search engines can access every page we want them to see.
Sometimes, though, we may want search engines not to index certain parts of the site, or even to ban a particular search engine from the site altogether.
That is where a simple little two-line text file called robots.txt comes in.
Robots.txt resides in your website's root directory (on Linux-based hosts this is usually your /public_html/ directory), and looks something like the following:
User-agent: *
Disallow:
The first line names the robot (user-agent) the record applies to, and the second line tells that robot which parts of the site it is not allowed to visit; leaving Disallow empty means the robot may visit everything.
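To shut robots out of just one part of the site, give Disallow a path instead of leaving it empty. As a minimal sketch (the /cgi-bin/ directory here is only a placeholder for whatever you want kept out of the index):

User-agent: *
# /cgi-bin/ is a placeholder path; substitute the directory you want hidden
Disallow: /cgi-bin/

The value is treated as a path prefix, so this blocks /cgi-bin/ and everything beneath it.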
If you want to give different instructions to different bots, simply repeat the two lines above for each one.
So, for example:
User-agent: googlebot
Disallow:
User-agent: askjeeves
Disallow: /
This allows Google (user-agent name Googlebot) to visit and index every page, while at the same time banning Ask Jeeves from the site completely.
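A single User-agent record can also carry several Disallow lines if you need to block more than one area. Again a sketch, with placeholder directory names:

User-agent: *
# /private/ and /tmp/ are placeholder paths
Disallow: /private/
Disallow: /tmp/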
Even if you want to let every robot index every page of your site, it's still a very good idea to put a robots.txt file on your site. It will stop your error logs filling up with entries from search engines trying to access a robots.txt file that doesn't exist.
To find a fairly up-to-date list of robot user-agent names, visit http://www.robotstxt.org/wc/active/html/index.html
To learn more about robots.txt, see the full list of resources at http://www.websitesecrets101.com/robotstxt-further-reading-resources.