No-follow and no-index directives

You can improve search quality by specifying directives that control whether the Web crawler follows the links on a page and whether it indexes the page.

Some Web pages have no-follow or no-index directives, which instruct robots (such as the Web crawler) not to follow links found on those pages, not to include the contents of those pages in the index, or both.
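These directives commonly appear as a robots meta tag in the HTML head of a page. For example, a tag that tells robots not to index a page but still to follow its links looks like this:

```html
<head>
  <!-- Do not index this page, but do follow its links -->
  <meta name="robots" content="noindex, follow">
</head>
```

The recognized values include `index`, `noindex`, `follow`, and `nofollow`; `none` is shorthand for `noindex, nofollow`.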

Controlling these settings can improve the quality of the crawl. For example, some directory pages can contain thousands of links but no other useful content; those pages should be crawled, and their links followed, but there is no benefit to indexing the directory pages themselves.
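As a minimal sketch (not part of the product), the following shows how a crawler might interpret a robots meta content value as a pair of index/follow flags, assuming a simple comma-separated format:

```python
def parse_robots_directives(content: str) -> tuple[bool, bool]:
    """Return (index, follow) flags from a robots meta content value.

    Both flags default to True; "none" is shorthand for "noindex, nofollow".
    """
    index, follow = True, True
    for token in content.lower().split(","):
        token = token.strip()
        if token in ("noindex", "none"):
            index = False
        if token in ("nofollow", "none"):
            follow = False
    return index, follow

# A directory page: skip indexing its contents, but follow its links.
print(parse_robots_directives("noindex, follow"))  # (False, True)
```

A real crawler would also consult robots.txt and any `X-Robots-Tag` HTTP headers, but the per-page decision reduces to these two flags.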

At other times, you might want the crawler to go no deeper in a site hierarchy, but the leaf pages at the desired level contain links and carry no no-follow directives. Because many of these pages are generated automatically, there is no page owner who could insert the required directives.

To specify rules for crawling such pages, you create or edit a configuration file named followindex.rules. Use the following guidelines when you specify rules in this file:
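The exact rule syntax is product-specific and is defined by the product documentation; purely as a hypothetical illustration (the directives and URL prefixes below are invented examples, not confirmed syntax), a rules file might pair a directive with a URL prefix:

```
# Hypothetical followindex.rules entries: directive, then URL prefix
noindex  http://www.example.com/directory/
nofollow http://www.example.com/generated/leaf/
```

Consult the product's reference for the actual rule format, matching semantics, and order of evaluation.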