Testing URL connections with the Web crawler

After you specify URLs for the Web crawler to crawl, you can test the configuration of the crawling rules.

You can click Test when you specify the domains, HTTP prefixes, or IP addresses to be crawled, or you can select the Test URLs page to test the crawler's ability to connect to the start URLs and to any other URLs that you specify.

The test results show whether the crawler can access URLs with the user agent name that is specified in the crawler properties, and whether a URL is blocked by exclusion rules (for example, a document might not be crawled because its file extension matches one that is excluded from the crawl space).
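
The following sketch illustrates, in general terms, the kind of check that such a test performs; it is not the crawler's implementation. The user agent name and the excluded file extensions are assumed values that stand in for the settings defined in the crawler properties and the crawl space configuration.

```python
# Hypothetical sketch only; the user agent and excluded extensions are assumed
# stand-ins for values that the product takes from its configuration.
from urllib.parse import urlparse
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

USER_AGENT = "example-crawler/1.0"        # assumed; set in crawler properties
EXCLUDED_EXTENSIONS = {".exe", ".zip"}    # assumed exclusion rule

def test_url(url: str) -> str:
    """Return a short result describing whether the URL could be crawled."""
    path = urlparse(url).path.lower()
    # Exclusion rules are checked first: a matching extension means the
    # document is outside the crawl space and is never fetched.
    for ext in EXCLUDED_EXTENSIONS:
        if path.endswith(ext):
            return f"excluded: extension {ext} is not in the crawl space"
    # Otherwise, attempt to connect with the configured user agent name.
    request = Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urlopen(request, timeout=10) as response:
            return f"accessible: HTTP {response.status}"
    except HTTPError as err:
        return f"not accessible: HTTP {err.code}"
    except URLError as err:
        return f"connection failed: {err.reason}"

print(test_url("http://www.example.com/index.html"))
```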

After a site is crawled at least once, you can test URLs to obtain additional information. For example, the test report can show the most recent HTTP status code (which indicates whether the last crawl of the URL was successful), when the URL was last crawled and when it is next scheduled to be crawled, and whether the user agent is using the Web server's current robots.txt file.
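
As a rough illustration of this post-crawl information, the following sketch retrieves a site's current robots.txt file, checks whether an assumed user agent name may fetch a URL, and records the HTTP status code returned for that URL. It is a simplified stand-in for the product's test report, not its implementation.

```python
# Hypothetical sketch only; the user agent name is an assumed value.
import urllib.robotparser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen
from urllib.error import HTTPError

USER_AGENT = "example-crawler/1.0"  # assumed; set in crawler properties

def report_url(url: str) -> None:
    # Fetch the site's current robots.txt file and ask whether this
    # user agent is allowed to crawl the URL.
    parts = urlparse(url)
    root = f"{parts.scheme}://{parts.netloc}/"
    robots = urllib.robotparser.RobotFileParser(urljoin(root, "robots.txt"))
    robots.read()
    allowed = robots.can_fetch(USER_AGENT, url)

    # Fetch the URL itself to obtain the most recent HTTP status code.
    try:
        with urlopen(Request(url, headers={"User-Agent": USER_AGENT}),
                     timeout=10) as response:
            status = response.status
    except HTTPError as err:
        status = err.code

    print(f"{url}: HTTP {status}, allowed by robots.txt: {allowed}")

report_url("http://www.example.com/index.html")
```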