The Web crawler can find some links (URLs) that are contained in the JavaScript portions of Web documents. If you determine that a high number of the URLs that are embedded in text are of low relevance, you can disable text link parsing by configuring advanced Web crawler properties.
The Web crawler can find both relative and absolute links. If an HTML document contains a BASE element, the crawler uses that element to resolve relative links. Otherwise, the crawler uses the document's own URL.
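For illustration, here is a minimal sketch of this resolution rule, using the urljoin function from Python's standard library. The function name resolve_link and the URLs are made-up examples, not part of the crawler.

    from urllib.parse import urljoin

    def resolve_link(document_url, base_href, link):
        """Resolve a link as described above: prefer the document's
        BASE element, otherwise fall back to the document's own URL."""
        base = base_href if base_href else document_url
        return urljoin(base, link)

    # Relative link; the document declares <base href="http://example.com/docs/">
    print(resolve_link("http://example.com/page.html",
                       "http://example.com/docs/", "intro.html"))
    # -> http://example.com/docs/intro.html

    # No BASE element: resolved against the document's own URL
    print(resolve_link("http://example.com/page.html", None, "intro.html"))
    # -> http://example.com/intro.html

    # Absolute links are returned unchanged
    print(resolve_link("http://example.com/page.html", None,
                       "http://other.example.org/a.html"))
    # -> http://other.example.org/a.html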
Support for JavaScript is limited to link extraction. The crawler does not parse JavaScript, does not build a DOM (Document Object Model), and does not interpret or execute JavaScript statements. Instead, the crawler looks for strings in the document content (including, but not limited to, the JavaScript portions) that are likely to be URLs in JavaScript statements. This approach has two consequences (a sketch of this kind of scanning follows the list):
- Some URLs are found that the stricter HTML parser ignores. The crawler rejects anything that is not a syntactically valid URL, but some of the valid URLs that the scanning step returns might be of low interest for searching.
- Document content that is generated by JavaScript, such as when a human user views a page with a browser and the browser executes some JavaScript, cannot be detected by the Web crawler and thus is not indexed.
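To make the scanning behavior concrete, the following sketch shows one plausible way such string scanning could work. The regular expression and the function scan_for_urls are illustrative assumptions, not the crawler's actual logic.

    import re

    # Assumed pattern: any quoted http(s) string literal in the raw
    # document text, including inside <script> blocks.
    URL_LIKE = re.compile(r"""["'](https?://[^"'\s]+)["']""")

    def scan_for_urls(document_text):
        """Return string literals that look like URLs, without parsing
        or executing any JavaScript."""
        return URL_LIKE.findall(document_text)

    page = '''
    <script>
      var next = "http://example.com/reports/2024.html";
      // Assembled at run time, so no complete URL appears in the source:
      var dynamic = "http://example.com/" + year + "/index.html";
    </script>
    '''
    print(scan_for_urls(page))
    # -> ['http://example.com/reports/2024.html', 'http://example.com/']

Note how the second result is a syntactically valid URL of low interest for searching (the first consequence) and how the page whose URL is assembled at run time is missed entirely (the second consequence).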
Because the Web crawler does not execute JavaScript in HTML files, URLs that JavaScript assembles or generates at run time are not crawled. To enable the Web crawler to crawl such URLs, you can take either of the following actions:
- In the administration console, edit the Web crawler and, on the Web Crawl Space page, add the URLs to the list of URLs that the crawler uses as a starting point for adding URLs to the collection (Start URLs). For the changes to take effect, restart the Web crawler (you do not need to start a full crawl).
- Use the anchor tag (<a href="..">) to specify the URLs as hypertext links in the HTML file, as shown in the sketch after this list.
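As a quick illustration of why the second action works, the sketch below uses Python's standard html.parser module to collect href values from anchor tags; this mirrors the kind of link that an HTML parser finds reliably. The class name and the markup are made-up examples.

    from html.parser import HTMLParser

    class AnchorLinkExtractor(HTMLParser):
        """Collect href values from <a> tags, the links that an HTML
        parser (as opposed to string scanning) extracts reliably."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    parser = AnchorLinkExtractor()
    parser.feed('<p><a href="http://example.com/reports/2024.html">'
                '2024 report</a></p>')
    print(parser.links)
    # -> ['http://example.com/reports/2024.html']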