Cookie administration

Typically, cookie administration occurs automatically, with no action required from an administrator. If necessary, you can manually specify cookies for a Web crawling session.

Cookies are opaque tokens that a Web server returns to a user agent in an HTTP response header. They are meaningful only to the Web server that issued them, and they maintain state between HTTP requests because the user agent sends each cookie back to the server with its later requests. For example, during client authentication, the Web server might return a cookie that lets it recognize that an authenticated user is already logged in. Because the cookie accompanies each subsequent request, the user can request additional pages from that Web server without being prompted to log in again.
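The following Python sketch uses only the standard library's http.cookies module to illustrate the round trip described above. The server issues a cookie in a Set-Cookie response header, and the user agent returns it in a Cookie request header. The cookie name session_id and its value are hypothetical; a real server chooses its own names and values, and the user agent treats them as opaque.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header value for a (hypothetical) session cookie.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"
server_cookie["session_id"]["path"] = "/"
set_cookie_header = server_cookie["session_id"].OutputString()
print("Set-Cookie:", set_cookie_header)

# User agent side: parse the Set-Cookie value from the response without
# interpreting the token itself ...
received = SimpleCookie()
received.load(set_cookie_header)

# ... and send it back in a Cookie request header on the next request.
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in received.items()
)
print("Cookie:", cookie_header)
```

Running the sketch prints `Set-Cookie: session_id=abc123; Path=/` followed by `Cookie: session_id=abc123`. Only the name and value travel back to the server; attributes such as Path tell the user agent where the cookie applies.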

The Web crawler retains the cookies that it receives from Web servers and uses them for the lifetime of the crawler instance. When the crawler stops at the end of a session, it saves all unexpired cookies by rewriting the cookies.ini file, and it reloads them from that file at the start of the next session.
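The save-and-reload cycle can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the crawler's implementation. The StoredCookie record and both function names are hypothetical, every cookie is assumed to carry an explicit expiration time, and JSON stands in for the layout of cookies.ini, which this topic does not describe.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class StoredCookie:
    # Hypothetical in-memory record; not the crawler's actual data structure.
    domain: str
    path: str
    name: str
    value: str
    expires: datetime  # assumption: every cookie has an explicit expiration time


def save_at_session_end(cookies, now):
    """Rewrite the whole collection, keeping only cookies unexpired at `now`.

    JSON is a stand-in for the real cookies.ini layout.
    """
    kept = [c for c in cookies if c.expires > now]
    return json.dumps([{**asdict(c), "expires": c.expires.isoformat()} for c in kept])


def load_at_session_start(text):
    """Rebuild the collection that was saved at the end of the previous session."""
    return [
        StoredCookie(**{**d, "expires": datetime.fromisoformat(d["expires"])})
        for d in json.loads(text)
    ]


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
jar = [
    StoredCookie("www.example.com", "/", "session_id", "abc123", now + timedelta(days=1)),
    StoredCookie("www.example.com", "/", "old_token", "xyz789", now - timedelta(days=1)),
]
saved = save_at_session_end(jar, now)
reloaded = load_at_session_start(saved)
print([c.name for c in reloaded])  # ['session_id']
```

The expired old_token cookie is dropped when the collection is rewritten, so only session_id is available to the next session.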

If you manually specify cookies, store them in a separate file and merge them with the cookies in the cookies.ini file when needed. The crawler does not discard unexpired cookies, but a problem could prevent it from writing the entire cookie collection, and keeping your cookies in a separate file ensures that you do not lose the cookies that you specified manually. Merge your cookies with the cookies that the crawler maintains automatically before a crawling session starts.
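A merge step like the one described above might look like the following Python sketch. It assumes, hypothetically, that both collections have already been read into lists of dictionaries with domain, path, name, and value keys; the actual formats of cookies.ini and of your own file are not described in this topic. Cookies are matched on domain, path, and name, which is the identity RFC 6265 uses when a new cookie replaces a stored one. Letting the manually specified cookie win on a conflict is a design choice of this sketch, not documented crawler behavior.

```python
def merge_cookies(crawler_cookies, manual_cookies):
    """Merge manually specified cookies into the crawler-maintained collection.

    Cookies are identified by (domain, path, name). Assumption of this sketch:
    when both collections contain the same cookie, the manual one wins.
    Neither input list is modified.
    """
    merged = {(c["domain"], c["path"], c["name"]): c for c in crawler_cookies}
    for c in manual_cookies:
        merged[(c["domain"], c["path"], c["name"])] = c
    return list(merged.values())


# Hypothetical collections, as if already read from the two files.
crawler = [
    {"domain": "www.example.com", "path": "/", "name": "session_id", "value": "abc123"},
    {"domain": "intranet.example.com", "path": "/", "name": "lang", "value": "en"},
]
manual = [
    {"domain": "intranet.example.com", "path": "/", "name": "lang", "value": "fr"},
    {"domain": "intranet.example.com", "path": "/app", "name": "sso_token", "value": "manual-01"},
]

for c in merge_cookies(crawler, manual):
    print(c["domain"], c["path"], c["name"], c["value"])
```

Because the manual collection is only read, never modified, the same merge can be repeated before each crawling session if the crawler's saved collection turns out to be incomplete.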