Customers may notice that incremental crawls take significantly longer than expected. This may become counterproductive as some incremental crawl activities take longer than a normally full base crawl .
Additional Indexes JCR table ICMSTJCRLINKS may be necessary to improve WCM incremental crawl performance
Incremental crawl may invoke a lot of operations (reads) on various table joins. It has been noted the reads on joins between JCR tables ICMSTJCRLINKS and ICMSTJCRLINKREL are particularly intensive and expensive. Additional index on ICMSTJCRLINKS may be necessary to mitigate the performance costs associated with unindexed ICMSTJCRLINKS table.
Websphere Portal 6.1.*.*/Websphere Portal 7.0.* connected to the following databases
MICROSOFT SQL SERVER
Diagnosing the problem
A review of the incremental crawl with JCR traces and Search Seedlist traces will show various queries take significantly longer times on the order of minutes where they are expected to complete in seconds
Resolving the problem
Depending on your environment and business needs, additional indexes may be necessary.
Below are 2 that have been known to have significantly improved the incremental crawl time
The first below is particularly relevant
CREATE INDEX ICMSTJCRLINKSIX8 ON <schema>.ICMSTJCRLINKS (LID, WSID, TIID, TVID, TCTID)
This second one may be useful in the event of any WCM deletion since the prior crawl or in cases where deletions are routine in the environment
CREATE INDEX ICMSTRI001001IX1 ON <schema>.ICMSTRI001001(TARGETITEMID, SOURCEITEMID, SOURCEVERSIONID, RRIID)
<schema> refers to the JCR domain
Note that these indexes are already in Portal V8
Depending on your environment more indexes may be needed for your environment
This is affected by the
1 The number of WCM content.
2. The frequency of changes to WCM particularly SiteAreas (new, deletes, modify). Siteareas title/names are often static and would not change much in most environments.
3. The times and number of incremental crawl. Ideally they should be infrequent and done at periods of minimal activity on the server