When IBM Legacy Content Search Engine in IBM FileNet P8 is used to extract Lotus Notes email content and an email contains large images, some portions of this email may not be indexed correctly
If you are using a version of IBM Content Collector between 2.2 GA and 184.108.40.206 IF005 to process Lotus Notes email in rich text format and one or more of the emails processed contains a large embedded image at the end of the email body, there are two scenarios in which an error might occur:
- If an affected email is archived in plain text format (for example, in a business process management scenario), the body text of the email may not be extracted correctly. In that case, the body text is missing from the archived email.
- If an affected email is processed using the standard archiving task routes for compliance archiving or mailbox management, and the content is indexed with IBM Legacy Content Search Engine in IBM FileNet P8, the email will be archived as expected but may not be indexed correctly.
If a version of IBM Content Collector between 2.2 GA and 220.127.116.11 IF005 ("Content Collector") is used to capture Lotus Notes emails in rich text format and an email contains one or more large embedded images at the end of the email body, Content Collector may fail to extract the body text preceding the embedded images. In this case, Content Collector treats the body text field as empty.
This can result in the following problems:
- In plain text archiving (which is used, for example, in business process management (BPM) scenarios), the email body text might not be included in the .eml archiving output and therefore might not be archived in the IBM Content Manager or IBM FileNet P8 repository.
- When using standard archiving task routes with IBM FileNet P8 and indexing the email content with IBM Legacy Content Search Engine, the email body content might not be indexed correctly. In this case, the email body content will not be searchable.
The issue has been addressed in IBM Content Collector version 3.0, IBM Content Collector version 18.104.22.168 IF006 and IBM Content Collector version 22.214.171.124.
For email that was archived before installation of fix IF006 or 126.96.36.199 and that has been affected by this problem, use the following procedures to re-archive and re-index the email:
- In plain text archiving scenarios, affected email is usually still present in the mailbox, because business process management scenarios are not used for mailbox management. Reprocess all documents in the mailbox by configuring the collector filter to also process items that have been processed before. To do so, update to a version that includes the fix. Then clear the "Ignore items previously processed" check box. Once this is done, the previously processed documents will be processed again.
If you want to process only email that is affected by the problem, use the AfuLNRemoveBpmFlag tool. This tool runs on any server with a Lotus Domino runtime and checks each mailbox for affected email. Email is considered affected if it contains multiple body items where the last body item contains only the embedded image. The tool removes the IBMAfuBpmCollectors property from the affected email, so that the email is collected again and reprocessed. Contact IBM Software Support to obtain the AfuLNRemoveBpmFlag tool.
- For compliance-related archiving activities or for mailbox management scenarios, check whether the content that was not indexed by IBM Legacy Content Search Engine is significant for your usage scenario. The problem is less likely to affect your search results if your compliance-related searches are typically based only on the following criteria or combinations of these criteria:
  - Date ranges
  - User names or user email addresses
  - Email subjects
Refer to the IBM Software Handbook for more information on how to contact IBM Software Support: http://www-304.ibm.com/support/customercare/sas/f/handbook/contacts.html
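
The rule that the AfuLNRemoveBpmFlag tool applies to decide whether an email is affected (multiple body items, with the last body item containing only the embedded image) can be illustrated with a minimal Python sketch. This is not the tool itself and does not use the Lotus Notes or Domino API: the BodyItem and Email classes, and the is_affected and clear_bpm_flag functions, are hypothetical names introduced only for illustration. Only the IBMAfuBpmCollectors property name comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class BodyItem:
    """Hypothetical stand-in for one rich text body item of an email."""
    text: str = ""
    has_embedded_image: bool = False


@dataclass
class Email:
    """Hypothetical in-memory model of an email; not a Lotus Notes API type."""
    body_items: list = field(default_factory=list)
    properties: dict = field(default_factory=dict)


def is_affected(email: Email) -> bool:
    """Return True if the email has multiple body items and the last one
    contains only an embedded image (no text)."""
    if len(email.body_items) < 2:
        return False
    last = email.body_items[-1]
    return last.has_embedded_image and not last.text.strip()


def clear_bpm_flag(email: Email) -> bool:
    """Remove the IBMAfuBpmCollectors property from an affected email so that
    it would be collected again. Returns True if the email was affected."""
    if not is_affected(email):
        return False
    email.properties.pop("IBMAfuBpmCollectors", None)
    return True


if __name__ == "__main__":
    mail = Email(
        body_items=[BodyItem(text="Quarterly report attached."),
                    BodyItem(has_embedded_image=True)],
        properties={"IBMAfuBpmCollectors": "collector-1"},
    )
    print(clear_bpm_flag(mail), mail.properties)
```

In this sketch, an email with a single body item, or one whose last body item also carries text, is left untouched, which mirrors the description of processing only email that is affected by the problem.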