When IBM Legacy Content Search Engine in IBM FileNet P8 is used to extract Lotus Notes email content and an email contains large images, some portions of this email may not be indexed correctly

Flash (Alert)


Abstract

If you are using a version of IBM Content Collector between 2.2 GA and 2.2.0.2 IF005 to process Lotus Notes email in rich text format and one or more of the emails processed contains a large embedded image at the end of the email body, there are two scenarios in which an error might occur:

- If an affected email is archived in plain text format (for example, in a business process management scenario), the body text of the email may not be extracted correctly. In that case, the body text will be missing from the archived email.

- If an affected email is processed using the standard archiving task routes for compliance archiving or mailbox management within IBM Legacy Content Search Engine in IBM FileNet P8, the email will be archived as expected, but may not be indexed correctly.

Content

If a version of IBM Content Collector between 2.2 GA and 2.2.0.2 IF005 ("Content Collector") is used to capture Lotus Notes rich text format emails and the emails contain one or more large embedded images at the end of the email body, Content Collector may fail to extract the body text preceding the embedded image(s). In this case, Content Collector will treat the body text field as empty.

This can result in the following problems:

  • In plain text archiving (which is used, for example, in business process management (BPM) scenarios) the email body text might not be included in the .eml archiving output and therefore might not be archived in the IBM Content Manager or IBM FileNet P8 repository.
  • When using standard archiving task routes with IBM FileNet P8 and indexing the email content with IBM Legacy Content Search Engine, the email body content might not be indexed correctly. In this case, the email body content will not be searchable.

When using the standard archiving task routes for compliance (journal archiving) or for mailbox management, the archived email content in CSN format is not affected by this issue, and the email is correctly archived in the repository. When using IBM Content Manager as a repository, the email body is indexed correctly.

The issue has been addressed in IBM Content Collector version 3.0, IBM Content Collector version 2.2.0.2 IF006 and IBM Content Collector version 2.2.0.3.
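
The affected range stated above (2.2 GA through 2.2.0.2 IF005, fixed from 2.2.0.2 IF006, 2.2.0.3, and 3.0 onward) can be sketched as a simple ordered comparison. This is only an illustration: the `Version` type and `is_affected` function are hypothetical names, not part of Content Collector, and the sketch assumes that a version can be represented as (major, minor, modification, fix pack, interim fix) and that interim fix numbers order naturally within a fix pack.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """Hypothetical representation of a Content Collector version."""
    major: int
    minor: int
    modification: int
    fix_pack: int
    interim_fix: int = 0  # 0 means no interim fix installed (assumption)


# Bounds taken from this document: 2.2 GA up to and including 2.2.0.2 IF005.
FIRST_AFFECTED = Version(2, 2, 0, 0, 0)
LAST_AFFECTED = Version(2, 2, 0, 2, 5)


def is_affected(version: Version) -> bool:
    """Return True if the version falls inside the affected range."""
    # Named tuples compare field by field, left to right.
    return FIRST_AFFECTED <= version <= LAST_AFFECTED


if __name__ == "__main__":
    for v in (Version(2, 2, 0, 2, 5), Version(2, 2, 0, 2, 6), Version(3, 0, 0, 0)):
        print(v, "affected" if is_affected(v) else "not affected")
```

Check the installed version against the product documentation before relying on a comparison like this; the tuple encoding is an assumption made for the example.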

For email that was archived before installation of fix IF006 or 2.2.0.3 and that has been affected by this problem, use the following procedures to re-archive and re-index the email:
  • In plain text archiving scenarios, affected email is usually still present in the mailbox, because business process management scenarios are not used for mailbox management. Reprocess all documents in the mailbox by configuring the collector filter to also process items that have been processed before. To do so, update to a version that includes the fix. Then clear the "Ignore items previously processed" check box. Once this is done, the previously processed documents will be processed again.

    If you want to process only email that is affected by the problem, use the AfuLNRemoveBpmFlag tool. This tool runs on any server with a Lotus Domino runtime and checks each mailbox for affected email. Email is considered affected if it contains multiple body items where the last body item contains only the embedded image. The tool removes the IBMAfuBpmCollectors property from the affected email, so that the email is collected again and reprocessed. Contact IBM Software Support to obtain the AfuLNRemoveBpmFlag tool.
  • For compliance-related archiving activities or for mailbox management scenarios, check whether the content that was not indexed by IBM Legacy Content Search Engine is significant for your usage scenario. The problem is less likely to impact your search results if your compliance-related searches are typically based on only the following criteria or combinations of these criteria:
    • Date ranges
    • User names or user email addresses
    • Email subjects
    If your keyword searches are based on the email content, or if you need help assessing the potential impact on your usage scenario, contact IBM Software Support.
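
The criterion that the AfuLNRemoveBpmFlag tool is described as using (multiple body items, with the last body item containing only the embedded image) can be illustrated with a minimal sketch. This is not the tool's implementation and does not use the Lotus Domino API; `BodyItem` and `is_affected_email` are hypothetical names, and the model of a body item as text plus an image flag is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BodyItem:
    """Hypothetical, simplified model of one rich text body item."""
    text: str
    has_embedded_image: bool


def is_affected_email(body_items: List[BodyItem]) -> bool:
    """Apply the criterion described above to a list of body items."""
    # The criterion requires multiple body items.
    if len(body_items) < 2:
        return False
    last = body_items[-1]
    # The last item must contain only the embedded image, that is, no text.
    return last.has_embedded_image and not last.text.strip()


if __name__ == "__main__":
    email = [BodyItem("Quarterly results attached.", False), BodyItem("", True)]
    print(is_affected_email(email))
```

To actually identify and reset affected email, obtain the AfuLNRemoveBpmFlag tool from IBM Software Support as described above.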

Refer to the IBM Software Handbook for more information on how to contact IBM Software Support: http://www-304.ibm.com/support/customercare/sas/f/handbook/contacts.html


Document information


More support for:

Content Collector
Content Collector for Email

Software version:

2.2

Operating system(s):

Windows 2008 server

Reference #:

1607979

Modified date:

2012-08-13
