Technote (FAQ)
Question
How can I use IBM® eDiscovery Manager Version 2.2 Fix Pack 4 with IBM Connections documents that are archived by IBM Content Collector Version 3.0 or later?
Answer
This technote describes how the newly added fixes for multi-part document search and export in IBM eDiscovery Manager, Version 2.2 Fix Pack 4 can be used to search IBM Connections documents that are archived by IBM Content Collector, Version 3.0 or later. It briefly explains the supported scenarios, necessary configuration steps, and content search options.
You can find more information about how IBM Connections documents that are archived by ICC V3.0 are represented in IBM Content Manager, Version 8 or FileNet P8 products in the documentation for ICC. Pay special attention to the ICC Indexing Guide which has detailed information on indexing and field sections.
With IBM Content Collector, you can archive all the content of an IBM Connections application. Indirectly related content will not be archived. An example of indirect content is a profile graphic that is shown in a blog comment of a blog post. The profile graphic is owned by the Profiles application and is archived when you archive the profiles. When you archive a blog post, the profile graphic is not archived because it is indirectly related content. Also, any custom widgets and content in IBM Connections pages are not archived.
Viewing IBM Connections documents
IBM Connections content will be rendered by the eDiscovery Manager viewer similarly to how the page looked in the original IBM Connections context. However, because the rendering is done only in the eDiscovery Manager viewer and IBM Connections is not required, there are some differences. The rendering mirrors the styling that you see in IBM Connections, V3.0.1, with the following exceptions:
- IBM Connections content that contains links to external locations will be highlighted with an icon in the eDiscovery Manager preview. Clicking a link will open a new window that navigates to the page.
- Graphics in a page might not be displayed if the graphics source was external or is not yet archived.
- Search terms will be highlighted in the IBM Connections document preview.
- Dates are displayed as UTC with a UTC suffix.
- If information cannot be retrieved for the archive, you will see "Unavailable." For example, because the Background tab information of a profile is not archived, you will see "Background (Unavailable)" displayed on that tab.
Exporting IBM Connections documents
eDiscovery Manager supports exporting IBM Connections documents in native XML format or HTML format. The native XML format that is stored is based on open standards such as the ATOM feed protocol with specific IBM Connections extensions.
The HTML export is similar to the HTML that is generated for viewing except that no highlighting will be performed. PDF and TIFF export formats are not supported.
Searching IBM Connections documents
FileNet P8 specific search configuration
IBM Content Collector will create a default document class with the following name and properties:
Document class symbolic name:
- ICCConnectionsInstance
Document properties:
- DocumentTitle
- ICCCreatedDate
- ICCExpirationDate
- ICCLastModifiedDate
- ICCModifiedBy
- ICCApplicationName
- ICCCreatedBy
From eDiscovery Manager, to view and search IBM Connections content that is archived by IBM Content Collector and is stored in a Content Engine object store, create a new eDiscovery Manager custom collection type and use the following field mapping:
| Collection field | Content server property | Type | Text index | Description |
| CONTENT | | String | //icc_document | Matches all of the content, including attachments, of an IBM Connections application document. |
| DOCUMENT | | String | //icc_part | Matches all of the content of an IBM Connections document except the plain text extract of attachments. |
| ATTACHMENT | | String | //icc_attachment | Matches the plain text extract of an attachment. |
| ATTACHMENT_NAME | | String | //icc_attachment/@name | Matches the display name of an attachment, as shown in IBM Connections. |
| TITLE | DocumentTitle | String | //title | Matches the title or name of an IBM Connections application content, for example, a wiki page title, blog post title, or profile title. |
| SUB_TITLE | | String | //subtitle | Matches, if available, the activity goal of an IBM Connections activity. |
| ENTRY_ID | | String | //entry/id | Matches the GUID set by IBM Connections to identify a specific portion of content, for example, a wiki page or blog post. |
| AUTHOR | | String | //author/name | Matches the names of persons who created an IBM Connections application content or subcontent. |
| CONTRIBUTOR | | String | //contributor/name | Matches the names of persons who changed an IBM Connections application content or subcontent. |
| MODIFIER | | String | //modifier/name | Matches the names of persons who changed an IBM Connections wiki page. The IBM Connections wiki page application uses this instead of contributor. |
| CUSTOM_FIELD | | String | //field | Matches values of custom fields for the IBM Connections applications that support custom fields (activities). |
| PUBLISHED | | Date | //published/@icc_date | Matches the published date of any content or subcontent, for example, the published date of a blog post or of a specific comment to a blog post. Remember: - An IBM Connections profile does not have a published date and thus it is not searchable by an index; however, the ICCCreatedDate value will be set to the updated date of the profile. - An IBM Connections activity that does not contain any subactivity does not have a published date and thus it is not searchable by an index; however, the ICCCreatedDate value will be set to the updated date of the activity. |
| UPDATED | ICCLastModifiedDate | Date | //updated/@icc_date | Matches the updated date of any content or subcontent (see PUBLISHED). This field is available for all IBM Connections content. |
| TAG | | String | //category/@term | Matches the tags given to any content or subcontent of IBM Connections content. |
| APPLICATION_NAME | ICCApplicationName | String | //icc_application_name | Matches the IBM Connections application name. The valid values are: |
| RAW_CONTENT | | String | $FULL_TEXT$ | Matches the complete content indexed for an IBM Connections document. Use this only for debugging purposes. |
| AUTHOR_PRIMARY_ATTRIBUTE | ICCCreatedBy | String | | Names of persons who created an IBM Connections application content, for example, a blog post author, but not a comment author. This is one of the several authors available under AUTHOR. This field is useful as a result list column. This field is necessary for IBM Connections Files because metadata is not part of content. |
| PUBLISHED_PRIMARY_ATTRIBUTE | ICCCreatedDate | String | | Matches the published date of any content, for example, the published date of a blog post. This is one of the several dates available under PUBLISHED. This field is useful as a result list column. This field is necessary for IBM Connections Files because metadata is not part of content. |
IBM Connections Files content differs from the content of other IBM Connections applications: for Files, only the actual file content is text-indexed and can be text-searched. Therefore, most of the collection fields listed previously do not apply to IBM Connections Files. Because of this difference, it is recommended that you create two eDiscovery Manager search templates based on the custom collection definition above:
- One for IBM Connections Files, which adds search fields only for the AUTHOR_PRIMARY_ATTRIBUTE, PUBLISHED_PRIMARY_ATTRIBUTE, APPLICATION_NAME, and CONTENT fields.
- One for the content of the other IBM Connections applications, which has all the other fields and either does not use the AUTHOR_PRIMARY_ATTRIBUTE and PUBLISHED_PRIMARY_ATTRIBUTE fields or uses them only as result list columns where needed.
For applications other than Files, the complete XML content of an IBM Connections document is indexed. Therefore, more complex XPath statements can be built to address specific subelements in the content.
An example of such a complex query is a search only in the content of comments that are created for IBM Connections content.
To configure a search field for such a use case, which is not covered with the default mapping provided above, perform the following steps:
- Look at a sample IBM Connections document in FileNet P8 by using FileNet Enterprise Manager.
- Identify the relevant content element (MIME type application/icc-comment-atom+xml for comments).
- Look at the content to identify the relevant XML structure (/feed/entry/content for comments).
- From these two values, build the following expression: //icc_part[@mimetype="application/icc-comment-atom+xml"]/feed/entry/content.
- See the table at the end of this document for a list of all MIME types defined by ICC for IBM Connections context.
- Create a new eDiscovery Manager Collection field that uses this index expression.
You can now search explicitly on IBM Connections comments.
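The way the index expression is assembled in the steps above can be sketched in a few lines of Python. This is only an illustration: the helper name build_part_expression is hypothetical and is not part of eDiscovery Manager or IBM Content Collector. It assumes that the MIME type and the element path were already identified from a sample document.

```python
def build_part_expression(mime_type: str, element_path: str) -> str:
    """Return an index expression that addresses one element inside one ICC part.

    mime_type    -- MIME type of the relevant content element
    element_path -- XML path of the relevant structure inside that element
    """
    # A double quote would break the predicate string, so reject it early.
    if '"' in mime_type:
        raise ValueError("MIME type must not contain double quotes")
    # Accept paths with or without a leading slash.
    if not element_path.startswith("/"):
        element_path = "/" + element_path
    return f'//icc_part[@mimetype="{mime_type}"]{element_path}'


# Hypothetical usage: reproduce the expression for comments.
expression = build_part_expression(
    "application/icc-comment-atom+xml", "/feed/entry/content"
)
print(expression)
```

The resulting string is what you would enter as the text index expression of the new collection field.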
XPath syntax supported in field mappings
The subset of XPath that is supported is defined by CSS XML search engines with XPath support. It differs from standard XPath in the following ways:
- It does not support iteration and ranges in path expressions.
- It eliminates filter expressions: that is, it allows filtering only in the predicate expression, not in the path expression.
- It does not allow absolute path names in predicate expressions.
- It implements only one axis (tag) and allows propagation only in the forward direction.
The following expressions are unsupported in the XML search syntax:
- /*
- //*
- /@*
- //@*
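Before you enter an expression as a field mapping, a simple check for the unsupported wildcard forms listed above can catch mistakes early. The following Python sketch is not part of any IBM product, and the function name is hypothetical. It assumes that a wildcard step is unsupported wherever it appears in the expression, which is stricter than the four forms that are listed.

```python
import re

# Matches a wildcard element or attribute step: /*, //*, /@* or //@*.
_WILDCARD_STEP = re.compile(r"/@?\*")


def uses_unsupported_wildcard(expression: str) -> bool:
    """Return True if the expression contains a wildcard step."""
    return _WILDCARD_STEP.search(expression) is not None


for candidate in ("//entry/id", "//entry/@*", "//*"):
    print(candidate, uses_unsupported_wildcard(candidate))
```

The check is purely textual, so a wildcard-like sequence inside a quoted predicate value would also be flagged.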
Disregarding of XML namespaces
Namespace prefixes are not retained in the indexing of XML tag and attribute names. You can search XML documents by using namespaces, but namespace prefixes are discarded during indexing and removed from XML search queries.
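To see what an expression looks like after namespace prefixes are removed, you can approximate the behavior with the following Python sketch. It illustrates the effect only and is not the search engine's implementation. The function name and the prefix ns in the example are hypothetical, and quoted predicate values are deliberately left untouched.

```python
import re

# A prefix is a name followed by a single colon and the start of another name.
_PREFIX = re.compile(r"(?<![\w:.-])[A-Za-z_][\w.-]*:(?=[A-Za-z_])")
# Splits an expression into quoted string literals and everything else.
_LITERAL = re.compile(r'("[^"]*"|\'[^\']*\')')


def strip_namespace_prefixes(expression: str) -> str:
    """Remove namespace prefixes from element and attribute names."""
    parts = _LITERAL.split(expression)
    # With one capturing group, even indexes hold the text outside literals.
    for i in range(0, len(parts), 2):
        parts[i] = _PREFIX.sub("", parts[i])
    return "".join(parts)


print(strip_namespace_prefixes("//ns:entry/ns:id"))
```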
Numeric values
Predicates comparing attribute values to numbers are supported.
Complete match
The operator = (equal sign) with a string argument in a predicate means that a complete match of all tokens in the string with all tokens in the identified text span is required. The order of the tokens is important.
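The complete-match rule can be illustrated with a small Python sketch. It assumes that tokens are whitespace-separated words, which is a simplification; the tokenizer of the search engine might split and normalize text differently. The function names are hypothetical.

```python
def tokenize(text: str) -> list:
    # Assumption: a token is any run of characters between whitespace.
    return text.split()


def is_complete_match(argument: str, text_span: str) -> bool:
    """Return True only if both token lists are identical, including order."""
    return tokenize(argument) == tokenize(text_span)


print(is_complete_match("project status", "project status"))  # same tokens, same order
print(is_complete_match("status project", "project status"))  # same tokens, different order
print(is_complete_match("project", "project status"))         # only part of the text span
```

Under this assumption, only the first call reports a match: a different token order or a partial match does not satisfy the = operator.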
For more details about the XML search syntax, check the FileNet P8 documentation on "SQL Syntax Reference" and go to the "XML Search" section.
IBM Content Manager specific search configuration
IBM Content Collector will create a default item type with the following name and properties:
Item type name:
- ICCConnections
Attributes:
- ICCTitle
- ICCCreatedDate
- ICCExpirationDate
- ICCLastModifiedDate
- ICCModifiedBy
- ICCApplicationName
- ICCCreatedBy
- ICMDeleteHold
- ICCDEIHash
From eDiscovery Manager, to view and search IBM Connections content that is archived by IBM Content Collector and is stored in IBM Content Manager, create a new eDiscovery Manager collection of the collection type "Custom Collection" and use the following field mapping:
| Collection field | Content server property | Type | Text index | Description |
| CONTENT | | String | document | Matches all of the content, including attachments, of an IBM Connections application document. |
| DOCUMENT | | String | content | Matches all of the content of an IBM Connections document except the plain text extract of attachments. |
| ATTACHMENT | | String | attachment | Matches the plain text extract of an attachment. |
| ATTACHMENT_NAME | | String | attachment_name | Matches the display name of an attachment, as shown in IBM Connections. |
| TITLE | ICCTitle | String | icc_title | Matches the title or name of an IBM Connections application content, for example, a wiki page title, blog post title, or profile title. |
| ENTRY_ID | | String | icc_entry_id | Matches the GUID set by IBM Connections to identify a specific portion of content, for example, a wiki page or blog post. |
| AUTHOR | | String | icc_displayName_author_primary | Matches names of persons who created an IBM Connections application content, for example, a blog post author, but not a comment author.* |
| AUTHOR_SUBCONTENT | | String | icc_displayName_author | Matches names of persons who created an IBM Connections application subcontent, for example, a comment author.* |
| CONTRIBUTOR | | String | icc_displayName_contributor_primary | Matches names of persons who modified an IBM Connections application content, for example, a blog post.* |
| CONTRIBUTOR_SUBCONTENT | | String | icc_displayName_contributor | Matches names of persons who modified an IBM Connections application subcontent, for example, a comment.* |
| CUSTOM_FIELD | | String | icc_customTextValue | Matches values of custom fields for the IBM Connections applications that support custom fields (activities). |
| PUBLISHED | | Date | icc_published_primary | Matches the published date of any content, for example, the published date of a blog post. Remember: - An IBM Connections profile does not have a published date and thus it is not searchable by an index; however, the ICCCreatedDate value will be set to the updated date of the profile. - An IBM Connections activity that does not contain any subactivity does not have a published date and thus it is not searchable by an index; however, the ICCCreatedDate value will be set to the updated date of the activity.* |
| PUBLISHED_SUBCONTENT | | Date | icc_published | Matches the published date of subcontent, for example, the published date of a comment. |
| UPDATED | ICCModifiedDate | Date | icc_updated_primary | Matches the updated date of any content. (See PUBLISHED.) This field is available for all IBM Connections content.* |
| UPDATED_SUBCONTENT | | Date | icc_updated | Matches the updated date of any subcontent. (See PUBLISHED.) This field is available for all IBM Connections content.* |
| TAG | | String | icc_tags | Matches the tags given to any content or subcontent of IBM Connections content. |
| APPLICATION_NAME | ICCApplicationName | String | icc_application_name | Matches the IBM Connections application name. Valid values are: |
| RAW_CONTENT | | String | $FULL_TEXT$ | Matches the complete content indexed for an IBM Connections document. Use this only for debugging purposes. |
| AUTHOR_PRIMARY_ATTRIBUTE | ICCCreatedBy | String | | This field is useful as a result list column. This field is necessary for IBM Connections Files because metadata is not part of content. |
| PUBLISHED_PRIMARY_ATTRIBUTE | ICCCreatedDate | String | | This field is useful as a result list column. This field is necessary for IBM Connections Files because metadata is not part of content. |
* For all index fields that are suffixed with "primary," there is a secondary field definition for subcontent. These fields can be used to create combined or grouped fields in an eDiscovery Manager search template to search for all authors, contributors, published, and modified entries.
IBM Connections Files content differs from the content of other IBM Connections applications: for Files, only the actual file content is text-indexed and can be text-searched. Therefore, most of the collection fields listed previously do not apply to IBM Connections Files. Because of this difference, it is recommended that you create two eDiscovery Manager search templates based on the custom collection definition above:
- One for IBM Connections Files, which adds search fields only for the AUTHOR_PRIMARY_ATTRIBUTE, PUBLISHED_PRIMARY_ATTRIBUTE, APPLICATION_NAME, and CONTENT fields.
- One for the content of the other IBM Connections applications, which has all the other fields and either does not use the AUTHOR_PRIMARY_ATTRIBUTE and PUBLISHED_PRIMARY_ATTRIBUTE fields or uses them only as result list columns where needed.
Limitations
- FileNet P8 with Verity-based search (Legacy Search Service) is not supported for use with IBM Connections content.
- TIFF or PDF export of IBM Connections content is not supported.
- Dates that are indexed are truncated to a one-hour resolution. This means that searches at a finer resolution than one hour are not supported.
- Highlighting will only work for search terms entered for the content field.
- IBM Connections cannot provide all of the information that is available in a browser through the application API. Specifically, IBM Content Collector does not archive the following information because the API does not provide it:
* Name of a wiki that contains a specific wiki page
* Name, number, or size of child pages of a wiki page
* Background tab of the profiles application
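The one-hour date resolution mentioned in the limitations can be illustrated with a short Python sketch. It assumes that truncation simply drops the minutes, seconds, and fractions of a second, as the word "truncated" suggests. The function name and the sample dates are hypothetical.

```python
from datetime import datetime, timezone


def truncate_to_hour(value: datetime) -> datetime:
    """Drop minutes, seconds, and microseconds from a timestamp."""
    return value.replace(minute=0, second=0, microsecond=0)


first = datetime(2012, 5, 14, 9, 5, 0, tzinfo=timezone.utc)
second = datetime(2012, 5, 14, 9, 47, 30, tzinfo=timezone.utc)

# Both timestamps fall into the same hour, so they are indistinguishable
# after truncation and a search cannot tell them apart.
print(truncate_to_hour(first) == truncate_to_hour(second))
```

For this reason, specify date ranges in search templates in whole hours.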
List of MIME types used by IBM Content Collector for archiving IBM Connections content
IBM Content Collector uses a specific set of MIME types to identify different types of content in an IBM Connections system. If you are working with IBM FileNet P8 and Content Search Services (CSS), the MIME types table will help you build advanced queries.
| Extension | MIME type | Description |
| .afu_acl_xml | application/icc-acl-atom+xml | Member information of an IBM Connections document |
| .afu_activity_trash_xml | application/icc-activity-trash-atom+xml | IBM Connections Activities trash document |
| .afu_activity_xml | application/icc-activity-atom+xml | IBM Connections Activities document |
| .afu_attachment_xml | application/icc-attachment-atom+xml | Attachment metadata of an IBM Connections document |
| .afu_blog_xml | application/icc-blog-atom+xml | IBM Connections blog post document |
| .afu_board_xml | application/icc-board-atom+xml | IBM Connections profile board document |
| .afu_bookmark_xml | application/icc-bookmark-atom+xml | IBM Connections bookmark document |
| .afu_comment_xml | application/icc-comment-atom+xml | Comments of an IBM Connections document |
| .afu_forum_reply_xml | application/icc-forum-reply-atom+xml | Forum replies for an IBM Connections forum topic |
| .afu_forum_topic_xml | application/icc-forum-topic-atom+xml | IBM Connections forum topic |
| .afu_forum_xml | application/icc-forum-atom+xml | IBM Connections forum metadata |
| .afu_link_xml | application/icc-link-atom+xml | Links document of an IBM Connections profile |
| .afu_media_xml | application/icc-media-atom+xml | Content of an IBM Connections wiki page |
| .afu_network_xml | application/icc-network-atom+xml | Network document of an IBM Connections profile |
| .afu_profile_xml | application/icc-profile-atom+xml | IBM Connections profile document |
| .afu_recommend_xml | application/icc-recommend-atom+xml | Recommendation document of an IBM Connections document |
| .afu_reporting_xml | application/icc-reportingChain-atom+xml | Reporting chain document of an IBM Connections profile |
| .afu_status_xml | application/icc-status-atom+xml | Status document of an IBM Connections profile |
| .afu_tag_xml | application/icc-tag-atom+xml | Tags given to an IBM Connections document |
| .afu_version_xml | application/icc-version-atom+xml | Version metadata of an IBM Connections wiki page |
| .afu_wiki_xml | application/icc-wiki-atom+xml | IBM Connections wiki page |
Copyright and trademark information
IBM, the IBM logo and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.