Using IBM eDiscovery Manager Version 2.2 Fix Pack 4 with IBM Connections documents archived by IBM Content Collector Version 3.0 or above

Technote (FAQ)


Question

How can I use IBM® eDiscovery Manager Version 2.2 Fix Pack 4 with IBM Connections documents that are archived by IBM Content Collector Version 3.0 or later?

Answer

This technote describes how the newly added fixes for multi-part document search and export in IBM eDiscovery Manager, Version 2.2 Fix Pack 4 can be used to search IBM Connections documents that are archived by IBM Content Collector, Version 3.0 or later. It briefly explains the supported scenarios, necessary configuration steps, and content search options.

For more information about how IBM Connections documents that are archived by IBM Content Collector (ICC) V3.0 are represented in IBM Content Manager, Version 8, or in FileNet P8 products, see the ICC documentation. Pay special attention to the ICC Indexing Guide, which has detailed information on indexing and field sections.

With IBM Content Collector, you can archive all the content of an IBM Connections application. Indirectly related content will not be archived. An example of indirect content is a profile graphic that is shown in a blog comment of a blog post. The profile graphic is owned by the Profiles application and is archived when you archive the profiles. When you archive a blog post, the profile graphic is not archived because it is indirectly related content. Also, any custom widgets and content in IBM Connections pages are not archived.

Viewing IBM Connections documents

IBM Connections content is rendered by the eDiscovery Manager viewer much as the page looked in its original IBM Connections context. However, because the rendering is done entirely in the eDiscovery Manager viewer and does not require IBM Connections, there are some differences. The rendering mirrors the styling that you see in IBM Connections, V3.0.1, with the following exceptions:

  • IBM Connections content that contains links to external locations will be highlighted with an icon in the eDiscovery Manager preview. Clicking a link will open a new window that navigates to the page.
  • Graphics in a page might not be displayed if the graphics source was external or is not yet archived.
  • Search terms will be highlighted in the IBM Connections document preview.
  • Dates are displayed as UTC with a UTC suffix.
  • If information cannot be retrieved from the archive, you will see "Unavailable." For example, because the Background tab information of a profile is not archived, you will see "Background (Unavailable)" displayed on that tab.

Exporting IBM Connections documents

eDiscovery Manager supports exporting IBM Connections documents in native XML format or HTML format. The native XML format that is stored is based on open standards such as the Atom feed format, with specific IBM Connections extensions.

The HTML export is similar to the HTML that is generated for viewing except that no highlighting will be performed. PDF and TIFF export formats are not supported.
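Because the native XML export is Atom-based, it can be read with any XML parser. The following is a minimal sketch, not a description of the actual export layout: the sample document, its values, and the function name `summarize_entries` are hypothetical, and a real export carries additional IBM Connections extension elements that are not shown here.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace, in ElementTree's "{uri}" notation.
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample document; real exports contain further
# IBM Connections specific extensions.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>urn:example:entry-1</id>
    <title>Release planning</title>
    <author><name>Pat Example</name></author>
    <published>2013-01-15T10:42:00Z</published>
  </entry>
</feed>"""


def summarize_entries(xml_text):
    """Return id, title, author names, and published date of each Atom entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter(ATOM + "entry"):
        entries.append({
            "id": entry.findtext(ATOM + "id"),
            "title": entry.findtext(ATOM + "title"),
            "authors": [name.text for name in entry.findall(f"{ATOM}author/{ATOM}name")],
            "published": entry.findtext(ATOM + "published"),
        })
    return entries


print(summarize_entries(SAMPLE))
```

The elements read here (entry id, title, author name, published date) are standard Atom elements.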

Searching IBM Connections documents


FileNet P8 specific search configuration

IBM Content Collector will create a default document class with the following name and properties:

Document class symbolic name:

  • ICCConnectionsInstance

Document properties:
  • DocumentTitle
  • ICCCreatedDate
  • ICCExpirationDate
  • ICCLastModifiedDate
  • ICCModifiedBy
  • ICCApplicationName
  • ICCCreatedBy

From eDiscovery Manager, to view and search IBM Connections content that is archived by IBM Content Collector and is stored in a Content Engine object store, create a new eDiscovery Manager custom collection type and use the following field mapping:
Collection field | Content server property | Type | Text index | Description
CONTENT | (none) | String | //icc_document | Matches all of the content, including attachments, of an IBM Connections application document.
DOCUMENT | (none) | String | //icc_part | Matches all of the content of an IBM Connections document except the plain text extract of attachments.
ATTACHMENT | (none) | String | //icc_attachment | Matches the plain text extract of an attachment.
ATTACHMENT_NAME | (none) | String | //icc_attachment/@name | Matches the display name of an attachment, as shown in IBM Connections.
TITLE | DocumentTitle | String | //title | Matches the title or name of IBM Connections application content, for example, a wiki page title, blog post title, or profile title.
SUB_TITLE | (none) | String | //subtitle | Matches, if available, the activity goal of an IBM Connections activity.
ENTRY_ID | (none) | String | //entry/id | Matches the GUID set by IBM Connections to identify a specific portion of content, for example, a wiki page or blog post.
AUTHOR | (none) | String | //author/name | Matches the names of persons who created IBM Connections application content or subcontent.
CONTRIBUTOR | (none) | String | //contributor/name | Matches the names of persons who changed IBM Connections application content or subcontent.
MODIFIER | (none) | String | //modifier/name | Matches the names of persons who changed an IBM Connections wiki page. The IBM Connections wiki page application uses this instead of contributor.
CUSTOM_FIELD | (none) | String | //field | Matches values of custom fields for the IBM Connections applications that support custom fields (activities).
PUBLISHED | (none) | Date | //published/@icc_date | Matches the published date of any content or subcontent, for example, the published date of a blog post or of a specific comment on a blog post.

Remember:

- An IBM Connections profile does not have a published date and thus is not searchable by an index; however, the ICCCreatedDate value will be set to the updated date of the profile.

- An IBM Connections activity that does not contain any subactivity does not have a published date and thus is not searchable by an index; however, the ICCCreatedDate value will be set to the updated date of the activity.

UPDATED | ICCLastModifiedDate | Date | //updated/@icc_date | Matches the updated date of any content or subcontent (see PUBLISHED). This field is available for all IBM Connections content.
TAG | (none) | String | //category/@term | Matches the tags given to any content or subcontent of IBM Connections content.
APPLICATION_NAME | ICCApplicationName | String | //icc_application_name | Matches the IBM Connections application name. The valid values are:
    • FILES
    • PROFILES
    • WIKIS
    • BLOGS
    • BOOKMARKS
    • FORUMS
    • ACTIVITIES
RAW_CONTENT | (none) | String | $FULL_TEXT$ | Matches the complete content indexed for an IBM Connections document. Use this only for debugging purposes.
AUTHOR_PRIMARY_ATTRIBUTE | ICCCreatedBy | String | (none) | Matches the names of persons who created IBM Connections application content, for example, a blog post author but not a comment author. This is one of the several authors available under AUTHOR. This field is useful as a result list column. It is necessary for IBM Connections Files because metadata is not part of the content.
PUBLISHED_PRIMARY_ATTRIBUTE | ICCCreatedDate | String | (none) | Matches the published date of any content, for example, the published date of a blog post. This is one of the several dates available under PUBLISHED. This field is useful as a result list column. It is necessary for IBM Connections Files because metadata is not part of the content.
IBM Connections Files content differs from the content of other IBM Connections applications: for Files, only the actual file content is text-indexed and can be text-searched. Therefore, most of the collection fields listed previously do not apply to IBM Connections Files. Because of this difference, it is recommended that you create two eDiscovery Manager search templates based on the custom collection definition above:

  • One for IBM Connections Files that adds search fields only for the AUTHOR_PRIMARY_ATTRIBUTE, PUBLISHED_PRIMARY_ATTRIBUTE, APPLICATION_NAME, and CONTENT fields.
  • One for the content of the other IBM Connections applications that has all the other fields and either does not use the AUTHOR_PRIMARY_ATTRIBUTE and PUBLISHED_PRIMARY_ATTRIBUTE fields or uses them only as result list columns where needed.
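The split into two search templates can be expressed as a simple field selection. The sketch below is only an illustration of the recommendation, not an eDiscovery Manager API: the function name `search_fields_for` is hypothetical, "all the other fields" is interpreted as every collection field except the two primary-attribute fields, and leaving out RAW_CONTENT (described as debugging only) is an additional assumption.

```python
# Valid APPLICATION_NAME values, as listed in the field mapping.
VALID_APPLICATIONS = {
    "FILES", "PROFILES", "WIKIS", "BLOGS", "BOOKMARKS", "FORUMS", "ACTIVITIES",
}

# Collection fields from the FileNet P8 field mapping.
COLLECTION_FIELDS = [
    "CONTENT", "DOCUMENT", "ATTACHMENT", "ATTACHMENT_NAME", "TITLE",
    "SUB_TITLE", "ENTRY_ID", "AUTHOR", "CONTRIBUTOR", "MODIFIER",
    "CUSTOM_FIELD", "PUBLISHED", "UPDATED", "TAG", "APPLICATION_NAME",
    "RAW_CONTENT", "AUTHOR_PRIMARY_ATTRIBUTE", "PUBLISHED_PRIMARY_ATTRIBUTE",
]

# Search fields recommended for the IBM Connections Files template.
FILES_SEARCH_FIELDS = [
    "AUTHOR_PRIMARY_ATTRIBUTE", "PUBLISHED_PRIMARY_ATTRIBUTE",
    "APPLICATION_NAME", "CONTENT",
]

PRIMARY_ATTRIBUTE_FIELDS = {"AUTHOR_PRIMARY_ATTRIBUTE", "PUBLISHED_PRIMARY_ATTRIBUTE"}
DEBUG_ONLY_FIELDS = {"RAW_CONTENT"}  # assumption: kept out of regular templates


def search_fields_for(application_name):
    """Return the search fields for the template that covers the application."""
    name = application_name.upper()
    if name not in VALID_APPLICATIONS:
        raise ValueError(f"unknown IBM Connections application: {application_name!r}")
    if name == "FILES":
        return list(FILES_SEARCH_FIELDS)
    excluded = PRIMARY_ATTRIBUTE_FIELDS | DEBUG_ONLY_FIELDS
    return [field for field in COLLECTION_FIELDS if field not in excluded]


print(search_fields_for("FILES"))
print(search_fields_for("BLOGS"))
```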

For applications other than Files, the complete XML content of an IBM Connections document is indexed. Therefore, more complex XPath statements can be built to address specific subelements in the content.

An example of such a complex query is a search that is restricted to the content of comments on IBM Connections content.

To configure a search field for such a use case, which is not covered with the default mapping provided above, perform the following steps:

  1. Look at a sample IBM Connections document in FileNet P8 by using FileNet Enterprise Manager.
  2. Identify the relevant content element (MIME type application/icc-comment-atom+xml for comments).
  3. Look at the content to identify the relevant XML structure (/feed/entry/content for comments).
  4. From that information, build the following expression: //icc_part[@mimetype="application/icc-comment-atom+xml"]/feed/entry/content.
  5. See the table at the end of this document for a list of all MIME types defined by ICC for IBM Connections context.
  6. Create a new eDiscovery Manager Collection field that uses this index expression.

You can now search explicitly on IBM Connections comments.
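The expression in step 4 combines the MIME type from step 2 with the element path from step 3. A minimal sketch of that composition follows; the helper name `build_index_expression` is hypothetical and is not part of any IBM product, and the result still has to be entered manually as the text index expression of the new collection field.

```python
def build_index_expression(mime_type, element_path):
    """Combine an ICC part MIME type and an element path into an index expression."""
    if '"' in mime_type:
        raise ValueError("MIME type must not contain double quotes")
    if not element_path.startswith("/"):
        raise ValueError("element path must start with '/'")
    # Select the part by its MIME type, then address the element inside it.
    return f'//icc_part[@mimetype="{mime_type}"]{element_path}'


# Comments example from the steps above.
print(build_index_expression("application/icc-comment-atom+xml", "/feed/entry/content"))
```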

XPath syntax supported in field mappings

The subset of XPath that is supported is defined by the IBM Content Search Services (CSS) XML search engine with XPath support. It differs from standard XPath in the following ways:

  • It does not support iteration and ranges in path expressions.
  • It eliminates filter expressions: that is, it allows filtering only in the predicate expression, not in the path expression.
  • It does not allow absolute path names in predicate expressions.
  • It implements only one axis (tag) and allows propagation only in the forward direction.

The following expressions are unsupported in the XML search syntax:
  • /*
  • //*
  • /@*
  • //@*
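Before saving a field mapping, an expression can be checked for the four unsupported wildcard forms listed above. This is a rough sketch only: the function name `unsupported_wildcards` is hypothetical, the check covers just those four forms rather than the whole supported grammar, and it does not skip string literals inside predicates.

```python
import re

# One or two slashes, an optional "@", then "*": matches /*, //*, /@*, //@*.
_WILDCARD_STEP = re.compile(r"//?@?\*")


def unsupported_wildcards(expression):
    """Return the unsupported wildcard steps found in an index expression."""
    return _WILDCARD_STEP.findall(expression)


print(unsupported_wildcards("//icc_attachment/@name"))
print(unsupported_wildcards("//entry/*"))
```

An empty result means only that none of the four forms is present, not that the expression is valid.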

Disregarding of XML namespaces

Namespace prefixes are not retained in the indexing of XML tag and attribute names. You can search XML documents by using namespaces, but namespace prefixes are discarded during indexing and removed from XML search queries.
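The effect on a query can be pictured as removing every prefix from element and attribute names. The following sketch only illustrates that effect and is not the search engine's implementation; the function name and the `atom` and `snx` prefixes in the usage lines are hypothetical examples.

```python
import re

# Splits an expression into unquoted text and quoted string literals.
_QUOTED = re.compile(r'("[^"]*"|\'[^\']*\')')
# A name followed by a single colon, that is, a namespace prefix.
_PREFIX = re.compile(r"\b[A-Za-z_][\w.-]*:(?!:)")


def strip_namespace_prefixes(expression):
    """Remove namespace prefixes from names, leaving string literals untouched."""
    parts = _QUOTED.split(expression)
    # Even indexes hold unquoted text; odd indexes hold the quoted literals.
    for index in range(0, len(parts), 2):
        parts[index] = _PREFIX.sub("", parts[index])
    return "".join(parts)


print(strip_namespace_prefixes("//atom:entry/atom:id"))
print(strip_namespace_prefixes('//snx:field[@snx:name="urn:x"]'))
```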

Numeric values

Predicates comparing attribute values to numbers are supported.

Complete match

The operator = (equal sign) with a string argument in a predicate means that a complete match of all tokens in the string with all tokens in the identified text span is required. The order of the tokens is important.
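As a rough illustration of complete-match semantics, the comparison can be modeled as equality of two token sequences. This sketch assumes plain whitespace tokenization and case-sensitive comparison; the tokenizer and case handling that the search engine actually uses are not described here, and the function name is hypothetical.

```python
def complete_match(query_string, text_span):
    """Return True if both texts consist of the same tokens in the same order."""
    # Assumption: tokens are separated by whitespace.
    return query_string.split() == text_span.split()


print(complete_match("release planning", "release   planning"))
print(complete_match("release planning", "planning release"))
print(complete_match("release", "release planning"))
```

The second call differs only in token order and the third matches only part of the span, so neither is a complete match.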

For more details about the XML search syntax, check the FileNet P8 documentation on "SQL Syntax Reference" and go to the "XML Search" section.

IBM Content Manager specific search configuration

IBM Content Collector will create a default item type with the following name and properties:

Item type name:

  • ICCConnections

Attributes:
  • ICCTitle
  • ICCCreatedDate
  • ICCExpirationDate
  • ICCLastModifiedDate
  • ICCModifiedBy
  • ICCApplicationName
  • ICCCreatedBy
  • ICMDeleteHold
  • ICCDEIHash

From eDiscovery Manager, to view and search IBM Connections content that is archived by IBM Content Collector and is stored in IBM Content Manager, create a new eDiscovery Manager collection of collection type "Custom Collection" and use the following field mapping:

Collection field | Content server property | Type | Text index | Description
CONTENT | (none) | String | document | Matches all of the content, including attachments, of an IBM Connections application document.
DOCUMENT | (none) | String | content | Matches all of the content of an IBM Connections document except the plain text extract of attachments.
ATTACHMENT | (none) | String | attachment | Matches the plain text extract of an attachment.
ATTACHMENT_NAME | (none) | String | attachment_name | Matches the display name of an attachment, as shown in IBM Connections.
TITLE | ICCTitle | String | icc_title | Matches the title or name of IBM Connections application content, for example, a wiki page title, blog post title, or profile title.
ENTRY_ID | (none) | String | icc_entry_id | Matches the GUID set by IBM Connections to identify a specific portion of content, for example, a wiki page or blog post.
AUTHOR | (none) | String | icc_displayName_author_primary | Matches the names of persons who created IBM Connections application content, for example, a blog post author but not a comment author.*
AUTHOR_SUBCONTENT | (none) | String | icc_displayName_author | Matches the names of persons who created IBM Connections application subcontent, for example, a comment author.*
CONTRIBUTOR | (none) | String | icc_displayName_contributor_primary | Matches the names of persons who modified IBM Connections application content, for example, a blog post.*
CONTRIBUTOR_SUBCONTENT | (none) | String | icc_displayName_contributor | Matches the names of persons who modified IBM Connections application subcontent, for example, a comment.*
CUSTOM_FIELD | (none) | String | icc_customTextValue | Matches values of custom fields for the IBM Connections applications that support custom fields (activities).
PUBLISHED | (none) | Date | icc_published_primary | Matches the published date of any content, for example, the published date of a blog post.*

Remember:

- An IBM Connections profile does not have a published date and thus is not searchable by an index; however, the ICCCreatedDate value will be set to the updated date of the profile.

- An IBM Connections activity that does not contain any subactivity does not have a published date and thus is not searchable by an index; however, the ICCCreatedDate value will be set to the updated date of the activity.

PUBLISHED_SUBCONTENT | (none) | Date | icc_published | Matches the published date of subcontent, for example, the published date of a comment.
UPDATED | ICCLastModifiedDate | Date | icc_updated_primary | Matches the updated date of any content (see PUBLISHED). This field is available for all IBM Connections content.*
UPDATED_SUBCONTENT | (none) | Date | icc_updated | Matches the updated date of any subcontent (see PUBLISHED). This field is available for all IBM Connections content.*
TAG | (none) | String | icc_tags | Matches the tags given to any content or subcontent of IBM Connections content.
APPLICATION_NAME | ICCApplicationName | String | icc_application_name | Matches the IBM Connections application name. The valid values are:
    • FILES
    • PROFILES
    • WIKIS
    • BLOGS
    • BOOKMARKS
    • FORUMS
    • ACTIVITIES
RAW_CONTENT | (none) | String | $FULL_TEXT$ | Matches the complete content indexed for an IBM Connections document. Use this only for debugging purposes.
AUTHOR_PRIMARY_ATTRIBUTE | ICCCreatedBy | String | (none) | This field is useful as a result list column. It is necessary for IBM Connections Files because metadata is not part of the content.
PUBLISHED_PRIMARY_ATTRIBUTE | ICCCreatedDate | String | (none) | This field is useful as a result list column. It is necessary for IBM Connections Files because metadata is not part of the content.

* For each index field that is suffixed with "primary", there is a secondary field definition for subcontent. These fields can be used to create combined or grouped fields in an eDiscovery Manager search template to search for all authors, contributors, published, and modified entries.

IBM Connections Files content differs from the content of other IBM Connections applications: for Files, only the actual file content is text-indexed and can be text-searched. Therefore, most of the collection fields listed previously do not apply to IBM Connections Files. Because of this difference, it is recommended that you create two eDiscovery Manager search templates based on the custom collection definition above:

  • One for IBM Connections Files that adds search fields only for the AUTHOR_PRIMARY_ATTRIBUTE, PUBLISHED_PRIMARY_ATTRIBUTE, APPLICATION_NAME, and CONTENT fields.
  • One for the content of the other IBM Connections applications that has all the other fields and either does not use the AUTHOR_PRIMARY_ATTRIBUTE and PUBLISHED_PRIMARY_ATTRIBUTE fields or uses them only as result list columns where needed.

Limitations

  • FileNet P8 with Verity based search (Legacy Search Service) is not supported for use with IBM Connections content.
  • TIFF or PDF export of IBM Connections content is not supported.
  • Indexed dates are truncated to hourly resolution. This means that searches at a sub-hour resolution are not supported.
  • Highlighting will only work for search terms entered for the content field.
  • IBM Connections does not expose through its application API all of the information that is available in a browser. Specifically, IBM Content Collector does not archive the following information because the API does not provide it:
    * Name of a wiki that contains a specific wiki page
    * Name, number, or size of child pages of a wiki page
    * Background tab of the profiles application
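The hourly truncation of indexed dates noted in the limitations above means that two timestamps within the same hour cannot be told apart by a date search. A minimal sketch of the effect follows; it assumes that truncation is applied to the UTC value, which is an assumption based on the UTC display in the viewer, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def truncate_to_hour(moment):
    """Return the timestamp in UTC with minutes, seconds, and microseconds dropped."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc_moment = moment.astimezone(timezone.utc)
    return utc_moment.replace(minute=0, second=0, microsecond=0)


early = datetime(2013, 4, 17, 10, 5, tzinfo=timezone.utc)
late = datetime(2013, 4, 17, 10, 55, tzinfo=timezone.utc)
# Both values fall into the same indexed hour.
print(truncate_to_hour(early) == truncate_to_hour(late))
```

In practice, this suggests expressing date ranges in search fields in whole hours.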

List of MIME types used by IBM Content Collector for archiving IBM Connections content

IBM Content Collector uses a specific set of MIME types to identify different types of content in an IBM Connections system. If you are working with IBM FileNet P8 and a CSS, the MIME types table will help you build advanced queries.

Extension | MIME type | Description
.afu_acl_xml | Application/icc-acl-atom+xml | Member information of an IBM Connections document
.afu_activity_trash_xml | Application/icc-activity-trash-atom+xml | IBM Connections Activities trash document
.afu_activity_xml | Application/icc-activity-atom+xml | IBM Connections Activities document
.afu_attachment_xml | Application/icc-attachment-atom+xml | Attachment metadata of an IBM Connections document
.afu_blog_xml | Application/icc-blog-atom+xml | IBM Connections blog post document
.afu_board_xml | Application/icc-board-atom+xml | IBM Connections profile board document
.afu_bookmark_xml | Application/icc-bookmark-atom+xml | IBM Connections bookmark document
.afu_comment_xml | Application/icc-comment-atom+xml | Comments of an IBM Connections document
.afu_forum_reply_xml | Application/icc-forum-reply-atom+xml | Forum replies for an IBM Connections forum topic
.afu_forum_topic_xml | Application/icc-forum-topic-atom+xml | IBM Connections forum topic
.afu_forum_xml | Application/icc-forum-atom+xml | IBM Connections forum metadata
.afu_link_xml | Application/icc-link-atom+xml | Links document of an IBM Connections profile
.afu_media_xml | Application/icc-media-atom+xml | Content of an IBM Connections wiki page
.afu_network_xml | Application/icc-network-atom+xml | Network document of an IBM Connections profile
.afu_profile_xml | Application/icc-profile-atom+xml | IBM Connections profile document
.afu_recommend_xml | Application/icc-recommend-atom+xml | Recommendation document of an IBM Connections document
.afu_reporting_xml | Application/icc-reportingChain-atom+xml | Reporting chain document of an IBM Connections profile
.afu_status_xml | Application/icc-status-atom+xml | Status document of an IBM Connections profile
.afu_tag_xml | Application/icc-tag-atom+xml | Tags given to an IBM Connections document
.afu_version_xml | Application/icc-version-atom+xml | Version metadata of an IBM Connections wiki page
.afu_wiki_xml | Application/icc-wiki-atom+xml | IBM Connections wiki page
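The extensions and MIME types in this table follow a regular naming pattern, with .afu_reporting_xml as the one exception. The sketch below derives the MIME type from an extension for the entries in the table only; the function name is hypothetical, and the lowercase "application/" spelling follows the predicate shown in the comments example rather than the capitalization in the table, which is an assumption about what the index expects.

```python
# Content kinds taken from the extensions in the table above.
KNOWN_KINDS = (
    "acl", "activity_trash", "activity", "attachment", "blog", "board",
    "bookmark", "comment", "forum_reply", "forum_topic", "forum", "link",
    "media", "network", "profile", "recommend", "reporting", "status",
    "tag", "version", "wiki",
)

# The only entry whose MIME subtype does not follow the extension name.
IRREGULAR_SUBTYPES = {"reporting": "reportingChain"}


def mime_type_for_extension(extension):
    """Return the ICC MIME type for an extension listed in the table."""
    if not (extension.startswith(".afu_") and extension.endswith("_xml")):
        raise ValueError(f"not an ICC Connections extension: {extension!r}")
    kind = extension[len(".afu_"):-len("_xml")]
    if kind not in KNOWN_KINDS:
        raise ValueError(f"extension is not in the table: {extension!r}")
    subtype = IRREGULAR_SUBTYPES.get(kind, kind.replace("_", "-"))
    return f"application/icc-{subtype}-atom+xml"


print(mime_type_for_extension(".afu_comment_xml"))
print(mime_type_for_extension(".afu_reporting_xml"))
```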

 


Document information


More support for:

eDiscovery Manager

Software version:

2.2.0.4

Operating system(s):

AIX 64bit, Windows 2003 server, Windows 2008 server

Software edition:

All Editions

Reference #:

1595873

Modified date:

2013-04-17
