Documentation updates for IBM Content Analytics with Enterprise Search Version 3.0
Preventive Service Planning
This document summarizes changes and corrections to the IBM Content Analytics with Enterprise Search Version 3.0 information center.
This section contains changes or corrections to the product overview information.
Languages supported for sentiment analysis
The description of sentiment analysis in the What's New in Version 3.0 topic incorrectly states that the supported languages are English, Dutch, French, German, Japanese, and Spanish. Only English and Japanese are supported, as stated in Configuring sentiment analysis for content analytics collections.
This section contains changes or corrections to the installation information.
Installing the product
The installation procedures instruct you to click Install Product to start the installation program. However, after you click Install Product, you must also click a button to launch the installation program.
Installation and data directories
This topic does not make it clear that high-speed local storage devices are preferred. If you prefer network storage, then for performance and data integrity, use storage area network (SAN) devices that are not shared between IBM Content Analytics with Enterprise Search servers. If network storage must be shared between IBM Content Analytics with Enterprise Search servers to save storage space, the IBM General Parallel File System (GPFS) data storage architecture is also supported, although it can be slower.
The following issues can occur if you install the product on a network file system (NFS) architecture:
- Data integrity issues. Data might not always be saved correctly because of occasional network errors.
- Performance issues. NFS adds network protocol overhead and has high latency, which can degrade performance, especially search performance.
The reference to Windows 2003 in this topic is incorrect. The correct Windows version is Windows 2008.
It might take a few minutes for the installation program to finish installing the software after you press Enter at the end of the installation prompts. Do not end or suspend the program until control returns to you and you can enter additional commands.
Administrator ID and password: Non-root users
When you install the product as a non-root user, the installation program warns you that this approach has limitations. If you cannot accept the limitations that are documented in this topic, click Quit to exit the installation program. If the limitations are acceptable and you want to continue installing the product with a non-root user ID, click Ignore.
Administrator ID and password: Special character restrictions in user IDs
If you use an existing user ID for the administrator ID, the ID can contain letters, digits, and the underscore character. The ID cannot contain other special characters and the ID must begin with a letter. The ID cannot contain characters from the double byte character set (DBCS). Only ASCII characters are supported.
If you use a Windows domain ID for the administrator ID, the ID can contain letters, digits, the @ character, and the . (period) character. The ID cannot contain other special characters and the ID must begin with a letter. The ID cannot contain characters from the double byte character set (DBCS). Only ASCII characters are supported.
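The two sets of character rules above can be expressed as regular expressions. The following Python sketch is a hypothetical pre-check (the function name is illustrative, not part of the product) that you could run before you start the installation program. It checks only the documented character rules, not length limits or operating system rules.

```python
import re

# Character rules from the two paragraphs above. The ranges are written out so
# that only ASCII letters and digits match; DBCS characters are rejected.
_EXISTING_ID = re.compile(r"[A-Za-z][A-Za-z0-9_]*")  # letters, digits, underscore
_DOMAIN_ID = re.compile(r"[A-Za-z][A-Za-z0-9@.]*")   # letters, digits, @, period


def is_valid_admin_id(user_id: str, windows_domain: bool = False) -> bool:
    """Hypothetical pre-check of an administrator ID against the documented rules.

    windows_domain=False checks an existing user ID; windows_domain=True
    checks a Windows domain ID. Both must begin with a letter. Length limits
    and operating system rules are not checked.
    """
    pattern = _DOMAIN_ID if windows_domain else _EXISTING_ID
    return pattern.fullmatch(user_id) is not None
```

For example, is_valid_admin_id("esadmin@example.com", windows_domain=True) returns True, while is_valid_admin_id("es-admin") returns False because the hyphen is not an allowed character.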
Administrator ID and password: Windows domain IDs
The Windows domain IDs section of this topic contains incorrect information about the differences between a local ID, a domain ID with a local profile, and a domain ID with a roaming profile. Here is the correct information:
Windows domain IDs
If you want to use a Windows domain user account for the default IBM Content Analytics with Enterprise Search administrator, you must create the domain ID in advance. The default application administrator ID must be either a local ID or a domain ID with a local profile. A domain ID with a roaming profile is not supported.
When you install the product, specify the existing domain ID as the default administrator ID in the following format:
Local ID or domain ID with a local profile
For a local user ID or a domain ID with a local profile, the user's local profile is stored on the local computer. Any changes made to the local user profile are specific to the computer on which the changes are made. These are the only types of user IDs that can be used as the default administrator ID.
To obtain domain privileges for an ID, you can add the local user ID that you use for the administrator ID to a domain. If you add the local user ID to a domain, however, you must ensure that the domain security rights do not override the local user rights that are required by IBM Content Analytics with Enterprise Search (which are listed later in this topic).
Domain ID with a roaming profile
For a domain ID with a roaming profile, a copy of the user's local profile is stored on a shared server. This shared profile, which is known as a roaming user profile, is downloaded whenever the user logs on to any computer on the network. Changes made to the profile are synchronized with the server copy when the user logs off. The default administrator ID cannot be this type of user ID.
Installing the system to use WebSphere Application Server Network Deployment
Base release of IBM Content Analytics with Enterprise Search Version 3.0:
If you plan to use WebSphere Application Server Network Deployment as the application server, you must run the following command to install the product and run applications in a non-cluster environment. Neither vertical clustering nor horizontal clustering is supported in the base release of IBM Content Analytics with Enterprise Search Version 3.0.
AIX or Linux: ./install.bin -D\$WAS_CLUSTERING_ENABLED\$=false
Windows: install.exe -D$WAS_CLUSTERING_ENABLED$=false
IBM Content Analytics with Enterprise Search Version 3.0 Fix Pack 1:
Beginning with this fix pack, cluster deployment for horizontal clustering is supported, where each clustering node is also configured as a search server. If you specify WebSphere Application Server Network Deployment as the application server when you run the installation program, the applications are deployed on a cluster server by default. If you do not want to install the applications on a cluster server, use the preceding command to run the IBM Content Analytics with Enterprise Search installation program.
- The back-end processes of IBM Content Analytics with Enterprise Search do not run on WebSphere Application Server.
- The processes do not use the clustering features of WebSphere Application Server Network Deployment, such as workload balancing, high availability, or deployment managers.
- The decision to use the clustering mode of WebSphere Application Server Network Deployment must be specified when you install the base release of IBM Content Analytics with Enterprise Search Version 3.0 and cannot be changed when you install Fix Pack 1.
- If you install IBM Content Analytics with Enterprise Search as a distributed server system (not all-on-one), the Search and Analytics customizer programs fail with a "not found" error. A workaround is available for this problem. See http://www.ibm.com/support/docview.wss?uid=swg21606987.
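The non-cluster installation commands shown for the base release can also be started from a script. This Python sketch (the function names are hypothetical) passes the arguments as a list, so no shell is involved and the dollar signs reach the installation program unchanged. In a POSIX shell, an unescaped $WAS_CLUSTERING_ENABLED would be expanded as a shell variable, which is why the AIX and Linux command escapes each dollar sign with a backslash.

```python
import subprocess
import sys
from typing import List

# System property from the installation commands; it turns off cluster deployment.
CLUSTERING_OFF = "-D$WAS_CLUSTERING_ENABLED$=false"


def install_command(windows: bool) -> List[str]:
    """Return the argument list for a non-cluster installation.

    The list is handed to the operating system without a shell, so the
    dollar signs do not need the backslash escaping that the AIX and Linux
    shell command uses.
    """
    program = "install.exe" if windows else "./install.bin"
    return [program, CLUSTERING_OFF]


def run_install(windows: bool = sys.platform.startswith("win")) -> None:
    """Start the installation program; run this from the directory that contains it."""
    subprocess.run(install_command(windows), check=True)
```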
Upgrading to Version 3.0
For complete information about upgrading to Version 3.0, including information about data that is or is not migrated automatically, see Upgrading to IBM Content Analytics with Enterprise Search Version 3.0.
This section contains changes or corrections to the administration information.
Hovering over fields and icons to see system status
Most of the fields and icons on the administration console dashboards (the Collections view, System view, and Security view) provide assistance that you can see by positioning your cursor to hover over the field or icon. In some cases, it might not be obvious that assistance, including system status information, is available as hover help. For example, you can check the progress of index builds by hovering over the Rebuild index fields. The following sample screen shows the start time and end time of an index build that appear when you hover over the field:
Rules to expand queries and rank documents
The ability to upload custom analyzers and associate analyzers with index fields is disabled by default. To learn how to enable these functions, and to learn about configuring custom rules to automatically expand queries, see Expanding queries and influencing how documents are ranked in the results.
To help clarify how the different crawler options affect which files are crawled, consider these examples:
The crawler is scheduled to crawl new and modified documents only:
A file is added to the crawl space on 16 April 2012. The timestamp of the file is 01 January 2012. The crawler is scheduled to start crawling on 01 January 2012. The crawler will not include this newly added file because the crawler compares the crawler schedule time to the timestamp of the file.
The crawler is scheduled to crawl new, modified, and deleted documents:
Assume the same environment as the preceding scenario, except that the last time the scheduled crawl occurred was 16 April 2012 and the file was added after the crawler began crawling. The next time the crawler begins crawling, the newly added file will be included because the crawler compares the crawler schedule time to the time that the crawler last ran.
URI format for documents crawled by the Agent for Windows File Systems crawler
This topic states that the format for documents crawled by the Agent for Windows File Systems crawler is file:////fileserver.ibm.com/directory/file.doc. That information is incorrect. The correct URI format is winfs://server.ibm.com/c:/directory/file.doc.
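If you build or compare document URIs in your own tools, the corrected winfs:// form can be assembled from a server name and a Windows path. This Python function is a hypothetical helper, not a product API; it only converts backslashes to forward slashes. How the crawler handles drive letter case, spaces, or other special characters is not documented here, so those are left unchanged.

```python
def winfs_uri(server: str, windows_path: str) -> str:
    """Hypothetical helper: build a URI in the winfs://server/c:/directory/file.doc form.

    Only backslashes are converted to forward slashes; drive letter case,
    spaces, and other special characters are passed through unchanged.
    """
    path = windows_path.replace("\\", "/")
    return f"winfs://{server}/{path}"
```

For example, winfs_uri("server.ibm.com", r"c:\directory\file.doc") produces winfs://server.ibm.com/c:/directory/file.doc rather than the incorrect file://// form.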
Configuring support for Data Listener applications
This topic instructs you to use the search and index (SIAPI) APIs instead of the Data Listener APIs to develop applications. However, the SIAPI Administration APIs are deprecated. Use the REST APIs for your data listener applications. For more information about using the REST APIs, see the API documentation in the ES_INSTALL_ROOT/docs/api/rest directory. Sample scenarios that demonstrate how to perform administrative and search tasks are available in the ES_INSTALL_ROOT/samples/rest directory.
Index field attributes
The sample queries provided for Exact Match and Case-Sensitivity incorrectly use the equal sign instead of a colon to separate the field and value to be searched. The correct sample queries are:
color:"dark blue skirt"
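Applications that generate fielded queries can build the corrected colon syntax programmatically. The following Python function is a hypothetical helper that mirrors the sample query above: the field name, a colon, and the value in double quotation marks. It rejects values that contain a double quotation mark, because escaping rules are not covered in this document.

```python
def field_query(field: str, value: str) -> str:
    """Hypothetical helper: build a fielded query such as color:"dark blue skirt".

    A colon (not an equal sign) separates the field from the quoted value.
    Values containing a double quotation mark are rejected because the
    escaping rules are not documented here.
    """
    if '"' in value:
        raise ValueError("value must not contain a double quotation mark")
    return f'{field}:"{value}"'
```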
Configuring index fields for field values that are published in an IBM Connections seed list
When you create index fields from a seed list, and the seed list field value attribute name has a space in it, you must replace the space with an underscore character. In the following example, use Forum_UUID, not Forum UUID:
<wplc:fieldInfo id="FORUM_UUID" name="Forum UUID" description="Forum UUID"
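When a seed list defines many fields, the index field names can be derived from the name attributes. This Python sketch uses the standard ElementTree parser and applies the documented rule of replacing spaces with underscores. The namespace URI for the wplc prefix is a placeholder assumption; substitute the URI that your seed list actually declares. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Placeholder namespace URI for the wplc prefix (assumption). Use the URI
# that your seed list actually declares for wplc.
WPLC_NS = "urn:example:wplc"


def index_field_name(seed_list_name: str) -> str:
    """Replace spaces with underscores, for example 'Forum UUID' -> 'Forum_UUID'."""
    return seed_list_name.replace(" ", "_")


def index_field_names(seed_list_xml: str) -> List[str]:
    """Return an index field name for every wplc:fieldInfo element in the XML."""
    root = ET.fromstring(seed_list_xml)
    return [
        index_field_name(element.get("name", ""))
        for element in root.iter(f"{{{WPLC_NS}}}fieldInfo")
    ]
```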
Configuring document flags
This topic states that after you create flags, your changes become effective immediately. However, for an enterprise search collection, you must recrawl and reparse the documents and then rebuild the index. In addition, to see new or changed flags, users must log out of the application and log back in. If the application does not require users to log in, users must refresh the application in the browser.
Customizing URLs by using a regular expression URL filter
Step 6 in this procedure instructs you to specify the path to a file that you created in Step 5. However, the example shows the path to a folder: JVMOptions=-DRegexFilterFilename=C:\IBM\temp. A more accurate file path example is JVMOptions=-DRegexFilterFilename=C:\IBM\temp\filter.txt.
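A quick way to catch the folder-versus-file mistake is to check the configured value before you restart the crawler. This Python sketch (hypothetical helper names) reads the -DRegexFilterFilename value from a JVMOptions line and confirms that it names an existing file. It assumes that the path contains no spaces, because the value is read up to the first whitespace character.

```python
import os
import re
from typing import Optional


def regex_filter_path(jvm_options_line: str) -> Optional[str]:
    """Return the value of -DRegexFilterFilename in a JVMOptions line, or None.

    Assumes the path contains no spaces: the value is read up to the first
    whitespace character.
    """
    match = re.search(r"-DRegexFilterFilename=(\S+)", jvm_options_line)
    return match.group(1) if match else None


def check_regex_filter(jvm_options_line: str) -> str:
    """Raise ValueError unless the option names an existing file (not a folder)."""
    path = regex_filter_path(jvm_options_line)
    if path is None:
        raise ValueError("-DRegexFilterFilename is not set")
    if not os.path.isfile(path):
        # A folder such as C:\IBM\temp fails this check.
        raise ValueError(f"{path} is not an existing file")
    return path
```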
Custom filters in application configuration files
Disregard the Custom filters section of this topic. The ability to configure queries that filter results was supported for classic search applications in OmniFind Enterprise Edition, but you cannot configure custom filters for the search applications in IBM Content Analytics with Enterprise Search Version 3.0.
Default preferences in application configuration files
The preferences.defaultSearchType=facet property, which is mentioned under the Default Preferences section of this topic, does not exist. Instead, use the preferences.enableFacetedSearch property. Set the property to true to allow users to search a single collection or false to allow users to search multiple collections at a time.
Duplicate document detection
Disregard the first bullet in this topic, which states that duplicate document analysis is available only for collections that use the link-based ranking model. That limitation applied to previous releases of the product, but is no longer valid in IBM Content Analytics with Enterprise Search Version 3.0.
Wildcard characters in queries
When you configure wildcard character options, the only option that you can specify is how many variations of the query term qualify as a match. Disregard the statements in this topic about choosing how wildcard support is to be provided. Wildcard characters can occur anywhere in the query term. To limit the wildcard to the final character in a query term, use the % wildcard symbol (for example, ab% returns aba, abb, abc, and so on).
Configuring type ahead support
Step 1 of this procedure instructs you to click the Actions menu to locate the type ahead options. The correct instruction is to click Configure (the pencil icon) and then select Type ahead options.
Default paths for the config.properties file
Several topics in the information center show an incorrect path for the config.properties file, which you can edit to customize enterprise search applications and the content analytics miner. The correct paths are ES_NODE_ROOT/master_config/searchapp/search/ and ES_NODE_ROOT/master_config/searchapp/analytics/. This change affects several topics, including:
This section discusses changes and corrections to the security information.
To learn about single sign-on support in IBM Content Analytics with Enterprise Search, see Configuring support for SSO authentication.
Indexing Lotus Notes document-level security to improve search performance
Disregard the information in this topic. The steps were required to improve search performance in previous versions of IBM Content Analytics with Enterprise Search, but are not needed or applicable in Version 3.0.
This section contains changes or corrections to the integration information.
No need to install IBM Cognos SDK
The procedures for integrating with IBM Cognos Business Intelligence instruct you to install the IBM Cognos SDK on the IBM Content Analytics with Enterprise Search master server. Disregard these instructions. The SDK is installed automatically when you install IBM Content Analytics with Enterprise Search, so you do not need to install it separately. Skip the following steps:
- Step 2 in Generating IBM Cognos BI Reports
- Step 2b in Exporting documents to IBM Cognos BI
Exporting crawled or analyzed documents
The Exporting to IBM DB2 section instructs you to install the DB2 client on the crawler server and does not mention the required JAR files. The correct information is:
If you plan to export documents to an IBM DB2 database, you must install the DB2 Client on the IBM Content Analytics with Enterprise Search server. In a distributed installation, install the DB2 Client on the master server. When you configure the export, specify the appropriate JAR files, such as db2jcc.jar and db2jcc_license_cu.jar, that are installed with the DB2 Client.
Relational database mappings for documents exported from enterprise search collections
The path for the sample mapping file, ES_INSTELL_ROOT/default_config/export_rdb_mapping.xml, contains a typographical error. ES_INSTELL_ROOT should be ES_INSTALL_ROOT.
Integrating with IBM InfoSphere BigInsights
If indexing activity appears stalled in the administration console, use the BigInsights administration console to check the document processing and indexing status. Until indexing is complete, the status is not relayed to the IBM Content Analytics with Enterprise Search administration console from the BigInsights server.
Importing CSV files
Data is imported successfully if the content follows the CSV file format regardless of the file extension (.csv, .dat, .text, .txt, and so on). When you run the CSV file import wizard, you can verify that the format of the data is correct by previewing the content before you import it.
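Because the content, not the extension, determines whether an import works, you might want to pre-screen files in bulk before you open the wizard. This Python sketch uses the standard library csv.Sniffer as a heuristic; it is not the check that the import wizard performs. It assumes UTF-8 text and a comma delimiter, and the function name is hypothetical. The wizard preview remains the authoritative check.

```python
import csv


def looks_like_csv(path: str, sample_size: int = 4096) -> bool:
    """Heuristically decide whether a file contains comma-separated values.

    The file extension is ignored; only the first sample_size characters of
    content are inspected. Assumes UTF-8 text and a comma delimiter.
    """
    with open(path, newline="", encoding="utf-8") as f:
        sample = f.read(sample_size)
    try:
        csv.Sniffer().sniff(sample, delimiters=",")
    except csv.Error:
        return False
    return True
```

With this heuristic, a records.dat file that contains comma-separated rows passes, while a .csv file that contains only prose does not.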
Integrating with WebSphere Portal
For complete information about how to run an enterprise search application as a portlet and how to integrate enterprise search technology into your WebSphere Portal environment, see WebSphere Portal integration with IBM Content Analytics with Enterprise Search.
This section contains changes or corrections to the text analytics information.
This topic states that the system removes Katakana middle dots, which are used as compound word delimiters in Japanese. This behavior changed in Version 3.0. In a fresh installation, the system no longer automatically removes Katakana middle dots. In an upgrade installation, however, for compatibility with previously normalized data, Katakana middle dots continue to be removed during character normalization.
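The difference between fresh and upgrade installations can be modeled in a few lines. This Python sketch is only an illustration of the documented behavior, not the product's character normalizer. It handles the Katakana middle dot U+30FB (for example, データ・ベース becomes データベース in an upgrade installation) and does not model any other normalization step. The function name is hypothetical.

```python
# U+30FB KATAKANA MIDDLE DOT, used as a compound word delimiter in Japanese.
KATAKANA_MIDDLE_DOT = "\u30fb"


def normalize_middle_dots(text: str, upgrade_installation: bool) -> str:
    """Illustrative model of the Version 3.0 behavior for Katakana middle dots.

    Fresh installation: the dots are kept.
    Upgrade installation: the dots are removed, for compatibility with
    previously normalized data. No other normalization is modeled.
    """
    if upgrade_installation:
        return text.replace(KATAKANA_MIDDLE_DOT, "")
    return text
```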
Synonym, stop word, and boost word dictionaries are always case sensitive
Disregard the example of how to create a case-insensitive dictionary. The -lc parameter was dropped in IBM Content Analytics with Enterprise Search V3.0. Synonym dictionaries, stop word dictionaries, and boost word dictionaries are always case sensitive.
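Because these dictionaries are always case sensitive, an entry matches only the capitalization that you enter. If you want a term to match in several capitalizations, one possible approach (an assumption, not a documented procedure) is to add each form as a separate entry. This hypothetical Python helper generates the common forms of a term; it does not create the dictionary itself, and the dictionary format and build tools are not shown. Mixed-case spellings such as iPhone still need to be added by hand.

```python
from typing import List


def case_variants(term: str) -> List[str]:
    """Hypothetical helper: return term plus its lowercase, uppercase, and capitalized forms.

    Duplicates are removed while the original order is kept, so each form
    appears once. Other mixed-case spellings are not generated.
    """
    return list(dict.fromkeys([term, term.lower(), term.upper(), term.capitalize()]))
```

For example, case_variants("IBM") returns ["IBM", "ibm", "Ibm"].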
There are no changes or corrections to the maintenance information.
There are no changes or corrections to the analyzing content information.
This section contains changes or corrections to the application development information.
Creating and deploying a plug-in to add custom panes in the enterprise search application
The values shown for the paneN.mode parameter are incorrect. The modes of the application for which to display the custom pane are:
search: to show the pane in an enterprise search application
analytics: to show the pane in an enterprise search application with analytics mode enabled
A third value, textminer, cannot be specified because the content analytics miner cannot be customized.
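Before you deploy a plug-in, you can check its pane settings against the two supported values. This Python sketch is a hypothetical check that takes the plug-in properties as a dictionary (reading the properties file is not shown). It assumes that each paneN.mode property holds a single value.

```python
import re
from typing import Dict

# Documented values for paneN.mode; textminer is not accepted.
VALID_PANE_MODES = {"search", "analytics"}
PANE_MODE_KEY = re.compile(r"pane\d+\.mode")


def invalid_pane_modes(properties: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical check: return paneN.mode entries whose value is not search or analytics.

    Assumes each paneN.mode property holds a single value. Properties with
    other names are ignored.
    """
    return {
        key: value
        for key, value in properties.items()
        if PANE_MODE_KEY.fullmatch(key) and value.strip() not in VALID_PANE_MODES
    }
```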
Creating a postparse plug-in for the web crawler
More detailed steps about creating a plug-in for the web crawler are provided in Creating a prefetch plug-in for the web crawler. See that topic for more explicit steps regarding inheriting the plug-in interface, using the ES_INSTALL_ROOT/lib/URLFetcher.jar file for name resolution, and compiling your code into a JAR file.
This section contains changes or corrections to the application programming samples.
Sample plug-ins and custom panes in the enterprise search application
Steps are missing from this procedure.
After Step 1, do Step 1a:
Copy the contents of the ES_INSTALL_ROOT/samples/analyticsViewPlugin/view directory into the ES_NODE_ROOT/master_config/searchapp/search/plugin/view directory. You must create the plugin/view directory if it does not already exist.
After Step 3, do Step 3a:
Copy the wplugin directory.
- If you use the embedded web application server:
Copy the contents of the ES_INSTALL_ROOT/webapps/searchapp/search/wplugin directory into ES_INSTALL_ROOT/webapps/adminapp/search/wplugin directory. You must create the wplugin directory if it does not already exist.
- If you use WebSphere Application Server:
Confirm that the ES_INSTALL_ROOT/installedApps/search.ear/search.war/wplugin directory exists.
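Steps 1a and 3a can be scripted. The following Python sketch assumes that you pass the ES_INSTALL_ROOT and ES_NODE_ROOT paths as arguments; the function names are hypothetical. shutil.copytree with dirs_exist_ok=True (Python 3.8 or later) copies the contents of a directory and creates the destination directory, including any missing parents, when it does not already exist. Run step_1a after Step 1 and step_3a after Step 3, as the procedure requires.

```python
import shutil
from pathlib import Path


def copy_contents(src: Path, dst: Path) -> None:
    """Copy the contents of src into dst, creating dst and its parents if needed."""
    shutil.copytree(src, dst, dirs_exist_ok=True)


def step_1a(es_install_root: Path, es_node_root: Path) -> None:
    """Copy the sample view files into the plugin/view directory."""
    copy_contents(
        es_install_root / "samples" / "analyticsViewPlugin" / "view",
        es_node_root / "master_config" / "searchapp" / "search" / "plugin" / "view",
    )


def step_3a(es_install_root: Path, embedded_server: bool) -> None:
    """Copy or verify the wplugin directory, depending on the application server."""
    if embedded_server:
        copy_contents(
            es_install_root / "webapps" / "searchapp" / "search" / "wplugin",
            es_install_root / "webapps" / "adminapp" / "search" / "wplugin",
        )
    else:
        # WebSphere Application Server: only confirm that the directory exists.
        wplugin = es_install_root / "installedApps" / "search.ear" / "search.war" / "wplugin"
        if not wplugin.is_dir():
            raise FileNotFoundError(f"Missing directory: {wplugin}")
```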
There are no changes or corrections to the reference information.
This section contains changes and corrections to the troubleshooting information.
Script error when launching the content analytics miner in Firefox
When you start the content analytics miner, you might see the error message "Warning: Unresponsive script," which indicates that a script is busy or unresponsive. To dismiss this dialog, click Continue. To keep the warning from appearing, increase the script run-time limits in Firefox:
1. Type about:config in the Firefox address bar.
2. Filter the list to show the dom.max_script_run_time and dom.max_chrome_script_run_time preferences.
3. Change the values to a number higher than the default (10 seconds).
Setting a value to 0 allows the script to run for as long as it needs. However, a large finite value might be more appropriate, because it avoids locking the user interface indefinitely while the script runs.
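If you need to apply the same setting on several workstations, Firefox also reads preference overrides from a user.js file in the profile directory at startup. This Python sketch writes the two preferences from the steps above; the function names are hypothetical, and the 60-second value in the usage note is an arbitrary example, not a recommended setting.

```python
from pathlib import Path


def script_timeout_prefs(seconds: int) -> str:
    """Return user.js lines that set both script run-time limits to `seconds`.

    0 means no limit; a large finite value avoids locking the interface forever.
    """
    if seconds < 0:
        raise ValueError("seconds must be 0 or greater")
    names = ("dom.max_script_run_time", "dom.max_chrome_script_run_time")
    return "".join(f'user_pref("{name}", {seconds});\n' for name in names)


def append_to_user_js(profile_dir: Path, seconds: int) -> None:
    """Append the preferences to user.js in a Firefox profile directory (hypothetical helper)."""
    with open(profile_dir / "user.js", "a", encoding="utf-8") as f:
        f.write(script_timeout_prefs(seconds))
```

For example, append_to_user_js(profile_dir, 60) adds both preferences with a 60-second limit; Firefox applies them the next time it starts with that profile.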
This section contains changes and corrections to the help system. You can access the online help from the Help or Learn More link on any page in the administration console, content analytics miner, or enterprise search application.
Maximum document length
The maximum document length is 128K characters for content analytics collections and 16M characters for enterprise search collections. The document length contains all text, including fields and some space that is required for Content Analytics with Enterprise Search system data. Because this limit is in the parser layer, it applies to parsed (extracted) text and applies to all documents. This information was omitted from the description of document lengths in the help files for crawler properties in the administration console.
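The limits above can be turned into a simple pre-check on the length of extracted text. This Python sketch is hypothetical and makes one labeled assumption: that "K" and "M" are binary multiples (1,024 and 1,048,576). The document does not say whether decimal or binary multiples are meant, and the amount of space reserved for system data is not stated, so a document close to the limit might still exceed it.

```python
# Limits from the text above. Assumption: "K" and "M" are binary multiples
# (1,024 and 1,048,576); adjust if your version uses decimal multiples.
MAX_DOCUMENT_CHARS = {
    "content_analytics": 128 * 1024,
    "enterprise_search": 16 * 1024 * 1024,
}


def within_document_limit(parsed_char_count: int, collection_type: str) -> bool:
    """Hypothetical pre-check: is the parsed character count within the collection's limit?

    parsed_char_count should include all extracted text and field values.
    The space that the system reserves for its own data is not known here,
    so a document close to the limit might still exceed it.
    """
    return parsed_char_count <= MAX_DOCUMENT_CHARS[collection_type]
```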
Edit Collection Settings
When you edit settings for a content analytics collection, you can specify which fields you want to use for the Date facet. Temporal views, such as Time Series, Trends, and Deviations, use the Date facet to show time and date values along the X-axis. The inline instruction text in the administration console does not make it clear that the fields that can be used for the Date facet must be index fields.