Creating a Text Search Collection
This procedure creates an empty search collection. An SQL schema will be created on the system to contain information about the collection. The schema will contain DB2® objects necessary to track and index objects.
SYSPROC.SYSTS_CRTCOL and SYSPROC.SYSTS_CREATE_COLLECTION
Authorization
SYSTS_CRTCOL and SYSTS_CREATE_COLLECTION will be shipped with *EXECUTE authority granted to public.
These procedures will not adopt any additional authority and will run under the invoking profile.
- Authority to create a DB2 schema
- Authority/ability to create a text search index
The DB2 Objects created as part of the collection, including the administrative stored procedures are created with public authority *EXCLUDE
The user profile creating the collection owns all objects in the collection. A user may grant authority to specific procedure to other users in order to allow another user to administer and search the text search collection.
Syntax
Syntax
The schema qualifier is SYSPROC.
Parameters
- collection_name
- Name of Collection. This name uniquely identifies the collection
and must be a non null string. A schema of the name specified for
collection name will be created to hold the associated DB2 objects. Note: Enclose names in double quotation marks if the names conflict with SQL keywords or OmniFind keywords that can be used.
The collection name parameter will follow the SQL rules for schema names.
The collection name must not match the name of an existing user profile.
The data type of this parameter is VARCHAR(128).
- options
- A character string that specifies the various options that are
available for this stored procedure.
The data type for this parameter is VARCHAR(32000).
- text-default-information
- Specifies the language that is used when processing documents,
and the format of text documents in the column.
- LANGUAGE language
- Specifies the language that OmniFind Text Search Server for DB2 for i uses
for the linguistic processing of text documents. The default value
is en_US (English). If you specify the value for this parameter as
AUTO, OmniFind Text Search Server for DB2 for i tries
to determine the language. Important: If the language of the documents is not English, do not use the default value of en_US. Change the value to the language of the documents; otherwise, linguistic processing does not work as expected.
- FORMAT format
- Identifies the format of text documents to be indexes, such as
TEXT or INSO. The OmniFind Text Search Server for DB2 for i needs
to know the format, or content type, of the text documents that you
intend to index and search. If you do not specify the format parameter,
the default is TEXT.
The format value INSO allows OmniFind Text Search Server for DB2 for i to determine the format. If the OmniFind Text Search Server for DB2 for i cannot determine the document format, then a document error is logged in the job log during processing by the UPDATE stored procedure.
- update-characteristics
- Specifies the frequency of automatic updates to the text search
collection. The update process for a text search collection involves
both indexing the text data, and crawling system objects to detect
new or changed data.
- UPDATE FREQUENCY update-frequency
- Specifies when to make automatic updates to the text search collection.
The default value is NONE.
- NONE
- If NONE is specified, then no further index updates are made. The update must be started manually.
- D
- Specifies the day or days of the week when the index is updated. An asterisk (*) specifies all days. 0 specifies Sunday.
- H
- Specifies the hour or hours when the index is updated. An asterisk (*) specifies all hours.
- M
- Specifies the minute or minutes when the index is updated. An asterisk (*) cannot be specified. The minimum update frequency is 5 minutes.
Example: This example specifies that the index update is to run every 30 minutes.UPDATE FREQUENCY D(*) H(*) M(0,30)
The format of the update-frequency (chronological) option is a list of the five values separated by a blank space. The five values represent the minutes, hours, days of the month, months of the year, and days of the week beginning with Sunday.
If you specify an interval of values or an asterisk (*), you can specify a step value by using a forward slash (/) at the end of the defined interval.
Example: This example specifies that the index update is to run every quarter hour (
0,15,30,45
) on the even hours between 8 a.m. and 6:45 p.m. (8-18/2
is equivalent to8,10,12,14,16,18
), from Monday to Friday every month of the year (* * 1-5
).0,15,30,45 8-18/2 * * 1-5
- minute
- Specifies the minutes of the hour when the text search index is to be updated. You can specify an asterisk (*) for an interval of every 5 minutes, or you can specify an integer 0 - 59. You cannot repeat values. The minimum update frequency is 5 minutes. A value of 1,4,8 is not valid.
- hour
- Specifies the hours of the day when the text search index is to be updated. You can specify an asterisk (*) for every hour, or you can specify an integer 0 - 23. You cannot repeat values.
- dayOfMonth
- Specifies the days of the month when the text search index is to be updated. You can specify an asterisk (*) for every day, or you can specify an integer 1 - 31. You cannot repeat values.
- monthOfYear
- Specifies the months of the year when the text search index is to be updated. You can specify an asterisk (*) for every month, or you can specify an integer 1 - 12. You cannot repeat values.
- dayOfWeek
- Specifies the days of the week when the text search index is to be updated. You can specify an asterisk (*) for every day, or you can specify an integer 0 - 7. Both 0 and 7 are valid values for Sunday. You cannot repeat values.
- index-configuration-options
- Specifies additional index-specific values as option value pairs.
You must enclose string values in single quotation marks. A single
quotation mark within a string value must be represented by two consecutive
single quotation marks.
- CJKSEGMENTATION
- Specifies the segmentation method to use when you index documents for CJK (Chinese, Japanese, Korean) languages. The supported values are MORPHOLOGICAL and NGRAM. If the CJKSEGMENTATION value is not specified, the default value is used. The default value is specified by the DEFAULTNAME value in the QSYS2.SYSTEXTDEFAULTS table.
- SERVER
- Specifies the ID or alias name of the server to be used to store the text search index. If an ID is used, the value is an integer that must exist in the SERVERID column of the QSYS2.SYSTEXTSERVERS catalog. If an alias name is used, the value is a string that must exist in the ALIASNAME column of the QSYS2.SYSTEXTSERVERS catalog. If SERVER is not specified, the default is to select the server with the fewest text search indexes from the servers in the QSYS2.SYSTEXTSERVERS table where parameter SERVERSTATUS is set to 0 (zero), which means that the server is available.
- UPDATEAUTOCOMMIT
- Specifies how often a commit operation is performed when fetching
documents during an index update. A value of 0 (zero) means that a
commit operation occurs only at the end of processing.
The value must be an integer between 0 (zero) and 2147483647. The default value is 100.
Performance tip: The value of UPDATEAUTOCOMMIT can have a substantial impact on the performance of index updates. The commit operation that takes place at the specified interval ensures a consistent checkpoint from which to restart the index update, if it is interrupted. However, the commit also temporarily suspends the update process. Increasing the UPDATEAUTOCOMMIT value (or setting it to 0) can substantially improve the update performance, especially the initial update. The value you specify must balance the need for performance with the need for recoverability, based on the frequency of the index updates.
- asp_device_name
- This parameter, if specified and not null, determines which Auxiliary
storage pool the collection is created into. This parameter is optionally
available to match the CREATE SCHEMA capability to create the DB2 objects on a specific ASP device.
If a value is supplied for this parameter, the value must name a disk pool in the primary asp group of the current namespace, or a basic ASP unit if the namespace is the system ASP only.
The data type for this parameter is VARCHAR(10).
Examples
- CALL SYSTS_CRTCOL(‘mycollection');
A collection MYCOLLECTION is created.
- CALL SYSTS_CRTCOL(‘”mycollection”', “UPDATE FREQUENCY D(*)
H(*) M(0)”)
A collection mycollection (lower case not including the delimiters) is created.
The text search collection will have an update frequency of every day, at the top of every hour.
- CALL SYSTS_CRTCOL(‘ur_collection',‘UPDATE FREQUENCY NONE
‘ || ‘ LANGUAGE zh_CN INDEX CONFIGURATION( ‘ || ‘ CJKSEGMENTATION
MORPHOLOGICAL) ‘, ‘23')
A collection UR_COLLECTION is created.
The collection has no update frequency.
The collection's language is simplified Chinese, using dictionary linguistic analysis (morphological).
The SQL schema is created in basic ASP #23.
When the text search collection is created, procedures are created in the DB2 schema to administer the collection.
To create a collection from IBM® Navigator for i, follow these steps.
- From IBM Navigator for i, expand IBM i Management > System > All Tasks.
- On the right panel, select System > Omnifind > Create Collection.