DB2 10.5 for Linux, UNIX, and Windows

SYSTS_CREATE procedure - Create a text search index on a column

The SYSTS_CREATE procedure creates a text search index for a text column by issuing the DB2® Text Search CREATE INDEX command on the database server. After you create and update a text search index, you can search the column data by using text search functions.

The index does not contain any data until an index update operation occurs. An update operation occurs in one of the following ways:
  • You use the SYSTS_CREATE stored procedure.
  • You issue the DB2 Text Search UPDATE INDEX command.
  • The DB2 Administrative Task Scheduler issues the UPDATE INDEX command according to the update frequency that you defined for the index.

Authorization

The authorization ID of the SYSTS_CREATE stored procedure must hold the SYSTS_MGR role and CREATETAB authority on the database and one of the following items:
  • CONTROL privilege on the table on which the index will be defined
  • INDEX privilege on the table on which the index will be defined and one of the following items:
    • IMPLICIT_SCHEMA authority on the database, if the implicit or explicit schema name of the index does not exist
    • CREATEIN privilege on the schema, if the schema name of the index exists
  • DBADM authority

Default PUBLIC privilege

None

Syntax

Read syntax diagramSkip visual syntax diagram
>>-SYSTS_CREATE--(--index_schema--,--index_name--,-------------->

>--| text source |--,--| options |------------------------------>

>--,--message_locale--,--message--)----------------------------><

text source

|--+--------------+--table_name--(--| text column name |--)--,--|
   '-table_schema-'                                              

text column name

|--+-column_name-----------------+------------------------------|
   '-function_name (column_name)-'   

options

|--+-+------------------------------+----+----------------------|
   | '-| text default information |-'    |   
   +-+----------------------------+------+   
   | '-| update characteristics |-'      |   
   +-+---------------------+-------------+   
   | '-| storage options |-'             |   
   '-+---------------------------------+-'   
     '-| index configuration options |-'     

text default information

|--+---------------------+--+--------------------+-------------->
   '-CODEPAGE--code_page-'  '-LANGUAGE--language-'   

>--+----------------+-------------------------------------------|
   '-FORMAT--format-'   

update characteristics

|--+--------------------------------------------+--------------->
   '-UPDATE FREQUENCY--+-NONE-----------------+-'   
                       '-| update frequency |-'     

>--| incremental update characteristics |-----------------------|

update frequency

|--D-(-+-*------------+-)--H-(-+-*------------+-)--------------->
       | .-,--------. |        | .-,--------. |     
       | V          | |        | V          | |     
       '---integer1-+-'        '---integer2-+-'     

       .-,--------.     
       V          |     
>--M-(---integer3-+-)-------------------------------------------|

incremental update characteristics

|--+----------------------------+-------------------------------|
   '-UPDATE MINIMUM--minchanges-'   

storage options

|--+---------------------------------+-------------------------->
   '-COLLECTION DIRECTORY--directory-'   

>--+-------------------------------------------+----------------|
   '-ADMINISTRATION TABLES IN--tablespace_name-'   

index configuration options

|--+-----------------------------------------------+------------|
   |                        .-,----------------.   |   
   |                        V                  |   |   
   '-INDEX CONFIGURATION--(---| option value |-+-)-'   

option value

|--+----------------------------------------------+-------------|
   +-COMMENT--text -------------------------------+   
   +-UPDATEAUTOCOMMIT--+-+--------------------+-+-+   
   |                   | '-commitcount_number-' | |   
   |                   '-+------------+---------' |   
   |                     '-commitsize-'           |   
   +-COMMITTYPE--committype-----------------------+   
   +-COMMITCYCLES--commitcycles-------------------+   
   +-INITIALMODE--initialmode---------------------+   
   +-LOGTYPE--ltype-------------------------------+   
   +-AUXLOG--auxlog_value-------------------------+   
   '-CJKSEGMENTATION -cjksegmentation_method------'   

The schema is SYSPROC.

Procedure parameters

index_schema
An input argument of type VARCHAR(128) that specifies the schema of the text search index. The index_schema argument must comply with the naming restrictions for DB2 schema names. If the argument is null or an empty string, the default value of the schema is used. The index_schema argument is case-sensitive.
index_name
An input argument of type VARCHAR(128) that specifies the name of the index. Together with the index_schema argument, the index_name argument uniquely identifies a text search index in a database. The index_name argument is case-sensitive.
text source
An input argument of type VARCHAR(1024) that specifies the name of the column to index. The options are as follows:
table_schema
Specifies the schema of the table for which the text search index is to be created.
table_name
Specifies the name of the table that contains the text column. In DB2 Version 10.5 Fix Pack 1 and later fix packs, you can create a text search index on a nickname. You cannot create text search indexes on federated tables, materialized query tables, or views. The table_name argument is case-sensitive.
text column name
Specifies the name of the column to index.
column_name
Specifies the name of the column to index. The column must be of one of the following data types: CHAR, VARCHAR, CLOB, DBCLOB, BLOB, GRAPHIC, VARGRAPHIC, or XML. If the data type of the column is not one of these data types, you can use a transformation function with the name function_schema.function_name to convert the column type to one of the valid types. (For details, see the function_name(column_name) option description.) Alternatively, you can specify a user-defined external function that accesses the text documents to be indexed.

You can create only a single text search index for a column. The column_name argument is case-sensitive.

function_name(column_name)
Specifies the schema-qualified name of an external scalar function that converts a column data type that is unsupported for text searching into a data type that is supported for text searching. This function must take only one parameter and return only one value. The name of the function must conform to DB2 naming conventions. The function_name(column_name) option is case-sensitive.
options
An input argument of type VARCHAR(32K) that specifies the options to use. If you do not require any options, the parameter can be null or an empty string. The options are as follows:
CODEPAGE code_page
Specifies the DB2 code page to use when indexing text documents. The default code page value is taken from the SYSIBMTS.TSDEFAULTS view, from the row with the DEFAULTNAME column value of CODEPAGE.

This option applies only to binary data types. That is, the column type or return type from a transformation function must be BLOB or BIT DATA.

LANGUAGE language
Specifies the language that DB2 Text Search uses for language-specific processing of a document during indexing. If you do not specify a locale, the database territory is used to determine the default setting for the LANGUAGE option. To have your documents scanned to determine the locale, specify language as AUTO .
FORMAT format
Specifies the format of the text documents in the column. The supported formats are TEXT, XML, HTML, and INSO. If the column data type is not XML, the default format is taken from the SYSIBMTS.TSDEFAULTS view, from the row with the DEFAULTNAME column value of FORMAT. If the column data type is XML, the default format is always XML. If you want to use the INSO format, you must install rich text support.
FORMAT format
Specifies the format of the text documents in the column. The supported formats are TEXT, XML, HTML, and INSO. If the column data type is not XML, the default format is taken from the SYSIBMTS.TSDEFAULTS view, from the row with the DEFAULTNAME column value of FORMAT. If the column data type is XML, the default format is always XML. If you want to use the INSO format, you must install rich text support.
UPDATE FREQUENCY
Specifies the frequency of index updates. The index is updated if the number of changes is at least the value of the UPDATE MINIMUM option. Automatic updates can be done only if you set the DB2_ATS_ENABLE registry variable and use the START FOR TEXT command.

The default format is taken from the SYSIBMTS.TSDEFAULTS view, from the row with the DEFAULTNAME column value of UPDATEFREQUENCY.

NONE
No further index updates are made. This value can be useful for a text column in a table with data that does not change. It is also useful if you intend to manually update the index by using the UPDATE INDEX command.
D
The days of the week when the index is updated.
*
Every day of the week.
integer1
Specific days of the week, from Sunday - Saturday: 0 - 6.
H
The hours of the specified days when the index is updated.
*
Every hour of the day.
integer2
Specific hours of the day, from midnight to 11 p.m.: 0 - 23.
M
The minutes of the specified hours when the index is updated.
integer3
The top of the hour (0) , or 5-minute increments after the hour: 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or 55.
UPDATE MINIMUM minchanges
Specifies the minimum number of changes to text documents before the index is updated incrementally at the time that is specified by the UPDATE FREQUENCY option. Only positive integer values are allowed. The default value is taken from the SYSIBMTS.TSDEFAULTS view, from the row with the DEFAULTNAME column value of UPDATEMINIMUM. A small value increases consistency between the table column and the text search index but also causes additional load on the system. The UPDATE INDEX command ignores the value of this option unless you specify the USING UPDATE MINIMUM option for that command.
COLLECTION DIRECTORY directory

Specifies the directory in which the text search index collection is stored. You must specify the absolute path, where the maximum length of the absolute path name is 215 characters. The process owner of the instance service of the text search server must have read and write access to this directory.

The COLLECTION DIRECTORY option is supported only for an integrated text search server setup. For additional information about collection locations, review the usage notes.

ADMINISTRATION TABLES IN table_space_name

Specifies the name of an existing non-temporary table space for the administration tables that are created for the index. By default, the table space of the base table for which you are creating the index is used.

This argument is required when you are creating a text index for a partitioned table or for tables in partitioned databases. For partitioned databases, the table space must be in the same partition group as the table space of the base table. The text index administration tables are distributed in the same manner as the corresponding base table.

INDEX CONFIGURATION (option_value)
Specifies additional index-related options as option-value pairs. Options and values are as follows:
Table 1. Option-value pairs
Option Value Data type Description
COMMENT text String value of fewer than 512 bytes Adds a string comment to the REMARKS column in the DB2 Text Search SYSIBMTS.TSINDEXES catalog view. The comment is also used as the description of the collection.
UPDATEAUTOCOMMIT commitsize String Specifies the number of rows or number of hours after which a commit is run to preserve the previous work for either initial or incremental updates.

If you specify the number of rows, after the number of updated documents reaches a COMMITCOUNT number, the server applies a commit. COMMITCOUNT counts the number of documents that are updated by using the primary key, not the number of staging table entries.

If you specify the number of hours, the indexed data is committed after the specified number of hours is reached. The maximum number of hours is 24.

For an initial update, the index update processes batches of documents from the base table. After the commitsize value is reached, update processing completes a commit operation, and the last processed key is saved in the staging table with the operational identifier 4. This key is used to restart update processing after a failure or after the completion of the specified number of commitcycles. If you specify a commitcycles value, the update mode is changed to incremental to initiate capturing changes by using the LOGTYPEBASIC option to create triggers on the text table. However, until the initial update is complete, log entries that were generated by documents that were not processed in a previous cycle are removed from the staging table.

Using the UPDATEAUTOCOMMIT option for an initial text index update significantly increases execution time.

For incremental updates, log entries that are processed are removed from the staging table with each interim commit.

COMMITTYPE committype String Specifies rows or hours for the UPDATEAUTOCOMMIT index configuration option. The default is rows.
COMMITCYCLES commitcycles Integer Specifies the number of commit cycles. The default is 0, meaning unlimited cycles.

If you do not specify the number of cycles, the update operation uses as many cycles as required to finish the update processing, based on the batch size that you specify for the UPDATEAUTOCOMMIT option.

You can use the COMMITCYCLES option with the UPDATEAUTOCOMMIT option with a committype value.

INITIALMODE initialmode String Specifies how the updates are processed. The possible values of the INITIALMODE index option are as follows:
FIRST
The first update is the default value of the INITIALMODE option.
SKIP
The update mode is immediately set to incremental, triggers are added for the LOGTYPEBASIC option, but no initial update is performed.
NOW
The update is started after the index is created as the final part of the CREATE INDEX command operation. This option is supported only for single-node setups.
LOGTYPE ltype String Specifies whether triggers are added to populate the primary log table. The values are as follows:
BASIC
The primary staging table is created, and triggers are created on the text table to recognize any changes. This is the default value for text search indexes on base tables. This option is not supported for nicknames.
CUSTOM
The primary staging table is created, but no triggers are created on the text table. Identify changes for incremental updates, especially if you do not plan to use the ALLROWS option for updates. The CUSTOM option is supported for nicknames.
Note: The default value of the LOGTYPE option is CUSTOM for text search indexes on nicknames.
AUXLOG auxlog_value String Controls the creation of the additional log infrastructure to capture changes that are not recognized by a trigger. The default setting for range-partitioned tables is ON. You can change the default value in the default table by setting AuxLogNorm for non-range-partitioned tables and AuxLogPart for range-partitioned tables.

For text search indexes on nicknames, only the OFF option can be enabled for the AUXLOG option to disable support for this index configuration option.

You cannot change the auxiliary log infrastructure property for a text index after you create it.

CJKSEGMENTATION cjksegmentation_method String Specifies the segmentation method that applies to documents that use the Chinese, Japanese, or Korean language (zh_CN, zh_TW, ja_JP, or ko_KR locale set), including such documents when automatic language detection is enabled (when you specify the LANGUAGE parameter with the AUTO option). Supported values are:
  • MORPHOLOGICAL
  • NGRAM
If you do not specify a value, the value stored in the SYSIBMTS.TSDEFAULTS view is used. Specifically, the value in the DEFAULTVALUE column of the row whose DEFAULTNAME value is 'CJKSEGMENTATION'.

The specified segmentation method is added to the SYSIBMTS.TSCONFIGURATION administrative view. You cannot change the method after creating the text index.

Important: You must enclose non-numeric values, such as comments, in single quotation marks. A single quotation mark character within a string value must be represented by two consecutive single quotation marks, as shown in the following example:
INDEX CONFIGURATION (COMMENT 'Index on User''s Guide column')
message_locale
An input argument of type VARCHAR(33) that specifies the locale to use for error messages. If the argument is null or an empty string or the message files for the specified locale are not available on the server, 'en_US' is used.
message
An output argument of type VARCHAR(32K) that specifies a warning or informational message for a successfully completed operation.

Examples

Example 1: In the following example, the SYSTS_CREATE procedure is called to create a text search index called myTextIndex. The UPDATE MINIMUM 10 parameter specifies that 10 changes must be made to the text documents that are associated with the index before an incremental update of the index is performed. Any error messages are returned in English. When the underlying text search command runs successfully, the message output parameter is set to indicate the status of the command execution.
  CALL SYSPROC.SYSTS_CREATE('db2ts', 'myTextIndex', 
   'myUserSchema.myBaseTable (myTextColumn)', 'UPDATE MINIMUM 10', 
   'en_US', ?)
Sample output is as follows:
  Value of output parameters   
  --------------------------   
  Parameter Name  : MESSAGE   
  Parameter Value : Operation completed successfully.    
  Return Status = 0    
Example 2: In the following example, the SYSTS_CREATE procedure is called to create a text search index called myTextIndex2. No options are specified. In this example, the index exists, which results in an error message being returned to the caller.
  CALL SYSPROC.SYSTS_CREATE('db2ts', 'myTextIndex2', 
   'myUserSchema.myBaseTable (myTextColumn)', '', 'en_US', ?)
Sample output is as follows:
  SQL20427N An error occurred during a text search administration 
  procedure or command. The error message is "CIE00201 Text search 
  index "db2ts"."myTextIndex2" already exists. ". 

Usage notes

The following key-related rules apply:
  • You must define a primary key for the table.
  • The maximum number of primary key columns is two columns fewer than the maximum number of primary key columns in other DB2 environments.
  • The maximum total length of all primary key columns for a table with DB2 Text Search indexes is 15 bytes fewer than the maximum total primary key length that is allowed in other DB2 environments. See the restrictions for the DB2 CREATE INDEX statement.

If you create an index with the LANGUAGE parameter set to the AUTO option, you can specify the CJKSEGMENTATION option.

You cannot issue certain procedures or commands concurrently for a text search index because they might result in an error, based on the timing of the conflicting operation:
  • The SYSTS_DISABLE procedure or db2ts DISABLE DATABASE FOR TEXT command
  • The SYSTS_CONFIGURE procedure
Successful execution of the CREATE INDEX command has the following effects:
  • DB2 Text Search server data is updated. A collection with the name instance_database-name_index_identifier_number is created, as in the following example:
    tigertail_MYTSDB_TS250517_0000
    For partitioned databases, a collection is created for each partition. You can retrieve the collection names from the COLLECTIONNAME column in the SYSIBMTS.TSCOLLECTIONNAMES view.
  • An index event table is created in the specified table space. Also, an index staging table is created in the specified table space with accurate DB2 indexes. If you specified the AUXLOG ON option for the INDEX CONFIGURATION parameter, a second staging table is created to capture changes through integrity processing.
  • DB2 Text Search catalog information is updated. However, the new text search index is not automatically populated. To populate the text search index, the SYSTS_UPDATE procedure or UPDATE INDEX command must be issued either manually or automatically (as a result of defining an update schedule for the index by using the UPDATE FREQUENCY parameter).
  • If DB2 Text Search coexists with DB2 Net Search Extender and an active Net Search Extender index exists for the table column, the new DB2 Text Search index is set to inactive.
  • If you create a text search index by setting the LANGUAGE parameter to AUTO and the CJKSEGMENTATION option to MORPHOLOGICAL, searches for valid strings on a morphological index return zero rows. To obtain the results, add the QUERYLANGUAGE option to the CONTAINS function, as shown in the following query:
    select bookname from morphobooks
     where contains (story, '军书','QUERYLANGUAGE=zh_CN') = 1

The AUXLOG option is supported for nicknames for data columns that support a materialized query table (MQT) with deferred refresh. It is not supported for views.

To create a text search index on a nickname, the nickname must be a non-relational flat file nickname. Non-relational XML nicknames are not supported.

For compatibility with an earlier version, you can specify the UPDATEAUTOCOMMIT index configuration option without type and cycles. This option is associated by default with the COMMITTYPE rows option and unrestricted cycles.

To override the configured values, you can specify the UPDATEAUTOCOMMIT, COMMITTYPE, and COMMITSIZE index configuration options for the UPDATE INDEX command. Values that you submit for a specific update operation are applied only once and are not persisted.

If you specify the INITIALMODE SKIP option, the text search index manager populates the index. Use this option to control the sequence in which data from the text table is initially processed.

The following rules apply to the LOGTYPE index configuration option:
  • If you use the LOGTYPE CUSTOM setting, use the SYSIBMTS.TSSTAGING administrative view to insert log entries for new, changed, and deleted documents.
  • To view the settings for an index, check the value of the LOGTYPE parameter in the SYSIBMTS.TSCONFIGURATION administrative view.
  • To view the default log type that is applied to new text indexes, check the value of theLOGTYPE option in the SYSIBMTS.TSDEFAULTS administrative view.
  • The LOGTYPE option is not valid with the ALLROWS option of the CREATE INDEX command because the ALLROWS option forces an initial update and no log tables are created.