DB2 Version 10.1 for Linux, UNIX, and Windows

SYSTS_CREATE procedure - Create a text search index on a column

The SYSTS_CREATE procedure creates a text search index for a text column by issuing the DB2® Text Search CREATE INDEX command on the database server. After you create and update a text search index, you can search the column data by using text search functions.

The index does not contain any data until an index update operation is processed. You can start the update operation by using the stored procedure interface or the command-line interface. For the latter, you can explicitly issue the DB2 Text Search UPDATE INDEX command, or the command is implicitly issued by the DB2 Administrative Task Scheduler, according to the update frequency defined for the index

Authorization

The authorization ID of the statement must include the SYSTS_MGR role and CREATETAB authority on the database and one of the following privileges or authority:
  • CONTROL privilege on the table on which the index will be defined
  • INDEX privilege on the table on which the index will be defined and one of the following authorities:
    • IMPLICIT_SCHEMA authority on the database, if the implicit or explicit schema name of the index does not exist
    • CREATEIN privilege on the schema, if the schema name of the index exists
  • DBADM authority

Default PUBLIC privilege

None

Syntax

Read syntax diagramSkip visual syntax diagram
>>-SYSTS_CREATE--(--index_schema--,--index_name--,-------------->

>--| text source |--,--| options |------------------------------>

>--,--message_locale--,--message--)----------------------------><

text source

|--+--------------+--table-name--(--| text column name |--)--,--|
   '-table-schema-'                                              

text column name

|--+-column-name-----------------+------------------------------|
   '-function-name (column-name)-'   

options

|--+-+------------------------------+----+----------------------|
   | '-| text default information |-'    |   
   +-+----------------------------+------+   
   | '-| update characteristics |-'      |   
   +-+---------------------+-------------+   
   | '-| storage options |-'             |   
   +-+---------------------------------+-+   
   | '-| index configuration options |-' |   
   '-+-----------------------+-----------'   
     '-| partition options |-'               

text default information

|--+---------------------+--+--------------------+-------------->
   '-CODEPAGE--code-page-'  '-LANGUAGE--language-'   

>--+----------------+-------------------------------------------|
   '-FORMAT--format-'   

update characteristics

|--+--------------------------------------------+--------------->
   '-UPDATE FREQUENCY--+-NONE-----------------+-'   
                       '-| update frequency |-'     

>--| incremental update characteristics |-----------------------|

update frequency

|--D-(-+-*------------+-)--H-(-+-*------------+-)--------------->
       | .-,--------. |        | .-,--------. |     
       | V          | |        | V          | |     
       '---integer1-+-'        '---integer2-+-'     

       .-,--------.     
       V          |     
>--M-(---integer3-+-)-------------------------------------------|

incremental update characteristics

|--+----------------------------+-------------------------------|
   '-UPDATE MINIMUM--minchanges-'   

storage options

|--+---------------------------------+-------------------------->
   '-COLLECTION DIRECTORY--directory-'   

>--+-------------------------------------------+----------------|
   '-ADMINISTRATION TABLES IN--tablespace-name-'   

index configuration options

|--+-----------------------------------------------+------------|
   |                        .-,----------------.   |   
   |                        V                  |   |   
   '-INDEX CONFIGURATION--(---| option-value |-+-)-'   

option-value

|--+----------------------------------------------+-------------|
   +-COMMENT--text -------------------------------+   
   +-UPDATEAUTOCOMMIT--+-+--------------------+-+-+   
   |                   | '-commitcount_number-' | |   
   |                   '-+------------+---------' |   
   |                     '-commitsize-'           |   
   +-COMMITTYPE--committype-----------------------+   
   +-COMMITCYCLES--commitcycles-------------------+   
   +-INITIALMODE--initialmode---------------------+   
   +-LOGTYPE--ltype-------------------------------+   
   +-AUXLOG--auxlog_value-------------------------+   
   '-CJKSEGMENTATION -cjksegmentation_method------'   

The schema is SYSPROC.

Procedure parameters

index_schema
An input argument of type VARCHAR(128) that specifies the schema of the text search index. The index_schema must comply with the naming restrictions for DB2 schema names. If the argument is null or an empty string, the default value of schema is used. The index_schema is case-sensitive.
index_name
An input argument of type VARCHAR(128) that specifies the name of the index. Together with index_schema, index_name uniquely identifies a text search index in a database. The index_name is case-sensitive.
text source
An input argument of type VARCHAR(1024) that specifies the name of the column to be indexed. The options are as follows:
table-schema
The schema of the table for which the text search index is to be created.
table-name
Specifies the name of the table that contains the text column. You cannot create text search indexes on federated tables, materialized query tables, or views. The table-name is case-sensitive.
text column name
Specifies the name of the column to be indexed.
column-name
Specifies the name of the column to be indexed. The column must be of one of the following data types: CHAR, VARCHAR, CLOB, DBCLOB, BLOB, GRAPHIC, VARGRAPHIC, or XML. If the data type of the column is not one of these data types, you can use a transformation function with the name function-schema.function-name to convert the column type to one of the valid types. (For details, see the function-name (column-name) parameter.) Alternatively, you can specify a user-defined external function that accesses the text documents to be indexed. You can create only a single text search index for a column. The column-name is case-sensitive.
function-name(column-name)
Specifies the schema-qualified name of an external scalar function that converts a column data type that is unsupported for text searching into a data type that is supported for text searching. This function must take only one parameter and return only one value. The name of the function must conform to DB2 naming conventions. The function-name(column-name) parameter is case-sensitive.
options
An input argument of type VARCHAR(32K) that specifies the options to be used. If you do not require any options, the parameter can be null or an empty string. The options are as follows:
CODEPAGE code-page
Specifies the DB2 code page to be used when indexing text documents. The default code page value is taken from the SYSIBMTS.TSDEFAULTS view, from the row with the DEFAULTNAME column value of CODEPAGE. The initial default code page for new indexes is the database code page. This parameter applies only to binary data types that is, the column type or return type from a transformation function must be BLOB or character-type FOR BIT DATA.
LANGUAGE language
Specifies the language to be used by DB2 Text Search for language-specific processing of a document during indexing. If you do not specify a locale, the database territory is used to determine the default setting for the LANGUAGE parameter. To have your documents that are scanned to determine the locale, specify locale as AUTO.
FORMAT format
Specifies the format of the text documents in the column. The supported formats are TEXT, XML, HTML, and INSO. If the column data type is not XML, the default format is taken from the SYSIBMTS.TSDEFAULTS view, from the row with the DEFAULTNAME column value of FORMAT. If the column data type is XML, the default format is always XML. If you want to use the INSO format, rich text support must be installed.
UPDATE FREQUENCY
Specifies the frequency of index updates. The index is updated if the number of changes is at least the value set for the UPDATE MINIMUM parameter. Automatic updates can only be done if the DB2_ATS_ENABLE registry variable is set and the START FOR TEXT command is issued.

The default format is taken from the SYSIBMTS.TSDEFAULTS view, from the row with the DEFAULTNAME column value of UPDATEFREQUENCY.

NONE
No further index updates are made. This value can be useful for a text column in a table with data that does not change. It is also useful if you intend to manually update the index by using the UPDATE INDEX command.
D
The days of the week when the index is updated.
*
Every day of the week.
integer1
Specific days of the week, from Sunday - Saturday: 0 - 6.
H
The hours of the specified days when the index is updated.
*
Every hour of the day.
integer2
Specific hours of the day, from midnight to 11 p.m.: 0 - 23.
M
The minutes of the specified hours when the index is updated.
integer3
UPDATE MINIMUM minchanges
Specifies the minimum number of changes to text documents before the index is updated incrementally at the time that is specified by the UPDATE FREQUENCY parameter. Only positive integer values are allowed. The default value is taken from the SYSIBMTS.TSDEFAULTS view, from the row with the DEFAULTNAME column value of UPDATEMINIMUM. A small value increases consistency between the table column and the text search index but also causes additional load on the system. The UPDATE INDEX command ignores the value of this parameter unless you specify the USING UPDATE MINIMUM parameter for that command.
COLLECTION DIRECTORY directory

Specifies the directory in which the text search index collection is stored. You must specify the absolute path, where the maximum length of the absolute path name is 215 characters. The process owner of the Text Search server instance service must have read and write access on this directory.

The COLLECTION DIRECTORY parameter is only supported for an integrated text search server setup. Review the usage notes for additional information about collection locations.

ADMINISTRATION TABLES IN table-space-name

Specifies the name of an existing non-temporary table space for the administration tables that are created for the index. By default the table space of the base table for which you are creating the index is used.

This argument is required when a text index is created for a partitioned table or for tables in partitioned databases. For partitioned databases, the table space must be in the same partition group as the table space of the base table. The text index administration tables are distributed in the same manner as the corresponding base table.

INDEX CONFIGURATION (option-value)
Specifies additional index-related options as option-value pairs. The following options are supported.
Table 1. Option-value pairs for the INDEX CONFIGURATION parameter
Option Value Data type Description
COMMENT text String value of fewer than 512 bytes Adds a string comment to the REMARKS column in the DB2 Text Search SYSIBMTS.TSINDEXES catalog view. The comment is also used as the description of the collection.
UPDATEAUTOCOMMIT commitsize -
  • This integer option specifies the number of index updates after which a commit is executed to automatically preserve the previous work for either initial or incremental updates.
    • For initial updates, the index update will process batches of documents from a base table after the trigger to capture data updates is activated. After the number of documents updated reaches the COMMITCOUNT number, the server does an interim commit. Log entries generated by unprocessed documents are removed from the staging table. Using the UPDATEAUTOCOMMIT parameter for an initial text index update leads to a significant increase of execution time.
    • For incremental updates, log entries which have been processed are removed correspondingly from the staging table with each interim commit. COMMITCOUNT counts the number of documents updated, not the number of staging table entries.
  • Starting in DB2 Version 10.1 Fix Pack 3 and later fix packs, this string option specifies the number of rows or number of hours after which a commit is run to automatically preserve the previous work for either initial or incremental updates.

    If you specify the number of rows:
    • After the number of documents that are updated reaches the COMMITCOUNT number, the server applies a commit. COMMITCOUNT counts the number of documents that are updated by using the primary key, not the number of staging table entries.
    If you specify the number of hours:
    • The text index is committed after the specified number of hours is reached. The maximum number of hours is 24.

    For initial updates, the index update processes batches of documents from the base table. After the commitsize value is reached, update processing completes a COMMIT operation and the last processed key is saved in the staging table with operational identifier '4'. This key is used to restart update processing either after a failure or after the number of specified commitcycles are completed. If a commitcycles is specified, the update mode is modified to incremental to initiate capturing changes by using the LOGTYPE BASIC option to create triggers on the text table. However, until the initial update is complete, log entries that are generated by documents that are not been processed in a previous cycle are removed from the staging table.

    Using the UPDATEAUTOCOMMIT option for an initial text index update leads to a significant increase of execution time.

    For incremental updates, log entries that are processed are removed correspondingly from the staging table with each interim commit.

COMMITTYPE committype String

Starting in DB2 Version 10.1 Fix Pack 3 and later fix packs, specifies rows or hours for the UPDATEAUTOCOMMIT index configuration option. The default is rows.

COMMITCYCLES commitcycles Integer

Starting in DB2 Version 10.1 Fix Pack 3 and later fix packs, specifies the number of commit cycles. The default is 0 for unlimited cycles.

If cycles are not explicitly specified, the update operation uses as many cycles as required based on the batch size that is specified with the UPDATEAUTOCOMMIT option to finish the update processing.

You can use this option with the UPDATEAUTOCOMMIT setting with a committype.

INITIALMODE initialmode String
Starting in DB2 Version 10.1 Fix Pack 3 and later fix packs, specifies how the updates are processed. The INITIALMODE index configuration option has three possible values:
  • FIRST, where the primary update is the initial update. This is the default value of the INITIALMODE option.
  • SKIP, where the update mode is immediately set to incremental, triggers are added for the LOGTYPE BASIC option, but no initial update is performed.
  • NOW, where the update is started after the index was created as final part of the CREATE INDEX operation. This option is only supported for single-node setups.
LOGTYPE ltype String
Starting in DB2 Version 10.1 Fix Pack 3 and later fix packs, specifies whether triggers are added to populate the primary log table. It has two values:
  • BASIC, where the primary staging table is created and triggers added on the text table to recognize any changes. This is the default value.
  • CUSTOM, where the primary staging table is created, but no triggers on the text table. Ensure a mechanism to identify changes for incremental updates, especially if you do not plan to use the ALLROWS option for updates.
AUXLOG auxlog_value String Controls the creation of the additional log infrastructure to capture changes that are not recognized by a trigger. The default setting for range-partitioned tables is ON. You can change the default value in the default table by setting AuxLogNorm for non-range-partitioned tables and AuxLogPart for range-partitioned tables.

You cannot change the auxiliary log infrastructure property for a text index after it is created.

CJKSEGMENTATION CJKSEGMENTATION_method String The segmentation method is applicable for documents in Chinese, Japanese, and Korean languages (zh_CN, zh_TW, ja_JP, ko_KR locale sets), including such documents when automatic language detection is enabled (LANGUAGE AUTO). If no option is specified, the value for CJKSEGMENTATION in the defaults table is applied. Supported values are:
  • MORPHOLOGICAL
  • NGRAM

The specified segmentation method is added to the SYSIBMTS. TSCONFIGURATION administrative view and cannot be changed after the text index is created.

Remember: Non-numeric values, such as comments, must be enclosed in single quotation marks. A single quotation mark character within a string value must be represented by two consecutive single quotation marks.
Example:
INDEX CONFIGURATION (COMMENT 'Index on User''s Guide column')
message_locale
An input argument of type VARCHAR(33) that specifies the locale to be used for any error message returned. If the argument is null or an empty string or the message files for the specified locale are not available on the server, 'en_US' is used.
message
An output argument of type VARCHAR(32K) that specifies a warning or informational message for a successfully completed operation.

Examples

Example 1: In the following example, the SYSTS_CREATE procedure is called to create a text search index called myTextIndex. The UPDATE MINIMUM 10 parameter specifies that 10 changes must be made to the text documents that are associated with the index before an incremental update of the index is performed. Any error messages are returned in English. When the underlying text search command runs successfully, the message output parameter is set to indicate the status of the command execution.
  CALL SYSPROC.SYSTS_CREATE('db2ts', 'myTextIndex', 
   'myUserSchema.myBaseTable (myTextColumn)', 'UPDATE MINIMUM 10', 
   'en_US', ?)
Sample output is as follows:
  Value of output parameters   
  --------------------------   
  Parameter Name  : MESSAGE   
  Parameter Value : Operation completed successfully.    
  Return Status = 0    
Example 2: In the following example, the SYSTS_CREATE procedure is called to create a text search index called myTextIndex2. No options are specified. In this example, the index exists, which results in an error message being returned to the caller.
  CALL SYSPROC.SYSTS_CREATE('db2ts', 'myTextIndex2', 
   'myUserSchema.myBaseTable (myTextColumn)', '', 'en_US', ?)
Sample output is as follows:
  SQL20427N An error occurred during a text search administration 
  procedure or command. The error message is "CIE00201 Text search 
  index "db2ts"."myTextIndex2" already exists. ". 

Usage notes

The following key-related rules apply:
  • You must define a primary key for the table.
  • The number of primary key columns is limited to two columns fewer than the number of primary key columns that are allowed in other DB2 environments.
  • The total length of all primary key columns for a table with DB2 Text Search indexes is limited to 15 bytes fewer than the maximum total primary key length that is allowed in other DB2 environments. See the restrictions for the DB2 CREATE INDEX statement.

Creating an index with the LANGUAGE parameter set to the AUTO option allows CJKSEGMENTATION specification as an option. The specified segmentation method applies to Chinese, Japanese, and Korean language documents. You cannot change the value that is set for the CJKSEGMENTATION_method value after index creation is complete.

You cannot issue certain procedures or commands concurrently on a text search index because they might result in an error message based on the timing of the conflicting operation.
  • SYSTS_DISABLE procedure or db2ts DISABLE DATABASE FOR TEXT command
  • SYSTS_CONFIGURE procedure
Successful execution of the CREATE INDEX command has the following effects:
  • DB2 Text Search server data is updated. A collection with the name instance_database-name_index-identifier_number is created, as in the following example:
    tigertail_MYTSDB_TS250517_0000
    For partitioned databases, a collection is created for each partition. You can retrieve the collection name(s) from the SYSIBMTS.TSCOLLECTIONNAMES view COLLECTIONNAME column.
  • An index event table is created in the specified table space. Also, an index staging table is created in the specified table space with appropriate DB2 indexes. If the AUXLOG ON option was set for the INDEX CONFIGURATION parameter, a second staging table is created to capture changes through integrity processing.
  • DB2 Text Search catalog information is updated. However, the newly created text search index is not automatically populated. To populate the text search index, the SYSTS_UPDATE procedure or UPDATE INDEX command must be issued either manually or automatically (as a result of defining an update schedule for the index by using the UPDATE FREQUENCY parameter).
  • If DB2 Text Search coexists with DB2 Net Search Extender and an active Net Search Extender index exists for the table column, the new DB2 Text Search index is set to inactive.
  • If a text search index is created with parameters LANGUAGE set to AUTO and CJKSEGMENTATION set to MORPHOLOGICAL, then searches for valid strings on a morphological index returns zero rows. To obtain the results, add the QUERYLANGUAGE option to the CONTAINS function, as seen in the following query:
    select bookname from morphobooks
     where contains (story, '军书','QUERYLANGUAGE=zh_CN') = 1

The AUXLOG option is supported for nicknames for data columns that support a multiple query table (MQT) with deferred refresh. It is not supported for views.

You can specify the UPDATEAUTOCOMMIT index configuration option without type and cycles for compatibility with an earlier version. This index configuration option is associated by default with the COMMITTYPE rows option and unrestricted cycles.

You can also specify the UPDATEAUTOCOMMIT, COMMITTYPE, and COMMITSIZE index configuration options for an UPDATE INDEX operation to override the configured values. Values that are submitted for a specific update operation are only applied once and not persisted.

With theINITIALMODE SKIP option, the text search index manager is responsible to populate the index appropriately. Use this option to control the sequence in which data from the text table is initially processed.

If you use the LOGTYPE index configuration option, the following rules apply:
  • When you use the LOGTYPE CUSTOM setting, use the SYSIBMTS.TSSTAGING administrative view to insert log entries for new, changed, and deleted documents.
  • Check the value of the LOGTYPE option in the SYSIBMTS.TSCONFIGURATION administrative view to view the settings for an index.
  • Check the value of the LOGTYPE option in the SYSIBMTS.TSDEFAULTS administrative view to view the default log type that is applied to newly created text indexes.
  • Note: The LOGTYPE option is not valid with the ALLROWS option in the CREATE INDEX operation, as ALLROWS forces an initial update and no log tables are created in this case.