SYSPROC.SYSTS_CREATE
You can call the SYSPROC.SYSTS_CREATE stored procedure to create a text search index. This stored procedure enables a text column for text search indexing. The text search index can then be used in SQL queries that contain the CONTAINS or SCORE functions.
Prerequisites
- DB2 text search functions were started by invoking the SYSPROC.SYSTS_START stored procedure and at least one text search server is running.
- The table includes a column that is defined as primary key, unique index, or ROWID.
- The QSYS2.SYSTEXTSERVERS table contains at least one entry.
Authorization
- The privilege to create in the schema. For more information, see Authorization, privileges and object ownership.
- Administrative authority
The privileges held by the authorization ID of the statement must include at least one of the following privileges:
- The following system authorities:
- *USE to the Create Logical File (CRTLF) command
- *CHANGE to the data dictionary if the library into which the text index is created is an SQL schema with a data dictionary
- Administrative authority
The privileges held by the authorization ID of the statement must also include at least one of the following privileges:
- For the referenced table:
- The INDEX privilege on the table
- The system authority *EXECUTE on the library containing the table
- Administrative authority
- If SQL names are specified, and a user profile exists that has
the same name as the library into which the text index is created,
and that name is different from the authorization ID of the statement,
then the privileges held by the authorization ID of the statement
must include at least one of the following privileges:
- The system authority *ADD to the user profile with that name
- Administrative authority
If a distinct type is referenced, the privileges held by the authorization ID of the statement must include at least one of the following privileges:
- For each distinct type identified in the statement:
- The USAGE privilege on the distinct type, and
- The system authority *EXECUTE on the library containing the distinct type
- Administrative authority
If the column that the text search index is built over contains a field procedure, the authorization ID must have one of
- *EXECUTE authority to the program and
- *EXECUTE authority on the library containing the program
Or
- Administrative authority
For information about the system authorities corresponding to SQL privileges, see GRANT (Table or View Privileges).
Syntax
The schema qualifier is SYSPROC.
Parameters
- indexSchema
- Identifies the schema of the text search index.
If this parameter is null, the value of the CURRENT SCHEMA special
register for the invoker is used. This value must be a valid SQL name. Note: Enclose names in double quotation marks if the names conflict with SQL keywords or OmniFind keywords that can be used.
The data type of this parameter is VARCHAR(128).
- indexName
- Identifies the name of the text search index. The name of the
text search index with the index schema uniquely identifies the text
search index in the DB2 subsystem.
You must specify a non-null value for this parameter. This value must
be a valid SQL name.Note: Enclose names in double quotation marks if the names conflict with SQL keywords or OmniFind keywords that can be used.
The data type for this parameter is VARCHAR(128).
- textSource
- Identifies the table and column specification for the document
text source. This parameter can include user-defined functions. You
must specify a non-null value for this parameter.
The data type for this parameter is VARCHAR(1024).
- tableSchema
- Identifies the schema of the table on which the text search index
is created. Note: Enclose names in double quotation marks if the names conflict with SQL keywords or OmniFind keywords that can be used.
- tableName
- Identifies the name of the text table that contains the column
that the external text search index is created on. Notes:
- Views and logical files are not supported.
- An alias must point to a table or a single member of a physical file.
- Enclose names in double quotation marks if the names conflict with SQL keywords or OmniFind keywords.
- text-column-name
- Identifies the name of the column that contains
the text that is used for creating the text search index. This column
must be of type CHAR, CHAR FOR BIT DATA, BINARY, VARCHAR, VARCHAR
FOR BIT DATA, VARBINARY, CLOB, DBCLOB, BLOB, XML, GRAPHIC, or VARGRAPHIC.
If the data type is not one of these data types, you can specify an
external function that returns a supported data type.Notes:
- Only one text search index is allowed for a column. If a text search index exists for the column, SQLCODE-20427 is returned.
- Enclose names in double quotation marks if the names conflict with SQL keywords or OmniFind keywords.
- function-schema. function-name
- Identifies the schema and the name of a built-in
or user-defined function. The function can be used to modify a text
document stored in the column. The function can also be used to access
text documents in a column that is not of a supported data type. Or
the function can be used to access a document that is stored elsewhere.
The function has one input parameter for the text column data type.
For example, an integer that serves as a foreign key to the document
content in another table. The function returns a value of one of the OmniFind Text Search for DB2 for i supported
data types. The function transforms the text column content to the
indexed document content. Notes:
- Cast functions and functions with more than one argument are not allowed.
- Enclose names in double quotation marks if the names conflict with SQL keywords or OmniFind keywords that can be used.
- options
- A character string that specifies the various options that are
available for this stored procedure.
The data type for this parameter is VARCHAR(32000).
- text-default-information
- Specifies the coded character set identifier used when indexing
binary text documents. Also specifies the language that is used when
processing documents, and the format of text documents in the column.
- CCSID ccsid
- Specifies the coded character set identifier that is used for
a text search index in a column with a binary data type. The default
value is 1208 (UTF-8) and is taken from the QSYS2.SYSTEXTDEFAULTS
table. All the CCSIDs that are supported for conversion to UTF-8 by IBM® i conversion services are allowed
for this parameter.
This parameter is ignored for a text search index in a column with a non-binary data type. Text columns inherit the CCSID from the table specification. The ccsid value is ignored when the format value is set to INSO.
- LANGUAGE language
- Specifies the language that OmniFind Text Search Server for DB2 for i uses
for the linguistic processing of text documents. The default value
is en_US (English). If you specify the value for this parameter as
AUTO, OmniFind Text Search Server for DB2 for i tries
to determine the language. Important: If the language of the documents is not English, do not use the default value of en_US. Change the value to the language of the documents; otherwise, linguistic processing does not work as expected.
- FORMAT format
- Identifies the format of text documents in the column, such as
HTML. The OmniFind Text Search Server for DB2 for i needs
to know the format, or content type, of the text documents that you
intend to index and search. If you do not specify the format parameter,
the default value is taken from the FORMAT column in the QSYS2.SYSTEXTDEFAULTS
table. The supported format values are TEXT, HTML,
XML, and INSO.
The format value INSO allows OmniFind Text Search Server for DB2 for i to determine the format. In this case, the ccsid value is ignored. If the OmniFind Text Search Server for DB2 for i cannot determine the document format, then a document error is noted in the job log during processing by the SYSPROC.SYSTS_UPDATE stored procedure.
Note: If you do not specify the format parameter while creating the index over an XML data type column, the default value is XML. If you specify the format parameter as TEXT or INSO, the XML search capability is not available over this index. In addition, a warning message appears in the job log.
- update-characteristics
- Specifies the frequency of automatic updates to the text search
index. Also specifies the minimum number of changes to text documents
before the text search index is updated incrementally at the specified
time.
- UPDATE FREQUENCY update-frequency
- Specifies when to make automatic updates to the text search index.
The default value is NONE. This option might be useful for a text
column in which there are no further changes. The format of the update-frequency
option supports two different formats.
- NONE
- If NONE is specified, then no further index updates are made. The update must be started manually. This option might be useful for a text column in which no further changes are planned.
- D
- Specifies the day or days of the week when the index is updated. An asterisk (*) specifies all days. 0 specifies Sunday.
- H
- Specifies the hour or hours when the index is updated. An asterisk (*) specifies all hours.
- M
- Specifies the minute or minutes when the index is updated. An asterisk (*) cannot be specified. The minimum update frequency is 5 minutes.
Example: This example specifies that the index update is to run every 30 minutes.UPDATE FREQUENCY D(*) H(*) M(0,30)
The format of the update-frequency (chronological) option is a list of the five values separated by a blank space. The five values represent the minutes, hours, days of the month, months of the year, and days of the week beginning with Sunday.
If you specify an interval of values or an asterisk (*), you can specify a step value by using a forward slash (/) at the end of the defined interval.
Example: This example specifies that the index update is to run every quarter hour (
0,15,30,45
) on the even hours between 8 a.m. and 6:45 p.m. (8-18/2
is equivalent to8,10,12,14,16,18
), from Monday to Friday every month of the year (* * 1-5
).0,15,30,45 8-18/2 * * 1-5
- minute
- Specifies the minutes of the hour when the text search index is to be updated. You can specify an asterisk (*) for an interval of every 5 minutes, or you can specify an integer 0 - 59. You cannot repeat values. The minimum update frequency is 5 minutes. A value of 1,4,8 is not valid.
- hour
- Specifies the hours of the day when the text search index is to be updated. You can specify an asterisk (*) for every hour, or you can specify an integer 0 - 23. You cannot repeat values.
- dayOfMonth
- Specifies the days of the month when the text search index is to be updated. You can specify an asterisk (*) for every day, or you can specify an integer 1 - 31. You cannot repeat values.
- monthOfYear
- Specifies the months of the year when the text search index is to be updated. You can specify an asterisk (*) for every month, or you can specify an integer 1 - 12. You cannot repeat values.
- dayOfWeek
- Specifies the days of the week when the text search index is to be updated. You can specify an asterisk (*) for every day, or you can specify an integer 0 - 7. Both 0 and 7 are valid values for Sunday. You cannot repeat values.
- UPDATE MINIMUM minchanges
- Specifies the minimum number of record changes
made to the underlying table before the text search index is updated
incrementally at the time specified in the update-frequency option.
The value must be an integer 1 - 2147483647. The default value is
taken from the UPDATEMINIMUM column in the QSYS2.SYSTEXTDEFAULTS table.
This option is ignored when you update the text search index, unless you specify the USING UPDATE MINIMUM option in the SYSPROC.SYSTS_UPDATE stored procedure.
- index-configuration-options
- Specifies additional index-specific values as option value pairs.
You must enclose string values in single quotation marks. A single
quotation mark within a string value must be represented by two consecutive
single quotation marks.
- CJKSEGMENTATION
- Specifies the segmentation method to use when you index documents for CJK (Chinese, Japanese, Korean) languages. The supported values are MORPHOLOGICAL and NGRAM. If the CJKSEGMENTATION value is not specified, the default value is used. The default value is specified by the DEFAULTNAME value in the QSYS2.SYSTEXTDEFAULTS table.
- COMMENT
- Specifies a comment that is stored in the REMARKS column of the
QSYS2.SYSTEXTINDEXES administration table and as the description of
the OmniFind Text Search Server for DB2 for i collection.
The value for this option is a string value that is less than or equal to 512 bytes.
- IGNOREEMPTYDOCS
- Specifies whether to represent empty documents in the text search
index. Empty documents are those documents with an empty string or
a null value.
The supported values for this option are 0 (zero) and 1. The default value is 1.
If this option is set to 1, empty documents are not represented in the text search index. If you use this option and change the document content to empty, the next incremental update deletes the documents from the text search index.
- KEYCOLUMN
- Specifies the name of a unique column to be used as the key column in the text index. The key column is used to associate data in the text index to a document or row in the base table. The specified column must have a primary key constraint or unique index. If KEYCOLUMN is not specified, the ROWID column from the table is used, if one exists. Otherwise, the primary key defined on the table is used.
- SERVER
- Specifies the ID or alias name of the server to be used to store the text search index. If an ID is used, the value is an integer that must exist in the SERVERID column of the QSYS2.SYSTEXTSERVERS catalog. If an alias name is used, the value is a string that must exist in the ALIASNAME column of the QSYS2.SYSTEXTSERVERS catalog. If SERVER is not specified, the default is to select the server with the fewest text search indexes from the servers in the QSYS2.SYSTEXTSERVERS table where parameter SERVERSTATUS is set to 0 (zero), which means that the server is available.
- UPDATEAUTOCOMMIT
- Specifies how often a commit operation is performed when fetching
documents during an index update. A value of 0 (zero) means that a
commit operation occurs only at the end of processing.
The value must be an integer between 0 (zero) and 2147483647. The default value is 100.
Performance tip: The value of UPDATEAUTOCOMMIT can have a substantial impact on the performance of index updates. The commit operation that takes place at the specified interval ensures a consistent checkpoint from which to restart the index update, if it is interrupted. However, the commit also temporarily suspends the update process. Increasing the UPDATEAUTOCOMMIT value (or setting it to 0) can substantially improve the update performance, especially the initial update. The value you specify must balance the need for performance with the need for recoverability, based on the frequency of the index updates.
- ROW_COLUMN_ACCESS
- Specifies whether the text search index is considered secure for
row access control and column access control.
- NOT_SECURED
- Specifies that the text search index is considered not secure
for row access control and column access control. This is the default.
The based on table for the index must not have an active permission or Mask.
- SECURED
- Specifies that the index is considered secure for row access control
and column access control.
A text search index must be defined as secured to be built over a table with an active permission or column Mask. If a function is referenced to access or modify data in a masked column, the function must be defined as secured. The authorization ID must have DB_SECADMIN authority to create a text search index with the SECURED attribute.
When a text search index is built over sensitive information there are additional considerations:
The database administrator needs to be aware that the data specified as key column(s) for the text search index will be stored in a staging table in QSYS2 and sent to the text search server using network protocols.
The database administrator needs to be aware that data indexed by a text search index is sent to the text search server using network protocols and stored outside of DB2 on the text search server.
Default values for the options parameter
When you install OmniFind Text Search for DB2 for i, the QSYS2.SYSTEXTDEFAULTS table is created and populated with default values for the options parameter of the SYSPROC.SYSTS_CREATE stored procedure.
The following table lists the options, default values, and descriptions of the options.
Option | Default value | Description |
---|---|---|
CCSID | 1208 | Specifies the coded character set identifier that is used when binary text documents are indexed. |
CJKSEGMENTATION | NGRAM | Specifies the segmentation method to use when you index documents for CJK (Chinese, Japanese, Korean) languages. |
LANGUAGE | en_US | Specifies the language used to process text documents. |
FORMAT | TEXT | Identifies the format of text documents in the column. The default format is plain text unless the data type is XML. |
UPDATEFREQUENCY | NONE | Indicates that no automatic updates are scheduled. |
UPDATEMINIMUM | 1 | If at least one document changed since the last index update, the SYSPROC.SYSTS_UPDATE stored procedure starts processing. |
IGNOREEMPTYDOCS | 1 | Specifies that empty documents (documents with an empty string or a null value) are not represented in the text search index. The metadata fields for these documents are not available for search. |
UPDATEAUTOCOMMIT | 100 | Specifies how often a commit operation is performed when documents are fetched during an index update. |
MINIMUMUPDATEINTERVAL | 5 | Specifies the intervals for the UPDATEFREQUENCY option. Intervals cannot be shorter than 5 minutes. |
ROW_COLUMN_ACCESS | NOT_SECURED | Specifies whether the index is created as secured or not. |
USEREXITTHREADS | 0 | Reserved |