DB2 Version 9.7 for Linux, UNIX, and Windows

tokenize function

The fn:tokenize function breaks a string into a sequence of substrings.

Syntax


>>-fn:tokenize(--source-string--,--pattern--+----------+--)----><
                                            '-,--flags-'

source-string

A string that is to be broken into a sequence of substrings.

source-string is an xs:string value or the empty sequence.

pattern

The delimiter between substrings in source-string.

pattern is an xs:string value that contains a regular expression. A regular expression is a set of characters, wildcards, and operators that define a string or group of strings in a search pattern.

flags

An xs:string value that can contain any of the following values that control how pattern is matched to characters in source-string:

s: Indicates that the dot (.) in the regular expression matches any character, including the new-line character (X'0A').
If the s flag is not specified, the dot (.) matches any character except the new-line character (X'0A').
m: Indicates that the caret (^) matches the start of a line (the position after a new-line character), and the dollar sign ($) matches the end of a line (the position before a new-line character).
If the m flag is not specified, the caret (^) matches the start of a string, and the dollar sign ($) matches the end of the string.
i: Indicates that matching is case-insensitive.
If the i flag is not specified, case-sensitive matching is done.
x: Indicates that whitespace characters within pattern are ignored.
If the x flag is not specified, whitespace characters are used for matching.
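
As an illustration of the flags argument, the following sketch uses the i flag so that both "x" and "X" are treated as delimiters. The input string is made up for this example.

```xquery
fn:tokenize("aXbxc", "x", "i")
```

The returned value is the sequence ("a", "b", "c"). Without the i flag, only the lowercase "x" matches, and the returned value is the sequence ("aXb", "c").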

Limitation of length

The length of source-string and pattern is limited to 32000 bytes.

Returned value

If source-string is not the empty sequence or a zero-length string, the returned value is a sequence that results when the following operations are performed on source-string:

source-string is searched for characters that match pattern.
If pattern contains two or more alternative sets of characters, the first set of characters in pattern that matches characters in source-string is considered to be the matching pattern.
Each set of characters that does not match pattern becomes an item in the result sequence.
If pattern matches characters at the beginning of source-string, the first item in the returned sequence is a string of length 0.
If two successive matches for pattern are found within source-string, a string of length 0 is added to the sequence.
If pattern matches characters at the end of source-string, the last item in the returned sequence is a string of length 0.
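
The rules above for zero-length strings can be seen in a short sketch. The input string is hypothetical; it contains a delimiter at the beginning, two successive delimiters, and a delimiter at the end.

```xquery
fn:tokenize(",a,,b,", ",")
```

Under the rules above, the returned value is the sequence ("", "a", "", "b", "").

A second sketch, taken from the examples in the W3C definition of fn:tokenize, shows a pattern with two alternatives. At each position, the first alternative that matches is used, so "ab" is consumed as one delimiter where it occurs.

```xquery
fn:tokenize("abracadabra", "(ab)|(a)")
```

The returned value is the sequence ("", "r", "c", "d", "r", "").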

If pattern is not found in source-string, the returned value is a sequence that contains source-string as its only item.

If source-string is the empty sequence, or is the zero-length string, the result is the empty sequence.
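
Because the empty sequence has no items, one way to observe this result is to count the items that are returned. The following sketch wraps the call in fn:count; the arguments are chosen only for illustration.

```xquery
fn:count(fn:tokenize("", ","))
```

The returned value is 0.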

Example

The following function call splits the string "Tokenize this sentence, please." into a sequence of substrings. The pattern "\s+" is a regular expression that matches one or more whitespace characters.

fn:tokenize("Tokenize this sentence, please.", "\s+")

The returned value is the sequence ("Tokenize", "this", "sentence,", "please.").
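
Because the result is an ordinary sequence, it can be used with other XQuery expressions. As a minimal sketch that reuses the string from the example above, a positional predicate selects a single item:

```xquery
fn:tokenize("Tokenize this sentence, please.", "\s+")[2]
```

The returned value is "this". Similarly, passing the same fn:tokenize call to fn:count returns 4.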