DB2 Version 9.7 for Linux, UNIX, and Windows

tokenize function

The fn:tokenize function breaks a string into a sequence of substrings.

Syntax

>>-fn:tokenize(--source-string--,--pattern--+----------+--)----><
                                            '-,--flags-'      

source-string
A string that is to be broken into a sequence of substrings.

source-string is an xs:string value or the empty sequence.

pattern
The delimiter between substrings in source-string.

pattern is an xs:string value that contains a regular expression. A regular expression is a set of characters, wildcards, and operators that defines a string or group of strings in a search pattern.

flags
An xs:string value that can contain any combination of the following flag characters, which control how pattern is matched to characters in source-string:
s
Indicates that the dot (.) in the regular expression matches any character, including the new-line character (X'0A').

If the s flag is not specified, the dot (.) matches any character except the new-line character (X'0A').

m
Indicates that the caret (^) matches the start of a line (the position after a new-line character), and the dollar sign ($) matches the end of a line (the position before a new-line character).

If the m flag is not specified, the caret (^) matches the start of the string, and the dollar sign ($) matches the end of the string.

i
Indicates that matching is case-insensitive.

If the i flag is not specified, case-sensitive matching is done.

x
Indicates that whitespace characters within pattern are ignored.

If the x flag is not specified, whitespace characters are used for matching.
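
As a brief illustration of the i and x flags described above, the following sketch tokenizes two sample strings. The input strings are made up for this illustration, and fn:string-join is used only to display each result on one line with "|" between the items.

```xquery
(
  (: i flag: the pattern "x" matches both "X" and "x" :)
  fn:string-join(fn:tokenize("oneXtwoxthree", "x", "i"), "|"),

  (: x flag: the blank inside the pattern is ignored, so "\d +" is treated as "\d+" :)
  fn:string-join(fn:tokenize("a1b22c", "\d +", "x"), "|")
)
```

Under the flag rules above, the expected result is the sequence ("one|two|three", "a|b|c").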

Limitation of length

The length of source-string and pattern is limited to 32000 bytes.

Returned value

If source-string is not the empty sequence or a zero-length string, the returned value is a sequence that results when the following operations are performed on source-string:
  • source-string is searched for characters that match pattern.
  • If pattern contains two or more alternative sets of characters, the first set of characters in pattern that matches characters in source-string is considered to be the matching pattern.
  • Each set of characters that does not match pattern becomes an item in the result sequence.
  • If pattern matches characters at the beginning of source-string, the first item in the returned sequence is a string of length 0.
  • If two successive matches for pattern are found within source-string, a string of length 0 is added to the sequence.
  • If pattern matches characters at the end of source-string, the last item in the returned sequence is a string of length 0.
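
The rules above can be sketched with two hypothetical inputs. The first has a delimiter at the beginning, two successive delimiters in the middle, and a delimiter at the end; the second uses a pattern with two alternatives. fn:string-join is used here only to make the zero-length items visible.

```xquery
(
  (: delimiters at the start, in succession, and at the end each yield a zero-length item :)
  fn:string-join(fn:tokenize(",a,,b,", ","), "|"),

  (: the first alternative that matches ("ab") is used in preference to "a" :)
  fn:string-join(fn:tokenize("abracadabra", "ab|a"), "|")
)
```

Assuming the rules listed above, the first call produces the items ("", "a", "", "b", "") and the second produces ("", "r", "c", "d", "r", ""), so the expected result is ("|a||b|", "|r|c|d|r|").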

If pattern is not found in source-string, source-string is returned as the only item in the sequence.

If source-string is the empty sequence, or is the zero-length string, the result is the empty sequence.
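
As a minimal sketch of this case, fn:count can be used to confirm that both forms of empty input produce no items:

```xquery
(
  (: zero-length string as source-string :)
  fn:count(fn:tokenize("", ",")),

  (: empty sequence as source-string :)
  fn:count(fn:tokenize((), ","))
)
```

Because each call returns the empty sequence, the expected result is (0, 0).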

Example

The following function call creates a sequence from the string "Tokenize this sentence, please." The pattern "\s+" is a regular expression that matches one or more whitespace characters.
fn:tokenize("Tokenize this sentence, please.", "\s+")

The returned value is the sequence ("Tokenize", "this", "sentence,", "please.").