SUBSTRING

The SUBSTRING function returns a substring of a string.

Character:

>>-SUBSTRING--(--character-expression--,--start--+-----------+--,--+-CODEUNITS16-+--)-><
                                                 '-,--length-'     +-CODEUNITS32-+      
                                                                   '-OCTETS------'      

Graphic:

>>-SUBSTRING--(--graphic-expression--,--start--+-----------+--,--+-CODEUNITS16-+--)-><
                                               '-,--length-'     '-CODEUNITS32-'      

Binary:

>>-SUBSTRING--(--binary-expression--,--start--+-----------+--)-><
                                              '-,--length-'      

The schema is SYSIBM.

Character

character-expression
An expression that specifies the string from which the result is derived. The string must be a built-in character string.

Start of changeThe argument can also be a numeric data type. The numeric argument is implicitly cast to a VARCHAR data type.End of change

The result of the function is a character string.

A substring of character-expression is zero or more contiguous units of character-expression. If CODEUNITS32 is specified, a unit is a Unicode UTF-32 character. If CODEUNITS16 is specified, a unit is a Unicode UTF-16 character. If OCTETS is specified, a unit is a byte.

start
An expression that specifies the position within the character-expression that is to be the first string unit of the result. start is expressed in the specified string unit, and must return an integer value.

Start of changeThe argument can also be another numeric value, a character string, or a graphic string data type. The input is implicitly cast to a numeric value of DECFLOAT(34) which is then assigned to an INTEGER value.End of change

The value of start can be positive, negative, or zero. A value of 1 indicates that the first string unit of the result is the first string unit of character-expression.
length
An expression that specifies the maximum length of the resulting substring.

If character-expression is a fixed-length string, omission of length is an implicit specification of CHARACTER_LENGTH(character-expression) - start + 1, which is the number of string units (CODEUNITS16, CODEUNITS32, or OCTETS) from start to the last position of character-expression.

If character-expression is a varying length string, omission of length is an implicit specification of zero or CHARACTER_LENGTH(character-expression) - start + 1, whichever is greater. If the resulting length is zero, the result is an empty string.

If specified, length must be an expression that returns a value of an integer data type.

Start of changeThe argument can also be another numeric value, a character string, or a graphic string data type. The input is implicitly cast to a numeric value of DECFLOAT(34) which is then assigned to an INTEGER value.End of change

The value must be greater than or equal to 0. If a value greater than n is specified, where n is the length attribute of character-expression - start+ 1, then n is used as the length of the resulting substring. The value is expressed in the units that are explicitly specified.

For more information, see Rigorous description of the actual length and result.

CODEUNITS16, CODEUNITS32, or OCTETS
Specifies the string unit that is used to express start and length. If character-expression is a character string that is defined as bit data, CODEUNITS16 and CODEUNITS32 cannot be specified.
CODEUNITS16
Specifies that start and length are expressed in terms of 16-bit UTF-16 code units.
CODEUNITS32
Specifies that start and length are expressed in terms of 32-bit UTF-32 code units.
OCTETS
Specifies that start and length are expressed in terms of bytes.

For more information about CODEUNITS16, CODEUNITS32, and OCTETS, see String unit specifications.

Graphic

graphic-expression
An expression that specifies the string from which the result is derived. The string must be a built-in graphic string. The result of the function is a graphic string. Start of changeA partial surrogate character in the expression is replaced with a blank.End of change

A substring of graphic-expression is zero or more contiguous units of graphic-expression. If CODEUNITS32 is specified, a unit is a Unicode UTF-32 character. If CODEUNITS16 is specified, a unit is a Unicode UTF-16 character.

start
An expression that specifies the position within the graphic-expression that is to be the first string unit of the result. start is expressed in the specified string unit, and must return an integer value. The value of start can be positive, negative, or zero. A value of 1 indicates that the first string unit of the result is the first string unit of graphic-expression.

Start of changeThe argument can also be another numeric value, a character string, or a graphic string data type. The string input is implicitly cast to a numeric value of DECFLOAT(34) which is then assigned to an INTEGER value.End of change

length
An expression that specifies the maximum length of the resulting substring.

If graphic-expression is a fixed-length string, omission of length is an implicit specification of CHARACTER_LENGTH(graphic-expression) - start +1, which is the number of units (CODEUNITS16, CODEUNITS32) either explicitly or implicitly specified, from the start position to the last position of graphic-expression. If graphic-expression is a varying length string, omission of length is an implicit specification of zero or CHARACTER_LENGTH(graphic-expression) -start +1, which is the number of units (CODEUNITS16, CODEUNITS32) either explicitly or implicitly specified, whichever is greater. If the resulting length is zero, the result is an empty string.

If specified, length must be an expression that returns a value of an integer data type.

Start of changeThe argument can also be another numeric value, a character string, or a graphic string data type. The string input is implicitly cast to a numeric value of DECFLOAT(34) which is then assigned to an INTEGER value.End of change

The value must be greater than or equal to 0. If a value greater than n is specified, where n is the length attribute of graphic-expression - start+ 1, then n is used as the length of the resulting substring. The value is expressed in the units that are explicitly specified.

For more information, see Rigorous description of the actual length and result.

CODEUNITS16 or CODEUNITS32
Specifies the string unit that is used to express start and length.
CODEUNITS16
Specifies that start and length are expressed in terms of 16-bit UTF-16 code units.
CODEUNITS32
Specifies that start and length are expressed in terms of 32-bit UTF-32 code units.

For more information about CODEUNITS16 and CODEUNITS32, see String unit specifications.

Binary

binary-expression
An expression that specifies the string from which the result is derived. The string must be a built-in binary string. The result of the function is a binary string.

A substring of binary-expression is zero or more contiguous units of binary-expression.

start
Start of changeAn expression that specifies the position within binary-expression to be the first byte of the result. It must be an integer value. start can be negative or zero. (The length attribute of a varying-length string is its maximum length.) A value of 1 indicates that the first byte of the substring is the byte unit of binary-expression.End of change
length
An expression that specifies the length of the resulting substring.

Start of changeIf binary-expression is a fixed-length string, omission of length is an implicit specification of LENGTH(binary-expression) - start +1 from the start position to the last position of binary-expression. If binary-expression is a varying length string, omission of length is an implicit specification of zero or CHARACTER_LENGTH(binary-expression) -start +1, whichever is greater. If the resulting length is zero, the result is an empty string.End of change

Start of change If specified, length must be a value of an integer data type. The value must be greater than or equal to 0 If a value greater than n is specified, where n is the length attribute of binary-expression - start+ 1, then n is used as the length of the resulting substring.End of change

For more information, see Rigorous description of the actual length and result.

Rigorous description of the actual length and result

In this description, the term character means the “unit specified by string units” or "bytes" if the input is binary.

Let C be the value of the first argument, let LC be the length in characters of C, and let S be the value of the start.

  • If length is specified, let L be the value of length and let E be S+L. Otherwise, let E be the larger of LC + 1 and S.
  • If either C, S, or L is the null value, the result of the function is the null value.
  • If E is less than S, an exception condition is raised: data exception — substring error.
  • Otherwise:
    • If S is greater than LC or if E is less than 1 (one), the result of the function is a zero-length string.
    • Otherwise:
      • Let S1 be the larger of S and 1 (one). Let E1 be the smaller of E and LC+1. Let L1 be E1S1.
      • The result of the function is a character string that contains the L1 characters of C starting at character number S1 in the same order that the characters appear in C.
The data type of the result depends on the data type of the first argument, as shown in the following table.
Table 1. Data type of the result of SUBSTRING
Data type of the first argument Data type of the result
CHAR or VARCHAR VARCHAR
CLOB CLOB

If character-expression is mixed data, the result is mixed data. Otherwise, the result is SBCS data.

GRAPHIC or VARGRAPHIC VARGRAPHIC
DBCLOB DBCLOB
BINARY or VARBINARY VARBINARY
BLOB BLOB

The length attribute of the result is equal to the length attribute of the first argument. If CODEUNITS16 or CODEUNITS32 is specified, see Determining the length attribute of the final result for information about how to calculate the length attribute of the result string.

The result can be null; if any argument is null, the result is the null value.

If the first argument is character or graphic data, the CCSID of the result is the same as that of the first argument.

Example 2: C1 is a VARCHAR(12) column in table T1. One of its values is the string 'ABCDEFG'. When C1 has the value 'ABCDEFG':
   Function:                        Returns:
   -------------------------------------------------------
   SUBSTRING(C1,-2,2,OCTETS)                 -- a zero-length string
   SUBSTRING(C1,-2,4,OCTETS)        'A'
   SUBSTRING(C1,-2,OCTETS)          'ABCDEFG'
   SUBSTRING(C1,0,1,OCTETS)                  -- a zero-length string
Example 1: FIRSTNAME is a VARCHAR(12) column in table T1. One of its values is the 6-character string 'Jürgen'. When FIRSTNAME has the value 'Jürgen':
   Function:                                   Returns:
   --------------------------------------------------------------
   SUBSTRING(FIRSTNAME,1,2,CODEUNITS32)       'Jü'  -- x'4AC3BC'
   SUBSTRING(FIRSTNAME,1,2,CODEUNITS16)       'Jü'  -- x'4AC3BC'
   SUBSTRING(FIRSTNAME,1,2,OCTETS)            'J '  -- x'4A20' (a truncated string)
   SUBSTRING(FIRSTNAME,8,CODEUNITS16)               -- a zero-length string
   SUBSTRING(FIRSTNAME,8,4,OCTETS)                  -- a zero-length string