Determining the encoding scheme and CCSID of a string

An encoding scheme and a CCSID are attributes of strings, just as length is an attribute of strings. All values of the same string column have the same encoding scheme and CCSID.

Every string has an encoding scheme and a CCSID that identifies the manner in which the characters in the string are encoded. Strings can be encoded in ASCII, EBCDIC, or Unicode.

The CCSID that is associated with a string value depends on the SQL statement in which the data is referenced and the type of expression. Table 1 describes the rules for determining the CCSID that is associated with a string value. Use the Type 1 rules when the SQL statement meets the following conditions:

  • The SQL statement operates with a single set of CCSIDs (SBCS, mixed, and graphic). An SQL statement that does not contain any of the following items operates with a single set of CCSIDs:
    • References to columns from multiple tables or views that are defined with CCSIDs from more than one set of CCSIDs (SBCS, mixed, and graphic)
    • Graphic hexadecimal (GX) or hexadecimal Unicode (UX) string constants
    • References to the XMLCLOB built-in function
    • Cast specifications with a CCSID clause
    • User-defined table functions
  • The SQL statement is not one of the following statements:
    • CALL statement
    • SET assignment statement
    • SET special register
    • VALUES statement
    • VALUES INTO statement
  • Start of changeOne of the following built-in functions is not referenced:
    • XMLSERIALIZE
    • GETVARIABLE
    • DECRYPT_CHAR
    • DECRYPT_DB
    • DECRYPT_BIT
    • NORMALIZE_STRING
    • ASCII_CHR
    • CHAR
    • ASCII_STR (or ASCIISTR)
    • EBCDIC_CHR
    • EBCDIC_STR
    • UNICODE_STR (or UNISTR)
    End of change

Use the Type 2 rules when the statement does not meet the conditions for Type 1 rules.

Start of changeFor those SQL statements and tools that use a SYSDUMMYx table, use the SYSDUMMYx table that has the same encoding scheme as the other objects in your SQL statement, to avoid conversions. A SYSDUMMYx table is available in each encoding scheme. For example, suppose that your SQL statement references a SYSDUMMYx table that is in a different encoding scheme from other objects in the statement. DB2® treats this statement as one that references objects with different CCSIDs (Type 2 rules), and conversion is likely to occur. To avoid this situation, reference the SYSDUMMYx table that has the same encoding scheme as the other objects in your SQL statement. See SYSDUMMYx tables for more information.End of change

Table 1. Rules for determining the CCSID that is associated with string data
Source of the string data Type 1 rules Type 2 rules
String constant If the statement references a table or view, the encoding scheme of that table or view determines the encoding scheme for the string constant.

The CCSID is the appropriate character string CCSID for the encoding scheme.

The CCSID is the appropriate character string CCSID for the application encoding scheme.1
Start of changeDatetime constantEnd of change Start of changeIf the statement references a table or view, the encoding scheme of that table or view determines the encoding scheme for the string constant.

Otherwise, the default EBCDIC encoding scheme is used for the string constant.

The CCSID is the appropriate character string CCSID for the encoding scheme.

End of change
Start of changeThe CCSID is the appropriate character string CCSID for the application encoding scheme.1End of change
Hexadecimal string constant (X'...') If the statement references a table or view, the encoding scheme of that table or view determines the encoding scheme for the string constant.

Otherwise, the default EBCDIC encoding scheme is used for the string constant.

The CCSID is the appropriate graphic string CCSID for the encoding scheme.

The CCSID is the appropriate character string CCSID for the application encoding scheme.1
Graphic string
constant
(G'...')
If the statement references a table or view, the encoding scheme of that table or view determines the encoding scheme for the graphic string constant.

Otherwise, the default EBCDIC encoding scheme is used for the graphic string constant.

The CCSID is the graphic string CCSID for the encoding scheme.

The CCSID is the graphic string CCSID for the application encoding scheme.1
Graphic hexadecimal constant (GX'...') Not applicable. The CCSID is the graphic string CCSID for the application encoding scheme, which must be ASCII or EBCDIC.
Hexadecimal Unicode string constant (UX'....') Not applicable. The CCSID is 1200 (UTF-16).
Special register If the statement references a table or view, the encoding scheme of that table or view determines the encoding scheme for the special register.

Otherwise, the default EBCDIC encoding scheme is used for the special register.

The CCSID is the appropriate character string CCSID for the encoding scheme.

The CCSID is the appropriate CCSID for the application encoding scheme.1
Column of a table The CCSID is the CCSID that is associated with the column of the table. The CCSID is the CCSID that is associated with the column of the table.
Column of a view The CCSID is the CCSID of the column of the result table of the fullselect of the view definition. The CCSID is the CCSID of the column of the result table of the fullselect of the view definition.
Expression The CCSID is the CCSID of the result of the expression. The CCSID is the CCSID of the result of the expression.
Result of a built-in function If the description of the function, in Built-in functions, indicates what the CCSID of the result is, the CCSID is that CCSID.

Otherwise, if the description of the function refers to this table for the CCSID, the CCSID is the appropriate CCSID of the CCSID set that is used by the statement for the data type of the result.

If the description of the function, in Built-in functions, indicates what the CCSID of the result is, the CCSID is that CCSID.

Otherwise, if the description of the function refers to this table for the CCSID, the CCSID is the appropriate CCSID of the application encoding scheme for the data type of the result.1

Parameter of a user-defined routine The CCSID is the CCSID that was determined when the function or procedure was created. The CCSID is the CCSID that was determined when the function or procedure was created.
The expression in the RETURN statement of a CREATE statement for a user-defined SQL scalar function If the expression in the RETURN statement is string data, the encoding scheme is the same as for the parameters of the function. The CCSID is determined from the encoding scheme and the attributes of the data. The CCSID is determined from the CCSID of the result of the expression specified in the RETURN statement.
String host variable If the statement references a table or view, the encoding scheme of that table or view determines the encoding scheme for the host variable.

Start of changeGraphic variables are an exception if the table or view is EBCDIC or ASCII and the value of the MIXED DATA field on the DSNTIPF panel is NO. In this case, the Unicode encoding scheme is used for the graphic host variable.End of change

Otherwise, the default EBCDIC encoding scheme is used for the host variable.

The CCSID is the appropriate CCSID for the data type of the host variable.

At package prepare time, the CCSID is the appropriate CCSID for the data type of the host variable for the application encoding scheme.

Start of changeGraphic variables are an exception if the application or encoding scheme is EBCDIC or ASCII and the value of the MIXED DATA field on the DSNTIPF panel is NO. In this case, the Unicode encoding scheme is used for the graphic host variable.End of change

At run time, the CCSID specified in the declare variable statement, or as an override in the SQLDA. Otherwise, the CCSID is the appropriate CCSID for the data type of the host variable for the application encoding scheme.

Note: If the context is within a check constraint or trigger package, the CCSID is the appropriate CCSID for Unicode instead of the application encoding scheme.

The following examples show how these rules are applied.

Example 1: Assume that the default encoding scheme for the installation is EBCDIC and that the installation does not support mixed and graphic data. The following statement conforms to the rules for Type 1 in Table 1. Therefore, the X'40' is interpreted as EBCDIC SBCS data because the statement references a table that is in EBCDIC. The CCSID for X'40' is the default EBCDIC SBCS CCSID for the installation.
SELECT * FROM EBCDIC_TABLE WHERE COL1 = X'40';
the result of the query includes each row that has a value in column COL1 that is equal to a single EBCDIC blank.
Example 2: The following statement references data from two different tables that use different encoding schemes. This statement does not conform to the rules for Type 1 statements in Table 1. Therefore, the rules for Type 2 statements are used. The CCSID for X'40' is dependent on the current application encoding scheme. Assuming that the current application encoding scheme is EBCDIC, X'40' represents a single EBCDIC blank.
SELECT * FROM EBCDIC_TABLE, UNICODE_TABLE WHERE COL1 = X'40';
as with Example 1, the result of the query includes each row that has a value in column COL1 that is equal to a single EBCDIC blank. If the current application encoding scheme were ASCII or Unicode, X'40' would represent something different and the results of the query would be different.