Collating sequence

A collating sequence (also called a sort sequence) defines how characters in a character set relate to each other when they are compared and ordered.

Different collating sequences are useful for those who want their data ordered for a specific language. For example, lists can be ordered as they are normally seen for a specific language. A collating sequence can also be used to treat certain characters as equivalent, for instance, a and A. A collating sequence works on all comparisons that involve:

SBCS collating sequence support is implemented using a 256-byte table. Each byte in the table corresponds to a code point or character in a SBCS code page. Because the collating sequence is applicable to character data, a CCSID must be associated with the table. The bytes in the collating sequence table are set based on how each code point is to compare to other code points in that code page. For example, if the characters a and A are to be treated as equivalents for comparisons, the bytes in the collating sequence table for their code points contain the same value, or weight.

UCS-2 collating sequence support is implemented using a multi-byte table. A pair of bytes within the table corresponds to a character in the UCS-2 code page. Only a subset of the thousands of characters in UCS-2 are typically represented in the table. Only those characters that are to compare differently (and possibly other characters in the same ward) will be represented in the table. The bytes in the collating sequence table are set based on how each character is to compare with other characters in UCS-2.

When two or more bytes (or pair of bytes for UCS-2) in a collating sequence table have the same value, the collating sequence is a shared-weight collating sequence. If every byte (or pair of bytes for UCS-2) in a collating sequence table has a unique value, the collating sequence is a unique-weight collating sequence. For many languages, unique- and shared-weight collating sequences are shipped on the system as part of the operating system. If you need collating sequences for other languages or needs, you define them using the Create Table (CRTTBL) command.

Start of changeUTF-8 and UTF-16 collating sequence support is implemented using ICU (International Components for Unicode). This is a standard API to sort Unicode. The API produces the same result for normalized and non-normalized data and returns a sort weight based on language specific rules. The IBM® i operating system supports ICU 2.3.1, ICU 3.4, and ICU 4.0 collating sequences, but ICU 3.4 or ICU 4.0 should be used. The ICU collating sequence table I34en_us (United States locale) will sort data differently than I34fr_FR (French locale).End of change

If ICU is used, the LIKE predicate and the LOCATE, POSITION, POSSTR, and POSITION scalar functions are not supported.

If ICU 2.3.1 is used, the query cannot contain:
  • EXCEPT or INTERSECT operations,
  • VALUES in a fullselect,
  • OLAP specifications,
  • recursive common table expressions,
  • ORDER OF,
  • scalar fullselects (scalar subselects are supported),
  • full outer join,
  • LOBs in a GROUP BY,
  • grouping sets or super groups,
  • Start of changeORDER BY or FETCH FIRST n ROWS clause in a subselect,End of change
  • Start of changeCONTAINS or SCORE functions,End of change
  • Start of changeXMLAGG, XMLATTRIBUTES, XMLCOMMENT, XMLCONCAT, XMLDOCUMENT, XMLELEMENT, XMLFOREST, XMLGROUP, XMLNAMESPACES, XMLPI, XMLROW, or XMLTEXT functions, End of change
  • Start of changeglobal variables, orEnd of change
  • Start of changereferences to arrays.End of change

An ICU collating sequence table will generally produce results that are more culturally correct, however:

It is important to remember that the data itself is not altered by the collating sequence. Instead, a weighted representation of the data is used for the comparison. In SQL, a collating sequence is specified on the CRTSQLxxx, STRSQL, and RUNSQLSTM commands. The SET OPTION statement can be used to specify the collating sequence within the source of a program containing embedded SQL. The collating sequence applies to all character comparisons performed in the SQL statements. The default collating sequence on the system is the internal sequence that occurs when the hexadecimal representation of characters are used. This is the sequence you get when the SRTSEQ(*HEX) is specified. For programs precompiled with a release of the product that is earlier than Version 2 Release 3, the collating sequence is *HEX.

Collating sequences do not apply to FOR BIT DATA or binary string columns.

Start of changeThe collating sequence is not allowed for an index or unique constraint that contains a key column with a field procedure.End of change

The collating sequence is explicitly specified through the following interfaces:
Table 1. Collating Sequence Interfaces
SQL Interface Specification
Embedded SQL SRTSEQ parameter on the Create SQL Program (CRTSQLxxx) commands. The SET OPTION statement can also be used to set the SRTSEQ values.

(For more information about CRTSQLxxx commands, see Embedded SQL Programming.)
Run SQL Statements SRTSEQ parameter on the Run SQL Statements (RUNSQLSTM) command.

(For more information about the RUNSQLSTM command, see SQL Programming.)
Call Level Interface (CLI) on the server SQL_ATTR_JOB_SORT_SEQUENCE environment variable

(For more information about CLI, see SQL Call Level Interfaces (ODBC).)
JDBC or SQLJ on the server using IBM Developer Kit for Java™ job.sort.sequence property object

(For more information about JDBC and SQLJ, see IBM Developer Kit for Java.)
ODBC on a client using the IBM i Access Family ODBC Driver Sort Type in ODBC Setup

(For more information about ODBC, see System i® Access.)
JDBC on a client using the IBM Toolbox for Java Sort Sequence Table in JDBC Setup

(For more information about JDBC, see System i Access.)

(For more information about the IBM Toolbox for Java, see IBM Toolbox for Java.)
OLE DB on a client using the IBM i Access Family OLE DB Provider Sort Sequence Connection Object Properties

(For more information about OLE DB, see System i Access.)
ADO .NET on a client using the IBM i Access Family ADO .NET Provider SortSequence in Connection Object Properties

(For more information about ADO .NET, see System i Access.)

For more information about CCSIDs, see the Work with CCSIDs topic in the Globalization section of the IBM i Information Center. For more information about collating sequences and the sequences shipped with the system, see the DB2® and SQL collating sequence topic in the IBM i Information Center.