z/OS Unicode Services User's Guide and Reference


Description of parameters in area CUN4BOPR

SA38-0680-00

CUN4BOPR_Version - set by caller

Specifies the version of the parameter area. This field must be initialized for the first call to stub routine CUN4LCOL using the constant CUN4BOPR_Ver which is supplied by the interface definition file CUN4BOID.

To exploit the newer collation features (UCA versions UCA400R1, UCA410, and UCA600, and the tailoring features), set CUN4BOPR_Version to CUN4BOPR_Ver2 (collation parameter area version 2). For backward compatibility, the default value is CUN4BOPR_Ver.

CUN4BOPR_Length - set by caller
Specifies the length of the parameter area. HLASM users must initialize this field for the first call to CUN4LCOL using the constant CUN4BOPR_Len which is supplied by the interface definition file CUN4BOID.
CUN4BOPR_Src1_Buf_Ptr - set by caller, updated by service
Specifies the beginning address of the string of Unicode characters to be processed. No data is written to this buffer. The string has the length specified in the CUN4BOPR_Src1_Buf_Len parameter.
Note: The source buffer pointed to by CUN4BOPR_Src1_Buf_Ptr must contain only UTF-16 BE characters; otherwise, the results of the collation service are unpredictable.
CUN4BOPR_Src1_Buf_ALET - set by caller
Specifies the ALET to be used if the source 1 buffer addressed by CUN4BOPR_Src1_Buf_Ptr resides in a different data space. If the buffer resides in the primary address space, use the default value 0.
CUN4BOPR_Src1_Buf_Len - set by caller
Specifies the length in bytes of the string in the source buffer, addressed by CUN4BOPR_Src1_Buf_Ptr, to be collated.
CUN4BOPR_Src2_Buf_Ptr - set by caller, updated by service
Specifies the beginning address of the string of Unicode characters to be processed. No data is written to this buffer. The string has the length specified in the CUN4BOPR_Src2_Buf_Len parameter.
Note: The source buffer pointed to by CUN4BOPR_Src2_Buf_Ptr must contain only UTF-16 BE characters; otherwise, the results of the collation service are unpredictable.
CUN4BOPR_Src2_Buf_ALET - set by caller
Specifies the ALET to be used if the source 2 buffer addressed by CUN4BOPR_Src2_Buf_Ptr resides in a different data space. If the buffer resides in the primary address space, use the default value 0.
CUN4BOPR_Src2_Buf_Len - set by caller
Specifies the length in bytes of the string in the source buffer, addressed by CUN4BOPR_Src2_Buf_Ptr, to be collated.
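Both source buffers must hold UTF-16 BE data, and their lengths are given in bytes, not characters. The following Python sketch (not part of the Unicode Services interface; the function name is hypothetical) shows the byte layout a caller would place in a source buffer and the value that belongs in CUN4BOPR_Src1_Buf_Len or CUN4BOPR_Src2_Buf_Len. A character outside the Basic Multilingual Plane occupies a surrogate pair, so it contributes 4 bytes to the length.

```python
def to_source_buffer(text):
    """Encode text as UTF-16 BE (no byte order mark); return (buffer, length in bytes)."""
    buffer = text.encode("utf-16-be")
    # The length field counts bytes: 2 per BMP character, 4 per surrogate pair.
    return buffer, len(buffer)
```

In a z/OS program, EBCDIC text would first have to be converted to UTF-16 BE; this sketch only illustrates the resulting bytes and the length arithmetic.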
CUN4BOPR_Targ1_Buf_Ptr - set by caller, updated by service
This variable has two primary functions:
  1. Binary comparison - To perform a comparison, you must specify two strings. For this reason, CUN4BOPR_Targ1_Buf_Ptr must specify the beginning address of the target 1 buffer, and its related fields (CUN4BOPR_Targ1_Buf_ALET and CUN4BOPR_Targ1_Buf_Len) must also be set.
  2. Sort key vector generation - If you need to generate a sort key vector and you set CUN4BOPR_Src1_Buf_Ptr, you must also set its related fields (CUN4BOPR_Src1_Buf_ALET and CUN4BOPR_Src1_Buf_Len).

    In both cases, it is important that you set up this field correctly. For more information, see Target buffer length considerations and Sort key vector format.

CUN4BOPR_Targ1_Buf_ALET - set by caller
Specifies the ALET to be used if the target 1 buffer addressed by CUN4BOPR_Targ1_Buf_Ptr resides in a different data space. If the buffer resides in the primary address space, use the default value 0.
CUN4BOPR_Targ1_Buf_Len - set by caller, updated by service
Specifies the length in bytes of the target buffer addressed by CUN4BOPR_Targ1_Buf_Ptr. Certain conditions apply, dependent upon the collation level and the need for a sort key vector. See Target buffer length considerations for more information.
CUN4BOPR_Targ2_Buf_Ptr - set by caller, updated by service
This variable has two primary functions:
  1. Binary comparison - To perform a comparison, you must specify two strings. For this reason, CUN4BOPR_Targ2_Buf_Ptr must specify the beginning address of the target 2 buffer, and its related fields (CUN4BOPR_Targ2_Buf_ALET and CUN4BOPR_Targ2_Buf_Len) must also be set.
  2. Sort key vector generation - If you need to generate a sort key vector and you set CUN4BOPR_Src2_Buf_Ptr, you must also set its related fields (CUN4BOPR_Src2_Buf_ALET and CUN4BOPR_Src2_Buf_Len).

    In both cases, it is important that you set up this field correctly. For more information, see Target buffer length considerations and Sort key vector format.

CUN4BOPR_Targ2_Buf_ALET - set by caller
Specifies the ALET to be used if the target 2 buffer addressed by CUN4BOPR_Targ2_Buf_Ptr resides in a different data space. If the buffer resides in the primary address space, use the default value 0.
CUN4BOPR_Targ2_Buf_Len - set by caller, updated by service
Specifies the length in bytes of the target buffer addressed by CUN4BOPR_Targ2_Buf_Ptr. Certain conditions apply, dependent upon the collation level and the need for a sort key vector. See Target buffer length considerations for more information.
CUN4BOPR_Coll_Handle - set by caller, updated by service
Specifies the handle to the collation tables. If a handle is present, it is used; otherwise, a new handle is returned in CUN4BOPR_Coll_Handle. Subsequent calls to stub routine CUN4LCOL that request the same collation properties are faster, because the handle is reused and CUN4BOPR_Coll_Type does not need to be recomputed.
Note: For the first call to stub routine CUN4LCOL, CUN4BOPR_Coll_Handle must be set to binary zeros (X'00').
CUN4BOPR_Coll_Level - set by caller
Specifies the collation level as defined by the following constants (defined in the interface definition file CUN4BOID):
  • CUN4BOPR_PRIMARY
  • CUN4BOPR_SECONDARY
  • CUN4BOPR_TERTIARY
  • CUN4BOPR_QUATERNARY
  • CUN4BOPR_QUINARY (Supported by UCA400R1 and higher)
  • CUN4BOPR_IDENTICAL (Supported by UCA400R1 and higher)
Note:
  1. CUN4BOPR_QUINARY and CUN4BOPR_IDENTICAL have exactly the same behavior; both were added to cover the different naming conventions for this collation level.
  2. Collation levels are also called "collation strength". See the CUN4BOPR_Collation_Keyword field description.
CUN4BOPR_Wrk1_Buf_Ptr - set by caller, updated by service

Specifies the beginning address of work buffer 1. This buffer is used mainly for internal purposes; however, the pointer must always be set. See Work buffer length considerations for more information.

CUN4BOPR_Wrk1_Buf_ALET - set by caller, updated by service
Specifies the ALET to be used if the work 1 buffer addressed by CUN4BOPR_Wrk1_Buf_Ptr resides in a different data space. If the buffer resides in the primary address space, use the default value 0.
CUN4BOPR_Wrk1_Buf_Len - set by caller, updated by service
Specifies the length in bytes of the work 1 buffer addressed by CUN4BOPR_Wrk1_Buf_Ptr. The required length depends on the collation rules, including the collation level. See Work buffer length considerations for more information.
CUN4BOPR_Wrk2_Buf_Ptr - set by caller, updated by service

Specifies the beginning address of work buffer 2. This buffer is used mainly for internal purposes; however, the pointer must always be set. See Work buffer length considerations for more information.

CUN4BOPR_Wrk2_Buf_ALET - set by caller, updated by service
Specifies the ALET to be used if the work 2 buffer addressed by CUN4BOPR_Wrk2_Buf_Ptr resides in a different data space. If the buffer resides in the primary address space, use the default value 0.
CUN4BOPR_Wrk2_Buf_Len - set by caller, updated by service
Specifies the length in bytes of the work 2 buffer addressed by CUN4BOPR_Wrk2_Buf_Ptr. The required length depends on the collation rules, including the collation level. See Work buffer length considerations for more information.
CUN4BOPR_DDA_Buf_Ptr - set by caller
Specifies the beginning address of an area of storage that collation needs internally as a dynamic data area.
Note: CUN4BOPR_DDA_Buf_Ptr must be aligned on a doubleword boundary.
CUN4BOPR_DDA_Buf_ALET - set by caller
Specifies the ALET to be used if the dynamic data area addressed by CUN4BOPR_DDA_Buf_Ptr resides in a different address or data space. If the area resides in the primary address space, use the default value 0.
CUN4BOPR_DDA_Buf_Len - set by caller
Specifies the length in bytes of the dynamic data area addressed by CUN4BOPR_DDA_Buf_Ptr. The required length is defined by constant CUN4BOPR_DDA_Req, which is provided in the interface definition file (CUN4BOID).
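A minimal check for the doubleword requirement on CUN4BOPR_DDA_Buf_Ptr, written as a Python sketch over integer addresses (the helper names are hypothetical). Rounding an address up is only safe when the caller has allocated enough slack (up to 7 extra bytes beyond CUN4BOPR_DDA_Req) to keep the full required length after alignment; that allocation strategy is an assumption, not a documented requirement.

```python
DOUBLEWORD = 8  # a doubleword is 8 bytes

def is_doubleword_aligned(address):
    """True if the address can be used as CUN4BOPR_DDA_Buf_Ptr as is."""
    return address % DOUBLEWORD == 0

def align_to_doubleword(address):
    """Round an address up to the next doubleword boundary (no change if already aligned)."""
    return (address + DOUBLEWORD - 1) & ~(DOUBLEWORD - 1)
```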
CUN4BOPR_Flag1 - set by caller
Bit position   Name
1xxx xxxx      CUN4BOPR_Inv_Handle
x1xx xxxx      CUN4BOPR_Get_New_Handle
xx1x xxxx      CUN4BOPR_Page_Fix
CUN4BOPR_Inv_Handle
Specifies the action to be taken when the collation handle is invalid.
  • 0: Indicates that the collation is to be terminated with an error.
  • 1: Indicates that the collation is to be done with a new handle created by the collation service and put into CUN4BOPR_Coll_Handle.
CUN4BOPR_Get_New_Handle
Specifies the action to be taken with the new collation handle.
  • 0: Get and use the new handle and continue with the service.
  • 1: Get the new handle and return to the caller.
CUN4BOPR_Page_Fix
If the requested conversion is not currently loaded in memory, this flag indicates if it should be loaded in page-fixed memory.
  • 0: Indicates use of system storage management (default).
  • 1: Indicates use of page fixing.
Note: CUN4BOPR_Page_Fix applies only to callers that run in keys 0 through 7. Callers in other keys (8-F) cannot exploit page-fixed storage in the Unicode data space.
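The bit positions above translate directly into byte masks. The following Python sketch assumes that the leftmost position in each pattern is the high-order bit of the CUN4BOPR_Flag1 byte (so 1xxx xxxx is X'80'); the constant and function names are illustrative and are not taken from CUN4BOID.

```python
# Masks derived from the bit positions listed above (high-order bit first).
CUN4BOPR_INV_HANDLE = 0x80      # 1xxx xxxx
CUN4BOPR_GET_NEW_HANDLE = 0x40  # x1xx xxxx
CUN4BOPR_PAGE_FIX = 0x20        # xx1x xxxx

def build_flag1(inv_handle=False, get_new_handle=False, page_fix=False):
    """Return a CUN4BOPR_Flag1 byte with the requested bits set."""
    flag1 = 0
    if inv_handle:
        flag1 |= CUN4BOPR_INV_HANDLE      # on an invalid handle, create a new one instead of failing
    if get_new_handle:
        flag1 |= CUN4BOPR_GET_NEW_HANDLE  # get the new handle and return to the caller
    if page_fix:
        flag1 |= CUN4BOPR_PAGE_FIX        # load into page-fixed storage (keys 0-7 only)
    return flag1
```

The sketch does not enforce the key 0-7 restriction described in the note; the caller remains responsible for it.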
CUN4BOPR_Mask - set by caller
This parameter is two bytes long and, together with CUN4BOPR_Coll_Level, defines the collation rules. The default value is MASK_DEFAULT.
The following table shows the format and description of the sub fields.
Table 1. Collation mask sub fields descriptions
Sub field Description
CUN4BOPR_Variable_Opt This sub field specifies how operations on variable collation elements are performed. The options are:
0 - Shifted (SHIFTED)
1 - Blanked (BLANKED)
2 - Non-Ignored (NIGNORED)
3 - Shift-Trimmed (STRIMMED)
4 - No Variable Behavior (NAVARIABLECE)
CUN4BOPR_Cmp_Order This sub field specifies the comparison order:
0 - Forward (FORWARD) (default)
1 - Backward (BACKWARD) (French behavior)
CUN4BOPR_SKey_Opt This sub field specifies whether a binary comparison or a sort key is requested:
0 - Do not get a sort key (SKOFF); perform a binary comparison (default)
1 - Get a sort key (SKON); do not perform a binary comparison
CUN4BOPR_Norm_Type This sub field specifies the normalization form according to the following values:
0 - Do not apply normalization (NNORM) (default)
1 - Apply NFD (NFD)
2 - Apply NFC (NFC)
3 - Apply NFKD (NFKD)
4 - Apply NFKC (NFKC)
CUN4BOPR_GenSKey_and_Cmp This sub field specifies whether a binary comparison is performed when a sort key is also requested:
0 - Do not perform a binary comparison (default)
1 - Perform a binary comparison
Note: This bit flag is meaningful only if all of the following are set:
  • CUN4BOPR_Version = CUN4BOPR_Ver2
  • CUN4BOPR_SKey_Opt = SKON
  • CUN4BOPR_UCA_Ver = CUN4BOPR_UCA400R1 (or higher)
Collation version 3.0.1 could either perform a binary comparison or generate a sort key, but not both.

With UCA400R1 and higher, it is possible to generate a sort key and perform a binary comparison at the same time.
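The conditions under which a sort key and a binary comparison are both produced can be summarized as a small decision function. This Python sketch models CUN4BOPR_SKey_Opt and CUN4BOPR_GenSKey_and_Cmp as booleans and uses placeholder strings for the version and UCA constants; the real constants and the bit layout of CUN4BOPR_Mask come from CUN4BOID and are not reproduced here.

```python
# Placeholder values; the real constants are defined in CUN4BOID.
VER2 = "CUN4BOPR_Ver2"
UCA_WITH_COMBINED_MODE = {"CUN4BOPR_UCA400R1", "CUN4BOPR_UCA410", "CUN4BOPR_UCA600"}

def requested_outputs(version, skey_on, gen_skey_and_cmp, uca_ver):
    """Return (sort_key_generated, binary_comparison_performed)."""
    if not skey_on:
        return False, True            # SKOFF: binary comparison only (default)
    combined = bool(gen_skey_and_cmp
                    and version == VER2
                    and uca_ver in UCA_WITH_COMBINED_MODE)
    return True, combined             # SKON: sort key, plus comparison only if allowed
```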

CUN4BOPR_RESULT - updated by service

Specifies the result of the binary comparison between the strings addressed by CUN4BOPR_Src1_Buf_Ptr and CUN4BOPR_Src2_Buf_Ptr (source 1 and source 2).

The result can be evaluated according to the following values:
-1 if source 1 < source 2
 0 if source 1 = source 2
 1 if source 1 > source 2
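CUN4BOPR_RESULT follows the usual three-way comparison convention. One way to relate it to sort keys: UCA sort keys are designed to be compared with a plain binary (byte-by-byte) comparison, so two sort key vectors generated with identical settings yield the same ordering as the comparison of the original strings. The following Python sketch assumes the two sort keys are already available as byte strings; the function name is hypothetical.

```python
def collation_result(sort_key_1: bytes, sort_key_2: bytes) -> int:
    """Return -1, 0, or 1 in the style of CUN4BOPR_RESULT."""
    # bytes compare lexicographically as unsigned values; a proper prefix sorts first.
    return (sort_key_1 > sort_key_2) - (sort_key_1 < sort_key_2)
```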
CUN4BOPR_RC_RS - set by service
A structure that can be used to access CUN4BOPR_Return_Code and CUN4BOPR_Reason_Code as one unit.
CUN4BOPR_Return_Code - set by service
Specifies the return code.
CUN4BOPR_Reason_Code - set by service
Specifies the reason code.
CUN4BOPR_UCA_VER - set by caller
Specifies the Unicode Collation Algorithm (UCA) version, which also refers to a specific version of the Unicode Standard character set.
Note: This field is referenced only if CUN4BOPR_Version = CUN4BOPR_Ver2; otherwise, its content is ignored.
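The rules for which fields the service reads can be expressed as two predicates. The following Python sketch uses placeholder strings instead of the real CUN4BOID constants; it reflects the notes on CUN4BOPR_UCA_VER and on the case, Hiragana, Var_Top, and locale fields described in this topic.

```python
# Placeholder values; the real constants are defined in CUN4BOID.
CUN4BOPR_VER = "CUN4BOPR_Ver"
CUN4BOPR_VER2 = "CUN4BOPR_Ver2"
TAILORING_UCA_VERSIONS = {"CUN4BOPR_UCA400R1", "CUN4BOPR_UCA410", "CUN4BOPR_UCA600"}

def uca_ver_is_referenced(version):
    """CUN4BOPR_UCA_VER is read only for a version 2 parameter area."""
    return version == CUN4BOPR_VER2

def tailoring_fields_are_referenced(version, uca_ver):
    """Case, Hiragana, Var_Top, and locale fields need Ver2 plus UCA400R1, UCA410, or UCA600."""
    return uca_ver_is_referenced(version) and uca_ver in TAILORING_UCA_VERSIONS
```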
CUN4BOPR_Case_Options - set by caller
Specifies case options.
CUN4BOPR_Case_First - set by caller
Specifies whether uppercase characters collate before lowercase characters:
  • 0 - Default (the default depends on the locale; most locales use Lower First as the default.)
  • 1 - Upper First
  • 2 - Lower First
CUN4BOPR_Case_Options_Flags - set by caller
Setting CUN4BOPR_Case_Level to ON together with CUN4BOPR_Coll_Level = CUN4BOPR_PRIMARY ignores accents but not case:
  • 0 - Default
  • 1 - Case level on: under primary collation, accents are ignored but case is not
Note: These fields are referenced only if CUN4BOPR_Version = CUN4BOPR_Ver2 and CUN4BOPR_UCA_VER is set to CUN4BOPR_UCA400R1, CUN4BOPR_UCA410, or CUN4BOPR_UCA600; otherwise, their content is ignored.
CUN4BOPR_Special - set by caller
CUN4BOPR_Hiragana - set by caller
Specifies whether to distinguish between Japanese Hiragana and Katakana characters.
  • 0 - Do not distinguish (default)
  • 1 - Conform to the Japanese JIS X 4061 standard. Use this value together with CUN4BOPR_Coll_Level = CUN4BOPR_QUATERNARY.
Note: This field is referenced only if CUN4BOPR_Version = CUN4BOPR_Ver2 and CUN4BOPR_UCA_VER is set to CUN4BOPR_UCA400R1, CUN4BOPR_UCA410, or CUN4BOPR_UCA600; otherwise, its content is ignored.
CUN4BOPR_Var_Top - set by caller

Specifies the character with the "highest" weight (in UCA order) that is to be considered ignorable. The Variable Top attribute is meaningful only if the CUN4BOPR_Variable_Opt attribute is not set to Non-Ignored (NIGNORED). In that case, it controls which characters count as ignorable.

For example, if callers want white space to be ignorable but not any visible characters, they would use the value CUN4BOPR_Var_Top=X'0020' (space). All characters with the same primary weight are equivalent, so CUN4BOPR_Var_Top=X'3000' (ideographic space) has the same effect as CUN4BOPR_Var_Top=X'0020'.

Note:
  1. All code points must be specified in UTF-16 format.
  2. This field is referenced only if CUN4BOPR_Version = CUN4BOPR_Ver2 and CUN4BOPR_UCA_VER is set to CUN4BOPR_UCA400R1, CUN4BOPR_UCA410, or CUN4BOPR_UCA600; otherwise, its content is ignored.
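The following Python sketch illustrates how a variable top is applied: a character counts as ignorable when its primary weight is less than or equal to the primary weight of the variable top character. The weight table is a toy with made-up values, not real UCA weights; only the shared weight for space and ideographic space mirrors the example above.

```python
# Toy primary weights for illustration only; NOT real UCA/DUCET values.
TOY_PRIMARY_WEIGHT = {
    "\u0020": 0x0209,  # space
    "\u3000": 0x0209,  # ideographic space (same primary weight as space)
    "-": 0x020D,       # hyphen-minus
    ".": 0x0277,       # full stop
    "a": 0x15EF,       # letter a
}

def is_ignorable(ch, var_top, variable_opt="SHIFTED"):
    """True if ch is treated as a variable (ignorable) character."""
    if variable_opt == "NIGNORED":
        return False  # Var_Top has no effect when variable elements are not ignored
    return TOY_PRIMARY_WEIGHT[ch] <= TOY_PRIMARY_WEIGHT[var_top]
```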
CUN4BOPR_Locale - set by caller
Specifies a locale whose collation rules modify the selected default Unicode collation table (UCA400R1, UCA410, or UCA600; UCA301 does not support customization), so that collation behaves according to those rules. A locale is set by specifying the following fields:
CUN4BOPR_Locale_Language - set by caller
Specifies the language of the desired locale.
CUN4BOPR_Locale_Region - set by caller
Specifies the region of the desired locale.
CUN4BOPR_Locale_Variant - set by caller
Specifies the variant of the desired locale.
Note:
  1. For supported locale settings (language/region/variant), see Locales for collation and case support.
  2. If no locale information is specified, the default rules of the selected UCA version are used without change.
  3. These fields are referenced only if CUN4BOPR_Version = CUN4BOPR_Ver2 and CUN4BOPR_UCA_VER is set to CUN4BOPR_UCA400R1, CUN4BOPR_UCA410, or CUN4BOPR_UCA600; otherwise, their content is ignored.

The Unicode locales repository data set, SYS1.SCUNLOCL, contains the locales documented in Locales for collation and case support. All of these locales contain a section for collation rules.

You might want to copy locales, modify them as needed, and then provide the locale name in the CUN4BOPR_Locale sub-fields. To load such a locale with the Unicode dynamic loading capabilities, you must also provide CUN4BOPR_DSName and CUN4BOPR_Collation_Rules_Vol. If the modified locale is already loaded in the Unicode environment, there is no need to set the data set and volume information.

The following example (CUNENUSX) shows what a locale looks like:
******************************************************************
* Licensed Materials - Property of IBM                           *
*                                                                *
* "Restricted Materials of IBM"                                  *
*                                                                *
* (C) Copyright IBM Corp.  2006                                  *
*                                                                *
* Status = HUN7730                                               *
*                                                                *
******************************************************************
                                                                  
<version $revision: 1.19 $ = default>                             
  <collation>                                                     
    <rules>                                                       
      &\u0061\u0065                                               
         <<\u00E6                                                 
         <<<\u00C6                                                
    </rules>                                                      
  </collation>                                                    
</version $revision: 1.19 $>                                      
                                                                                                

For further information about Locales, see Locales for collation and case support.

For further information about Collation rules syntax, see CUN4BOPR_Collation_Rules_File field description.

In Locales for collation and case support, the value shown in column 2 is the value to use in the collation API field CUN4BOPR_Collation_Keyword (the "short path"). The following table shows examples of equivalent "short path" and "long path" locale settings.
Table 2. Equivalencies between short path and long path locale settings
CUN4BOPR_Collation_Keyword  CUN4BOPR_Locale_Language  CUN4BOPR_Locale_Region  CUN4BOPR_Locale_Variant
LAF                         AF
LAR_RBH                     AR                        BH
LDE_RAT_VPREEURO            DE                        AT                      PREEURO
LZH_VPINYIN                 ZH                                                PINYIN
LEN_RUS_VPOSIX              EN                        US                      POSIX
Locale information in CUN4BOPR_Collation_Keyword uses the following prefixes:
  • Lxx - For Language
  • Ryy - For Region
  • Vzz - For Variant

For CUN4BOPR_Locale_Language, CUN4BOPR_Locale_Region and CUN4BOPR_Locale_Variant, you can use exactly the same values but without the prefixes L, R or V.
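Because the short path and long path locale settings differ only by the L, R, and V prefixes, the mapping in Table 2 can be computed mechanically. This Python sketch (hypothetical helper) assumes the locale portion of the keyword has already been separated from the UCA version prefix and that the values themselves contain no underscores.

```python
def locale_fields_from_keyword(locale_part):
    """Split e.g. 'LDE_RAT_VPREEURO' into (language, region, variant)."""
    fields = {"L": "", "R": "", "V": ""}  # an empty string means "not set"
    for token in locale_part.split("_"):
        prefix, value = token[:1], token[1:]
        if prefix not in fields or not value:
            raise ValueError(f"unexpected locale token {token!r}")
        fields[prefix] = value            # strip the L/R/V prefix
    return fields["L"], fields["R"], fields["V"]
```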

Note: IBM® does not recommend using CUN4BOPR_Locale directly; instead, use the sub-fields CUN4BOPR_Locale_Language, CUN4BOPR_Locale_Region, and CUN4BOPR_Locale_Variant.
CUN4BOPR_Collation_Keyword - set by caller
Specifies the "short path" form of the settings, which is compatible with International Components for Unicode (ICU). For callers of the collation API that use UCA400R1, UCA410, or UCA600, IBM suggests using this field instead of the "long path" settings. This field can be set according to the following table:
Table 3. Collation keywords descriptions
Attribute Name Key Possible Values Description
Locale L R V <locale>

Provides a specific locale whose collation rules are in the SYS1.SCUNLOCL repository. For supported locales, see Locales for collation and case support.

Where <locale> has the following format:

Lxx_Ryy_Vzz, where:
  • L means language
  • R means region
  • V means variant
Example:
UCA400R1_LSV (Swedish) "Kypper" < "Köpfe"

For long path equivalent setting, see CUN4BOPR_Locale description.

Strength S 1, 2, 3, 4, I, D

The Strength attribute determines whether accents or case are taken into account when collating or matching text (in the UCA, this is called the collation level; see the CUN4BOPR_Coll_Level description).

Example:
UCA400R1_S1 role = Role = rôle
UCA400R1_S2 role = Role < rôle
UCA400R1_S3 role < Role < rôle

For long path equivalent setting, see CUN4BOPR_Coll_Level description.

Case_Level K X, O, D

The Case Level attribute is used when ignoring accents but not case. In that case, set Strength to Primary and Case_Level to On.

In most locales, this setting is Off by default.

Example:
UCA400R1_S1_KX   role = Role = rôle 
UCA400R1_S1_KO   role = rôle < Role

For long path equivalent setting, see CUN4BOPR_Case_Level description.

Case_First C X, L, U, D

The Case First attribute is used to control whether uppercase letters come before lowercase letters or vice versa in the absence of other differences in the strings. The possible values are Upper Case First (U) and Lower Case First (L), plus the standard Default and Off. There is almost no difference between the Off and Lower Case First options in terms of results, so typically users will not use Lower Case First but only Off or Upper Case First.

Example:
UCA400R1_CX or UCA400R1_CL "china" < "China" < "denmark" < "Denmark"
UCA400R1_CU                "China" < "china" < "Denmark" < "denmark"

For long path equivalent setting, see CUN4BOPR_Case_First description.

Alternate A N, S, D

The Alternate attribute is used to control the handling of the so-called variable characters in the UCA: white-space, punctuation and symbols. If Alternate is set to Non-Ignorable (N), then differences among these characters are of the same importance as differences among letters.

If Alternate is set to Shifted (S), then these characters are of only minor importance. The Shifted value is often used in combination with Strength set to Quaternary. In such case, white-space, punctuation, and symbols are considered when comparing strings, but only if all other aspects of the strings (base letters, accents, and case) are identical.

If Alternate is not set to Shifted, then there is no difference between a Strength of 3 and a Strength of 4.

For more information and examples, see Variable_Weighting in the UCA. The reason the Alternate values are not simply On and Off is that additional Alternate values may be added in the future. The UCA option Blanked is expressed with Strength set to 3, and Alternate set to Shifted.

Example:
UCA400R1_S3_AN di Silva < Di Silva < diSilva < U.S.A. < USA
UCA400R1_S3_AS di Silva = diSilva < Di Silva < U.S.A. = USA
UCA400R1_S4_AS di Silva < diSilva < Di Silva < U.S.A. < USA

For long path equivalent setting, see CUN4BOPR_Variable_Opt description.

Variable_Top T <hex digits>

The Variable Top attribute is only meaningful if the Alternate attribute is not set to Non-Ignorable. In such a case, it controls which characters count as ignorable. The string value specifies the "highest" character (in UCA order) weight that is to be considered ignorable.

Thus, for example, if a user wanted white space to be ignorable, but not any visible characters, then they would use the value Variable_Top="\u0020" (space). All characters of the same primary weight are equivalent, so Variable_Top="\u3000" (ideographic space) has the same effect as Variable_Top="\u0020".

Example:
UCA400R1_S3_AN di       Silva < diSilva < U.S.A. < USA
UCA400R1_S3_AS di       Silva = diSilva < U.S.A. = USA
UCA400R1_S3_AS_T0020 di Silva = diSilva < U.S.A. = USA

For long path equivalent setting, see CUN4BOPR_Var_Top description.

Normalization Checking N X, O, D

The Normalization setting determines whether text is thoroughly normalized or not in comparison (see also CUN4BOPR_Norm_Type).

Example:
UCA400R1_NX ä= a + Ì% < ä+ Ì% < ¡+ Ì%
UCA400R1_NO ä= a + Ì% < ä+ Ì% < ¡+ Ì%

For long path equivalent setting, see CUN4BOPR_Norm_Type description.

French F X, O, D

The French attribute sorts strings with different accents from the back of the string. This attribute is automatically set to On for the French locales and a few others. Users normally do not need to set this attribute explicitly. There is a string comparison performance cost when it is set On, but sort key length is not affected (see also CUN4BOPR_Cmp_Order).

Example:
UCA400R1_FX cote < coté < côte < côté
UCA400R1_FO cote < côte < coté < côté

For long path equivalent setting, see CUN4BOPR_Cmp_Order description.

Hiragana H X, O, D

Compatibility with JIS X 4061 requires an additional level to distinguish Hiragana and Katakana characters. If compatibility with that standard is required, set this attribute On and set the strength to Quaternary. This affects sort key length and string comparison performance.

Example:
UCA400R1_HX_S4 M0...= -å< M0†= -0æ
UCA400R1_HO_S4 M0...< -å< M0†< -0æ

For long path equivalent setting, see CUN4BOPR_Hiragana description.
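To make the Strength, Case_Level, and Case_First examples in Table 3 concrete, the following Python sketch builds a simplified multi-level key: base letters at the primary level, accents at the secondary level, and case at the tertiary level (or as a separate case level in front of the tertiary level). This is a teaching model for strengths 1 to 3, not the UCA or the z/OS implementation; real collation uses weight tables, but the sketch reproduces the role/Role/rôle and china/China orderings shown in the table.

```python
import unicodedata

def toy_sort_key(text, strength, case_level=False, upper_first=False):
    """Simplified multi-level key; NOT the UCA algorithm or the z/OS service."""
    units = []  # each unit: [base character, combining marks attached to it]
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and units:
            units[-1][1] += ch          # accent belongs to the preceding base letter
        else:
            units.append([ch, ""])
    lower_rank, upper_rank = (1, 0) if upper_first else (0, 1)
    primary = tuple(base.lower() for base, _ in units)      # base letters
    secondary = tuple(marks for _, marks in units)           # accents
    case = tuple(upper_rank if base.isupper() else lower_rank
                 for base, _ in units)                       # case
    levels = [primary, secondary, case][:strength]
    if case_level and strength < 3:
        levels.append(case)             # case-only level in front of the tertiary level
    return tuple(levels)
```

For example, toy_sort_key("role", 1, case_level=True) equals the key for "rôle" but sorts before the key for "Role", matching UCA400R1_S1_KO.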

Valid values for collation keywords are listed in the following table:
Table 4. Valid values for collation keywords
Value Abbreviation
Default D
On O
Off X
Primary 1
Secondary 2
Tertiary 3
Quaternary 4
Identical I
Shifted S
Non-Ignorable N
Lower-First L
Upper-First U

These abbreviations allow a 'short path settings' specification of a set of collation options, such as "UCA400R1_AS_LSV_S2", which can be used to specify that the desired options are: UCA version 4.0.1; ignore spaces, punctuation and symbols; use Swedish linguistic conventions; compare case-insensitively.

A number of attribute values are common across different attributes; these include Default (abbreviated as D), On (O), and Off (X).

This form is compatible with ICU 3.2. However, the short path settings are mutually exclusive with the long path collation configuration fields: CUN4BOPR_Collation_Keyword is analyzed first, before the contents of the long path fields.

Note:
All collation keyword sets must start with one of the following collation versions, followed by the desired settings:
  • UCA400R1_...
  • UCA410_...
  • UCA600_...

If a keyword or keyword value is invalid, collation returns RC8/RS24 (CUN_RC_USER_ERR/ CUN_RS_INVALID_COLLATION_KEYWORD_VALUES). If a keyword appears more than once, RC8/RS31 is returned (CUN_RC_USER_ERR/ CUN_RS_OVERLAYING_COLLATION_KEYWORD).
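The keyword rules above (mandatory UCA version prefix, valid values from Table 3 and Table 4, no repeated keys) can be checked before calling the service. The following Python sketch is a hypothetical pre-validator: it returns the RC/RS pairs documented above, but the validation the service actually performs may differ. For a missing version prefix, for which no reason code is documented here, the sketch simply raises an exception.

```python
import string

UCA_PREFIXES = ("UCA400R1", "UCA410", "UCA600")
VALID_VALUES = {
    "S": {"1", "2", "3", "4", "I", "D"},  # Strength
    "K": {"X", "O", "D"},                 # Case_Level
    "C": {"X", "L", "U", "D"},            # Case_First
    "A": {"N", "S", "D"},                 # Alternate
    "N": {"X", "O", "D"},                 # Normalization checking
    "F": {"X", "O", "D"},                 # French
    "H": {"X", "O", "D"},                 # Hiragana
}
OK = (0, 0)
INVALID_VALUE = (8, 24)  # CUN_RC_USER_ERR / CUN_RS_INVALID_COLLATION_KEYWORD_VALUES
OVERLAYING = (8, 31)     # CUN_RC_USER_ERR / CUN_RS_OVERLAYING_COLLATION_KEYWORD

def check_keyword(keyword):
    """Return an (RC, RS) pair for a CUN4BOPR_Collation_Keyword value."""
    version, _, settings = keyword.partition("_")
    if version not in UCA_PREFIXES:
        raise ValueError("keyword must start with UCA400R1, UCA410, or UCA600")
    seen = set()
    for token in (settings.split("_") if settings else []):
        key, value = token[:1], token[1:]
        if key in VALID_VALUES:
            valid = value in VALID_VALUES[key]
        elif key == "T":                  # Variable_Top: hex digits
            valid = value != "" and all(c in string.hexdigits for c in value)
        elif key in ("L", "R", "V"):      # locale language, region, variant
            valid = value.isalnum()
        else:
            valid = False                 # unknown keyword
        if not valid:
            return INVALID_VALUE
        if key in seen:
            return OVERLAYING
        seen.add(key)
    return OK
```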

CUN4BOPR_DSName - set by caller
Specifies the name of the alternative data set from which the rules are to be loaded. It enables callers to load locales from a data set other than the official Unicode repository (SYS1.SCUNLOCL), or to load User Collation Rules files from private data sets (see CUN4BOPR_Collation_Rules_File).
CUN4BOPR_Collation_Rules_File - set by caller
Specifies the name of the member that contains the alternative collation rules. You can use User Collation Rules (UCR) to fully customize collation. UCR files can be considered a variation of locales, because both UCR files and locales follow exactly the same collation syntax.
Collation rules can be redefined using the following symbols:
Table 5. Collation rule symbols
Symbol Example Description
< \u0061<\u0062 Identifies a primary (base letter) difference between "a" and "b"
<< \u0061<<\u00E4 Signifies a secondary (accent) difference between "a" and "ä"
<<< \u0061<<<\u0041 Identifies a tertiary difference between "a" and "A"
= x = y Signifies no difference between "x" and "y".
Note: x and y stand for code points, not literal characters.
& &Z These rules will be relative to this letter, but will not affect the position of Z itself.
Note: Z stands for a code point, not a literal character.
/ æ/e Expansion. Add the collation element for 'e' to the collation element for æ. After a reset "&ae << æ" is equivalent to "&a << æ/e".
| a|b Prefix processing. If 'b' is encountered and it follows 'a', output the appropriate collation element. If 'b' follows any other letter, output the normal collation element for 'b'. Collation element for 'a' is not affected.
The following tags can also be used in the collation rules syntax as an easier way to set collation behavior:
Table 6. Collation syntax rules
Option Example Description
... ... See the CUN4BOPR_Locale field description. Describes the start/end block of sets for a locale. X.x and default denote a locale revision/version; however, locale versions are not meaningful at this time.
... ... Refer to your default Unicode locales repository SYS1.SCUNLOCL and look for the CUNAF locale. Describes the start/end block of sets for a locale where no revision and version are required, because the default UCA rules are part of this locale.
... ... See the example that follows table "Collation syntax rules". Describes the start/end block of sets for User Collation Rules (UCR). Default denotes a UCR version, which is not meaningful at this time.
Alternate

[alternate non-ignorable]
[alternate shifted]

Sets the default value for Alternate attribute. If set to shifted, variable code points will be ignored on the primary level.
Backwards [backwards 2] Sets the default value for Backwards attribute. If set to on, secondary level will be reversed.
Variable top & X < [variable top] Sets the default value for Variable Top attribute. All the code points with primary strengths less than variable top will be considered variable.
Normalization

[normalization off]
[normalization on]

Turns on or off the Normalization attribute. If set to on, a quick check and necessary normalization will be performed.
Case Level

[caseLevel off]
[caseLevel on]

Turns on or off the Case Level attribute. If set to on, a level consisting only of case characteristics is inserted in front of the tertiary level. To ignore accents but take case into account, set strength to primary and case level to on.
Case First

[caseFirst off]
[caseFirst upper]
[caseFirst lower]

Sets the value for Case First attribute. If set to upper, causes upper case to sort before lower case. If set to lower, lower case will sort before upper case. Useful for locales that have already supported ordering but require different order of cases. Affects case and tertiary levels.
Strength

[strength 1]
[strength 2]
[strength 3]
[strength 4]
[strength 5]
[strength I]

Sets the default strength attribute.
Hiragana

[hiraganaQ off]
[hiraganaQ on]

Controls special treatment of Hiragana code points on the quaternary level. If turned on, Hiragana code points get lower values than all the other non-variable code points. Strength must be greater than or equal to quaternary for this attribute to take effect. Set UCOE_HIRAGANAQ.
[before 1|2|3] &[before 1] a<?<à<?<á? Enables users to order characters before a given character. In UCA 3.0, the example is equivalent to &?<?<à<?<á? (?= \u3029, Hangzhou numeral nine) * and makes accented 'a' letters sort before 'a'. Accents are often used to indicate the intonations in Pinyin. In this case, the non-accented letters sort after the accented letters.
[last non ignorable] &[last non ignorable]<\u4E9C Defines a list of CPs that will be positioned right after the [last non-ignorable] CP.
[last regular] &[last regular]<\u4E9C Equivalent to [last non-ignorable]
[suppressContractions [FromCP-ToCP]] &[suppressContractions [\u0400-\u045F]] Suppresses all contractions defined in the range FromCP - ToCP. After this rule, all of them are treated as normal CPs.
[last secondary ignorable] &[last secondary ignorable]<<<\u0020 All CPs after [last secondary ignorable] will be placed after the last secondary ignorable CP.
The following is an example of a UCR file:
******************************************************************
*  Owner: My Name                                                *
*  Prof Description: User Collation Rules profile sample         *
*                                                                *
*                                                                *
*                                                                *
*                                                                *
*                                                                *
*                                                                *
*                                                                *
*                                                                *
******************************************************************
<version $UCR$ = default>                                         
 <collation>                                                      
   <rules>                                                        
      [strength 1]                 * Collation Settings ...       
      [alternate non-ignorable]                                   
      [backwards 2]                                               
      [normalization on]                                          
      [caseLevel on]                                              
      [caseFirst off]                                             
      [hiraganaQ off]                                             
      &\u0061\u0065               * Modifying CPs                 
            <<\u00E6                                              
            <<<\u00C6                                             
      &\u0062<\u0061                                              
   </rules>                                                       
  </collation>                                                    
 </version $UCR$ = default>                                       
     
For collation rules files and locale files, consider the following:
  • Use an asterisk (*) in column 1 to mark a comment line.
  • All collation settings must be specified inside the <rules> ... </rules> tags.
  • All collation tags and values are case sensitive. Use exactly the same tags and UTF-16 code point format as shown in this topic.
  • Specify code points in UTF-16 escape form, for example \u0061, where "\u" denotes a UTF-16 code point.
  • Do not insert blanks between the following symbols and the code point that follows them:
    • =\u
    • <\u
    • <<\u
    • <<<\u
    • /\u
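The following Python sketch shows one way to read the settings and relations from a UCR or locale file that follows the conventions above. It handles only bracketed settings and relations written with \uXXXX code points (&, <, <<, <<<, =, /); it does not handle [before n], [last ...], or other special tags. Stripping a trailing " *" comment is an assumption based on the UCR sample, which places comments after rules.

```python
import re

# A relation symbol immediately followed by one or more \uXXXX code points.
RELATION = re.compile(r"(&|<<<|<<|<|=|/)((?:\\u[0-9A-Fa-f]{4})+)")
SETTING = re.compile(r"\[([^\]]+)\]")

def decode_code_points(escaped):
    """Turn a run such as \\u0061\\u0065 into the string 'ae'."""
    return "".join(chr(int(h, 16)) for h in re.findall(r"\\u([0-9A-Fa-f]{4})", escaped))

def parse_rules_file(text):
    """Return (settings, relations) found between <rules> and </rules>."""
    settings, relations = [], []
    inside_rules = False
    for line in text.splitlines():
        if line.startswith("*"):
            continue                          # comment line: asterisk in column 1
        line = line.split(" *", 1)[0]         # ASSUMPTION: trailing " *..." is a comment
        if "<rules>" in line:
            inside_rules = True
            continue
        if "</rules>" in line:
            inside_rules = False
            continue
        if inside_rules:
            settings += [s.strip() for s in SETTING.findall(line)]
            relations += [(op, decode_code_points(cps))
                          for op, cps in RELATION.findall(line)]
    return settings, relations
```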
For this collation implementation (tailoring for UCA400R1 and higher; not available for UCA301), there are two ways to specify collation settings in the collation API. If more than one is specified, the following order applies:
  1. Short path - This setting is based on the contents of CUN4BOPR_Collation_Keyword. For example, "UCA400R1_LEN_RUS_VPOSIX"
  2. Long path - This setting is used when any of the following fields are set; their values are applied in the order of this list:
    • CUN4BOPR_Coll_Level
    • CUN4BOPR_Variable_Opt
    • CUN4BOPR_Cmp_Order
    • CUN4BOPR_SKey_Opt
    • CUN4BOPR_Norm_Type
    • CUN4BOPR_Case_First
    • CUN4BOPR_Case_Level
    • CUN4BOPR_Hiragana
    • CUN4BOPR_Var_Top
    • CUN4BOPR_Locale_Language, CUN4BOPR_Locale_Region or CUN4BOPR_Locale_Variant
    • CUN4BOPR_Collation_Rules_File
Note: For long path settings, collation API fields such as CUN4BOPR_Coll_Level, CUN4BOPR_Variable_Opt ... CUN4BOPR_Var_Top override any collation settings in locales (CUN4BOPR_Locale) or UCR files (CUN4BOPR_Collation_Rules_File).
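The order described above can be modeled as a merge of dictionaries. In the following Python sketch (all names hypothetical), explicitly set long path fields override settings that come from a locale or UCR file, as the note states. Treating the short path as replacing the long path fields entirely is an assumption drawn from the statement that the two forms are mutually exclusive.

```python
def effective_settings(keyword_settings, tailoring_settings, api_fields):
    """Model which collation settings apply (illustrative only).

    keyword_settings   -- settings parsed from CUN4BOPR_Collation_Keyword, or None
    tailoring_settings -- settings from the locale or UCR rules, e.g. {"strength": 1}
    api_fields         -- long path fields; None means "not set by the caller"
    """
    if keyword_settings is not None:
        # Short path is analyzed first; ASSUMPTION: long path fields are then not used.
        return dict(keyword_settings)
    merged = dict(tailoring_settings)
    # Explicit long path fields override locale/UCR settings.
    merged.update({name: value for name, value in api_fields.items() if value is not None})
    return merged
```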
CUN4BOPR_Collation_Rules_Vol - set by caller
Specifies the volume of the data set specified by CUN4BOPR_DSName.





Copyright IBM Corporation 1990, 2014