Character Data Representation Architecture

Chapter 2. Architecture strategy

This chapter describes the overall strategy used by Character Data Representation Architecture (CDRA) to address the data integrity concerns detailed in Chapter 1, and to provide a solution. Details of the solution are described in the following chapters. This solution can be used wherever graphic character data is handled.


Components of this strategy

The strategy used in CDRA:

The three components of CDRA strategy -- Architecture Base, Character Set Groups, and Levels -- are shown in Figure 4, and are detailed below.


Figure 4. CDRA strategy



The CDRA strategy encompasses the four basic elements of CDRA, the character set groupings, and the levels of the architecture itself.

Architecture Base

The first component of CDRA strategy is the architecture base. This component provides a framework to solve current problems, and can be extended to cover future requirements. It consists of:

Character Set Groups

The second component of CDRA strategy is the concept of character set groups. Graphic character sets used in different countries to support different languages have been grouped into sets with common properties. A selected few of these are defined as Interoperable Character Sets within each group.

To reduce the proliferation of graphic character sets and code pages in use, IBM and various standards organizations have collected and classified commonly used graphic characters into a few specific sets. Each of these sets has the following characteristics:

Special graphic character sets supporting specific applications (such as APL, scientific word processing, or desktop publishing) are treated as extensions to the base sets.

With a few exceptions, each graphic character set in every country contains a common set of graphic characters: the uppercase English letters A to Z, the lowercase English letters a to z, the numerals 0 to 9, and 19 miscellaneous symbols. See Figure 45 in Appendix A for a complete list.

Each character set group has different implications for the types of services and resources needed to support it. Character set groups are shown in Figure 5, and are described in the following sections.
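As a minimal sketch of what the common set of letters and numerals means in practice, the following Python example checks that these characters can be encoded in several code pages, and that two EBCDIC code pages assign them identical byte values. The code pages used here (`cp037`, `cp500`, `cp437`, and `latin_1`, all built-in Python codecs) are chosen for illustration only and are an assumption, not a list taken from CDRA. The 19 miscellaneous symbols are left out because they are listed only in Appendix A.

```python
import string

# Letters and numerals from the common set (the 19 symbols are omitted here).
COMMON = string.ascii_uppercase + string.ascii_lowercase + string.digits

# Code pages chosen for illustration only (an assumption, not a CDRA list).
CODE_PAGES = ["cp037", "cp500", "cp437", "latin_1"]


def encodable_everywhere(chars, code_pages):
    """Return True if every character can be encoded in every code page."""
    for page in code_pages:
        try:
            chars.encode(page)
        except UnicodeEncodeError:
            return False
    return True


def same_bytes(chars, page_a, page_b):
    """Return True if both code pages assign identical bytes to chars."""
    return chars.encode(page_a) == chars.encode(page_b)


all_ok = encodable_everywhere(COMMON, CODE_PAGES)
# Two EBCDIC code pages agree on the byte values of letters and numerals.
ebcdic_match = same_bytes(COMMON, "cp037", "cp500")
# An EBCDIC and an ASCII-based code page do not, so conversion is required.
ebcdic_vs_ascii = same_bytes(COMMON, "cp037", "latin_1")

print(all_ok, ebcdic_match, ebcdic_vs_ascii)
```

The characters are available everywhere, but their byte values still depend on the encoding, which is why data moving between groups needs conversion resources.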


Figure 5. CDRA's Character Set Groupings



Commonly Used Character Sets

Universal character set

The IT industry largely supports the Universal Coded Character Set, known as Unicode. CDRA supports Unicode as a defined character set encoding. Unicode is a superset of the many earlier country-specific or language-specific character sets. The character repertoire of Unicode, developed by the Unicode Consortium, is kept in sync with the ISO standard ISO/IEC 10646, Information Technology - Universal Coded Character Set (UCS). This character set is applicable to the presentation, processing, storage, transmission, interchange, and representation of all of the world's written forms of language and symbols. UCS assigns a unique number to every character in all the living and archaic scripts, and to several symbols used in various application domains. The architecture itself:

Earlier editions of the ISO/IEC 10646 standard defined ranges of code points in the BMP called 'zones'. These have now been removed from the standard. The standard also defined 'levels of implementation' that addressed the ability to deal with 'combined character sequences'; these levels have also been removed from the standard.
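For readers unfamiliar with the terms used here, the following Python sketch illustrates a sequence built with a combining character and the boundary of the BMP (Basic Multilingual Plane, code points U+0000 to U+FFFF). It uses Unicode normalization from the standard `unicodedata` module purely as an illustration; it does not reproduce the removed 'levels of implementation', and the helper name `in_bmp` is hypothetical.

```python
import unicodedata

BMP_LAST = 0xFFFF  # last code point of the Basic Multilingual Plane


def in_bmp(ch):
    """Return True if the character's code point lies in the BMP."""
    return ord(ch) <= BMP_LAST


# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one visible character.
sequence = "e\u0301"
# NFC normalization composes the sequence into the single code point U+00E9.
composed = unicodedata.normalize("NFC", sequence)

# U+1F600 lies outside the BMP, so UTF-16 needs two 16-bit code units for it.
outside = "\U0001F600"
utf16_units = len(outside.encode("utf-16-be")) // 2

print(len(sequence), len(composed), in_bmp(composed), in_bmp(outside), utf16_units)
```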

The Unicode standard, in addition to being kept identical with ISO/IEC 10646 for the character set and assignments, defines properties and additional specifications on how to use these properties during text processing.
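As a small illustration of the character properties mentioned above, the following sketch reads a few of them through Python's standard `unicodedata` module. The helper `describe` is a hypothetical name introduced for this example, and the properties shown are only a sample of those the Unicode standard defines.

```python
import unicodedata


def describe(ch):
    """Collect a few Unicode properties for one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),          # general category, e.g. "Lu"
        "combining_class": unicodedata.combining(ch),  # 0 for non-combining characters
    }


letter = describe("A")
accent = describe("\u0301")

for info in (letter, accent):
    print(info)
```

Text-processing code can branch on such properties, for example treating characters with a nonzero combining class as attached to the preceding base character.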

For detailed information on how CDRA handles Unicode, see Appendix K, CDRA and Unicode.


CDRA Levels

The third component of the CDRA strategy shown in Figure 4 is the concept of Levels. Levels are used to distinguish between specific sets of available elements from the architecture base, as the architecture and the supporting implementations evolve over time. Figure 6 shows the relationship between the levels. Level 1 provided the initial seed of CDRA, which was substantially extended with the release of Level 2. The growth in Level 2, noted as extensions in the diagram, has been a series of enhancements rather than the pronounced change seen from Level 1 to Level 2.


CDRA Level 1

CDRA Level 1 defined an initial set of elements from the architecture base. It consisted of:

Character Data Representation Architecture - Registry, SC09-1391 contained the following:

CDRA Level 1 addressed all of the commonly used character sets within:

CDRA Level 1 satisfied these objectives:

CDRA is primarily concerned with the coded graphic character set boundaries within and between different groups, rather than with political or geographical boundaries. However, these different types of boundaries are indirectly related to each other through the requirements for resources such as fonts, keyboards, and conversion tables.


CDRA Level 2

CDRA Level 2 included all Level 1 elements. In addition, it included definitions of functions called CDRA-Defined Services, along with the syntax for accessing these functions. These APIs were designed to be callable from any supported high-level language. A number of CDRA resources were needed to support the functions defined in Level 2. Level 2 included descriptions of the elements of those resources and some general principles for managing them. The resource data structures and the resource maintenance functions are implementation-specific.
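The relationship between a callable service and the resources behind it can be sketched as follows. This is not the CDRA-Defined Services interface: the function `convert`, the exception `UnsupportedCcsidError`, and the table `CONVERSION_RESOURCES` are hypothetical names, and the pairing of coded character set identifiers (CCSIDs) with Python codecs is an approximation assumed for illustration.

```python
# Hypothetical resource table mapping CCSIDs to Python codec names.
# The pairing is an approximation assumed for this example.
CONVERSION_RESOURCES = {
    37: "cp037",     # EBCDIC, US English
    500: "cp500",    # EBCDIC, international
    819: "latin_1",  # ISO 8859-1
    1208: "utf_8",   # UTF-8
}


class UnsupportedCcsidError(LookupError):
    """Raised when no conversion resource exists for a CCSID."""


def convert(data: bytes, from_ccsid: int, to_ccsid: int) -> bytes:
    """Hypothetical conversion service: re-encode data between two CCSIDs."""
    for ccsid in (from_ccsid, to_ccsid):
        if ccsid not in CONVERSION_RESOURCES:
            raise UnsupportedCcsidError(ccsid)
    # Decode with the source resource, then encode with the target resource.
    text = data.decode(CONVERSION_RESOURCES[from_ccsid])
    return text.encode(CONVERSION_RESOURCES[to_ccsid])


ebcdic = b"\xc3\xc4\xd9\xc1"  # the letters "CDRA" in EBCDIC
converted = convert(ebcdic, 37, 1208)
print(converted)
```

Keeping the table separate from the function mirrors the split described above: the service syntax stays stable while the resource data behind it is implementation-specific.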


Extensions

CDRA extensions now include support for:


Figure 6. Architecture levels


Extensions. Level 2. Level 1.
