
Character Data Representation Architecture

Chapter 2. Architecture strategy

This chapter describes the overall strategy that Character Data Representation Architecture (CDRA) uses to address the data integrity concerns detailed in Chapter 1. The following chapters describe the solution in detail. This solution can be used wherever graphic character data is handled.

Components of this strategy

The strategy used in CDRA:

The three components of CDRA strategy (Architecture Base, Character Set Groups, and Levels) are shown in Figure 4 and are detailed below.

Figure 4. CDRA strategy


CDRA strategy encompasses the four basic elements of CDRA, the character set groupings, and the levels of the architecture itself.

Architecture Base

The first component of CDRA strategy is the architecture base. This component provides a framework to solve current problems, and can be extended to cover future requirements. It consists of:

Character Set Groups

The second component of CDRA strategy is the concept of character set groups. Graphic character sets used in different countries to support different languages have been grouped into sets with common properties. A selected few of these are defined as Interoperable Character Sets within each group. To reduce the proliferation of graphic character sets and code pages in use, IBM and various standards organizations have collected and classified commonly used graphic characters into a few specific sets. Each of these sets has the following characteristics:

Special graphic character sets supporting specific applications (such as APL, scientific word processing, or desktop publishing) are treated as extensions to the base sets. With a few exceptions, every graphic character set in every country contains a common set of graphic characters: the uppercase English letters A to Z, the lowercase English letters a to z (4), the numerals 0 to 9, and 19 miscellaneous symbols (5). See Figure 45 in Appendix A for a complete list. Each character set group has different implications for the types of services and resources needed to support it. Character set groups are shown in Figure 5 and are described in the following sections.
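Because nearly every graphic character set contains the common set, text restricted to it can be represented in each of them. The sketch below checks whether a string uses only the common letters and digits, plus a caller-supplied set of symbols. The 19 miscellaneous symbols are listed in Figure 45 and are not reproduced here, so the `symbols` parameter is an assumption the caller must fill in; the function name `uses_only_common_set` is hypothetical.

```python
import string

# Letters A-Z, a-z and digits 0-9 from the common set described above.
COMMON_ALNUM = frozenset(string.ascii_uppercase + string.ascii_lowercase + string.digits)


def uses_only_common_set(text: str, symbols: str = "") -> bool:
    """Return True if every character of `text` is a common letter or digit,
    or appears in `symbols`.

    Hypothetical helper: `symbols` stands in for the 19 miscellaneous
    symbols of the common set (see Figure 45), which are not listed here.
    """
    allowed = COMMON_ALNUM | frozenset(symbols)
    return all(ch in allowed for ch in text)


print(uses_only_common_set("Invoice42"))          # True
print(uses_only_common_set("Caf\u00e9"))          # False: e-acute is outside the common set
print(uses_only_common_set("A+B", symbols="+"))   # True, assuming "+" is supplied
```

A check like this only tells you that the characters are in the common set; it does not by itself convert data between character sets.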

Figure 5. CDRA's Character Set Groupings


Commonly Used Character Sets

Large Character Sets
Over the past few years the IT industry has been very active in pursuing support for new, large-repertoire character sets such as Unicode. CDRA has been enhanced to support these recent additions to the existing body of character set encodings. The new coded character sets are supersets of many of today's existing character sets. These large-repertoire character sets are ISO/IEC 10646-1, Information Technology - Universal Multiple-Octet Coded Character Set (UCS), developed by the international standards bodies, and Unicode, developed by an industry consortium. ISO/IEC 10646 specifies the universal coded character set, which is applicable to the presentation, processing, storage, transmission, interchange, and representation of the written forms of all the world's languages and symbols. It is an architected definition for coded character set representation, endorsed by the international community. The architecture itself:

The major interest in UCS-2 centers on the Basic Multilingual Plane (BMP). This plane of 256 rows by 256 cells (65,536 code positions, with each row and each cell identified by one octet) is divided into four zones, known as the A-, I-, O-, and R-zones. In the BMP, the A-zone is used for alphabetic and syllabic scripts as well as various symbols. This zone contains what are commonly referred to as the Latin-based scripts.

The I-zone contains the unified Chinese, Japanese, and Korean ideographs.

The O-zone has been reserved to contain future characters as they are defined and standardized.

The R-zone is the restricted zone. It contains private use characters (those which can be defined and used without the endorsement of any standards body), various presentation forms (as required for the Arabic scripts), and compatibility characters (used to bridge to some existing encoding standards).
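The zone layout can be expressed as a lookup on the row octet of a BMP code position. The sketch below assumes the zone boundaries of the original ISO/IEC 10646-1 layout (A-zone rows 00-4D, I-zone rows 4E-9F, O-zone rows A0-DF, R-zone rows E0-FF); later editions of the standard revised this layout, so treat the boundaries as an assumption. The names `row_and_cell`, `ZONES`, and `bmp_zone` are hypothetical.

```python
def row_and_cell(code_point: int) -> tuple[int, int]:
    """Split a BMP code position into its row octet and cell octet.

    In the two-octet (UCS-2) form, the row is the high-order octet and the
    cell is the low-order octet.
    """
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError(f"U+{code_point:04X} is outside the BMP")
    return code_point >> 8, code_point & 0xFF


# Assumed zone boundaries by row, from the original ISO/IEC 10646-1 layout.
ZONES = (
    (0x00, 0x4D, "A"),  # alphabetic and syllabic scripts, symbols
    (0x4E, 0x9F, "I"),  # unified CJK ideographs
    (0xA0, 0xDF, "O"),  # reserved for future standardized characters
    (0xE0, 0xFF, "R"),  # restricted: private use, presentation forms, compatibility
)


def bmp_zone(code_point: int) -> str:
    """Return the zone letter (A, I, O, or R) for a BMP code position."""
    row, _cell = row_and_cell(code_point)
    for first_row, last_row, zone in ZONES:
        if first_row <= row <= last_row:
            return zone
    raise AssertionError("unreachable: every row 00-FF belongs to a zone")


for ch in ("A", "\u4e2d", "\ue000", "\ufb50"):
    row, cell = row_and_cell(ord(ch))
    print(f"U+{ord(ch):04X} row={row:02X} cell={cell:02X} zone={bmp_zone(ord(ch))}")
```

Because the row and cell are exactly the two octets of the UCS-2 form, the zone of a character can be read directly from its first octet in big-endian order.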

ISO/IEC 10646 can be implemented at three different levels. Level 1 allows no combining characters (9), Level 2 allows the use of some combining characters, and Level 3 allows all defined characters to be used.
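An implementation that supports only Level 1 must reject text containing combining characters. The sketch below approximates that test with Python's `unicodedata` module, treating any character in the Unicode general categories Mn, Mc, or Me as a combining character. This is an assumption: ISO/IEC 10646 lists its combining characters explicitly, and that list is not reproduced here. Level 2 is not modeled, because the standard defines which combining characters it permits. The function name `conforms_to_level_1` is hypothetical.

```python
import unicodedata

# General categories for nonspacing (Mn), spacing (Mc), and enclosing (Me) marks.
# Assumption: these approximate the combining characters listed in ISO/IEC 10646.
COMBINING_CATEGORIES = {"Mn", "Mc", "Me"}


def conforms_to_level_1(text: str) -> bool:
    """Return True if `text` contains no combining characters (Level 1 rule)."""
    return not any(unicodedata.category(ch) in COMBINING_CATEGORIES for ch in text)


precomposed = "caf\u00e9"   # e-acute as a single precomposed character
decomposed = "cafe\u0301"   # e followed by COMBINING ACUTE ACCENT

print(conforms_to_level_1(precomposed))  # True
print(conforms_to_level_1(decomposed))   # False
# NFC normalization recomposes this sequence into a single character.
print(conforms_to_level_1(unicodedata.normalize("NFC", decomposed)))  # True
```

Normalization helps only where a precomposed equivalent exists; many scripts (for example, Devanagari vowel signs) have no such form and cannot be represented at Level 1.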

The Unicode standard defines a large character set and specifies a number of different encoding formats. For detailed information on how CDRA handles Unicode, see Appendix K, CDRA and Unicode.
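To illustrate what different encoding formats means in practice, the sketch below encodes the same characters in UTF-8, UTF-16, and UTF-32 using Python's built-in codecs. It also shows why UCS-2, with two octets per character, covers only the BMP: a character outside it needs a surrogate pair in UTF-16. How CDRA identifies and handles these forms is covered in Appendix K and is not modeled here; the helper names are hypothetical.

```python
def show_encodings(ch: str) -> None:
    """Print the byte sequences for one character in several Unicode encoding forms."""
    for codec in ("utf-8", "utf-16-be", "utf-32-be"):
        print(f"U+{ord(ch):04X} {codec:>9}: {ch.encode(codec).hex(' ')}")


def fits_ucs2(text: str) -> bool:
    """Return True if every character is in the BMP and so has a two-octet UCS-2 form."""
    return all(ord(ch) <= 0xFFFF for ch in text)


show_encodings("\u20ac")      # EURO SIGN, inside the BMP
show_encodings("\U0001F600")  # outside the BMP: UTF-16 needs a surrogate pair
print(fits_ucs2("\u20ac"), fits_ucs2("\U0001F600"))
```

The same character occupies three, two, or four octets depending on the encoding form, which is why data must carry an identification of its encoding for its meaning to be preserved.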

CDRA Levels

The third component of CDRA strategy shown in Figure 4 is the concept of Levels. Levels distinguish between specific sets of available elements from the architecture base as the architecture and its supporting implementations evolve over time. Figure 6 shows the relationship between the levels. Level 1 provided the initial seed of CDRA, which was substantially extended with the release of Level 2. Subsequent growth beyond Level 2, shown as Extensions in the diagram, has been a series of incremental enhancements rather than the pronounced change seen from Level 1 to Level 2.

CDRA Level 1
CDRA Level 1 defined an initial set of elements from the architecture base. It consisted of:

Character Data Representation Architecture - Registry, SC09-1391 contained the following:

CDRA Level 1 addressed all of the commonly used character sets within:

CDRA Level 1 satisfied these objectives:

CDRA is primarily concerned with the coded graphic character set boundaries within and between different groups, rather than with political or geographical boundaries. However, these different types of boundaries are indirectly related to each other through the requirements for resources such as fonts, keyboards, and conversion tables.

CDRA Level 2
CDRA Level 2 included all Level 1 elements. In addition, it included definitions of functions called CDRA-Defined Services, along with the syntax for accessing these functions. These APIs were designed to be callable from any supported high-level language. A number of CDRA resources were needed to support the functions defined in Level 2. Level 2 included descriptions of the elements of those resources and some general principles for managing them. The resource data structures and the resource maintenance functions are implementation-specific.
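The actual CDRA-Defined Services and their syntax are not reproduced here, and the sketch below is not that API. It only illustrates the general idea of a service that converts data between two coded character set identifiers (CCSIDs) by using a conversion resource. Here the resource is a small hypothetical table mapping a few IBM CCSIDs to Python codec names (37 for US EBCDIC, 500 for International EBCDIC, 819 for ISO 8859-1, 1208 for UTF-8), and conversion passes through Python's Unicode strings. The names `CCSID_CODECS` and `convert` are assumptions.

```python
# Hypothetical conversion resource: CCSID -> Python codec name.
# A real CDRA implementation manages its own conversion tables.
CCSID_CODECS = {
    37: "cp037",      # EBCDIC, US/Canada
    500: "cp500",     # EBCDIC, International
    819: "latin-1",   # ISO 8859-1
    1208: "utf-8",    # UTF-8
}


def convert(data: bytes, source_ccsid: int, target_ccsid: int) -> bytes:
    """Convert `data` from one CCSID to another via Unicode.

    Raises KeyError for a CCSID with no conversion resource, and
    UnicodeEncodeError if a character has no representation in the target.
    """
    try:
        source = CCSID_CODECS[source_ccsid]
        target = CCSID_CODECS[target_ccsid]
    except KeyError as exc:
        raise KeyError(f"no conversion resource for CCSID {exc.args[0]}") from None
    return data.decode(source).encode(target)


ebcdic_us = "[A]".encode("cp037")
print(ebcdic_us.hex(" "))                    # bracket code points in CCSID 37 ...
print(convert(ebcdic_us, 37, 500).hex(" "))  # ... differ from those in CCSID 500
print(convert(ebcdic_us, 37, 1208))          # b'[A]'
```

Routing every conversion through a common intermediate form keeps the number of tables proportional to the number of CCSIDs rather than to the number of CCSID pairs; whether a real implementation does this is an implementation choice.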

Extensions
CDRA extensions now include support for:

Figure 6. Architecture levels

Diagram labels: Extensions, Level 2, Level 1.
