Character Data Representation Architecture

Glossary

This glossary includes definitions of terms and acronyms found in this document. For a more complete list of terms please refer to the IBM Terminology web site.

 

A

 

ACRI
See additional coding-related required information.

 

ACRI-PCMB
See additional coding-related required information - PC mixed byte.

 

additional coding-related required information (ACRI)
The information, in addition to encoding scheme identifier, code page, and character set global identifiers, that is required to complete the definition associated with using particular encoding schemes. An example is the ranges of valid first bytes of double-byte code points in a PC Mixed single-byte and double-byte code.

 

additional coding-related required information - PC mixed byte (ACRI-PCMB)
A CDRA identifier that defines the ranges of valid first bytes of double-byte code points in a PC Mixed SB/DB encoding scheme.
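As an illustration of why the first-byte ranges are needed, the following minimal sketch splits a PC Mixed SB/DB byte string into single-byte and double-byte code points. The range values and the function names are assumptions made for this example (the values mirror a common Shift-JIS style layout); they are not taken from a CDRA resource.

```python
# Hypothetical first-byte ranges for double-byte code points.
# These values are an assumption for illustration only.
LEAD_BYTE_RANGES = [(0x81, 0x9F), (0xE0, 0xFC)]


def is_lead_byte(byte, ranges=LEAD_BYTE_RANGES):
    """Return True if `byte` is a valid first byte of a double-byte code point."""
    return any(low <= byte <= high for low, high in ranges)


def split_code_points(data, ranges=LEAD_BYTE_RANGES):
    """Split a mixed SB/DB byte string into 1-byte and 2-byte code points."""
    points = []
    i = 0
    while i < len(data):
        if is_lead_byte(data[i], ranges):
            if i + 1 >= len(data):
                raise ValueError("truncated double-byte code point")
            points.append(data[i:i + 2])
            i += 2
        else:
            points.append(data[i:i + 1])
            i += 1
    return points
```

Without the ranges, a receiver cannot tell whether a given byte stands alone or begins a double-byte code point.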

 

American Standard Code for Information Interchange (ASCII)
A standard code used for information exchange among data processing systems, data communication systems, and associated equipment. ASCII uses a coded character set consisting of 7-bit coded characters.

 

API
See application programming interface.

 

APL
See A programming language.

 

application programming interface (API)
An interface that allows an application program that is written in a high-level language to use specific data or functions of the operating system or another program.

 

A programming language (APL)
A programming language based on mathematical notation that is used to develop application programs. APL is particularly useful for commercial data processing, system design, mathematical and scientific computation, database applications, and teaching mathematics.

 

Arabic numeral
One of the 10 numerals used in decimal notation: the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. See also Hindi numeral.

 

ASCII
See American Standard Code for Information Interchange.

 

C

 

CCS
See coded character set.

 

CCSID
See coded character set identifier.

 

CCSID resource
A representation of the various elements associated with a CCSID in a system in a machine readable form.

 

CCSID resource repository
An organized collection of CCSID resources that are maintained by a service provider in a system.

 

CDRA
See Character Data Representation Architecture.

 

CECP
See country extended code page.

 

CGCSGID
See coded graphic character set global identifier.

 

Character Data Representation Architecture (CDRA)
An IBM architecture that defines a set of identifiers, resources, services, and conventions to achieve consistent representation, processing, and interchange of graphic character data in heterogeneous environments.

 

coded character set (CCS)
A set of unambiguous rules that establishes a character set and the one-to-one relationships between the characters of the set and their coded representations. See also invariant character set.

 

coded character set identifier (CCSID)
A 16-bit number that includes a specific set of encoding scheme identifiers, character set identifiers, code page identifiers, and other information that uniquely identifies the coded graphic-character representation.

 

coded graphic character set
A set of graphic characters with their assigned code points.

 

coded graphic character set global identifier (CGCSGID)
A 4-byte binary or a 10-digit decimal identifier consisting of the concatenation of a GCSGID and a CPGID. The CGCSGID identifies the code point assignments in the code page for a specific graphic character set, from among all the graphic characters that are assigned in the code page.
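The concatenation can be sketched as follows, using the glossary's own example of character set 00697 and code page 500. The function name is hypothetical, and the big-endian byte order of the binary form is an assumption of this sketch.

```python
def make_cgcsgid(gcsgid, cpgid):
    """Concatenate a GCSGID and a CPGID into both CGCSGID forms.

    Returns (4-byte binary form, 10-digit decimal form). Big-endian
    byte order is assumed here for the binary form.
    """
    for name, value in (("GCSGID", gcsgid), ("CPGID", cpgid)):
        # Both identifiers range from 00001 to 65534 (X'0001' to X'FFFE').
        if not 0x0001 <= value <= 0xFFFE:
            raise ValueError(f"{name} out of range: {value}")
    binary = gcsgid.to_bytes(2, "big") + cpgid.to_bytes(2, "big")
    decimal = f"{gcsgid:05d}{cpgid:05d}"
    return binary, decimal


binary, decimal = make_cgcsgid(697, 500)
print(binary.hex().upper(), decimal)  # 02B901F4 0069700500
```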

 

code extension method
A method prescribed in an encoding scheme for representing characters that cannot be accommodated within the limits of the basic structure of the code. It prescribes a method to alter the interpretation of one or more code points that follow a prescribed single control character or a control sequence.

 

code page
A specification of code points from a defined encoding structure for each graphic character in a set or in a collection of graphic character sets. Within a code page, a code point can have only one specific meaning. See also invariant character set.

 

code page global identifier (CPGID)
A 5-digit decimal or 2-byte binary identifier that is assigned to a code page. The range of values is 00001 to 65534 (X'0001' to X'FFFE').

 

code point
A unique bit pattern defined in a code. Depending on the code, a code point can be 7 bits, 8 bits, 16 bits, or another length. Code points are assigned graphic characters in a code page.

 

component
A hardware or software entity forming part of a system, or a piece of logic that controls the operation of a device or that modifies or stops a control function.

 

control function
An element of a character set that affects the recording, processing, transmission, or interpretation of data, and that has a coded representation of one or more bit combinations (see ISO/IEC 6429).

 

conversion
The process of replacing a code point that is assigned to a character in one code with its corresponding code point assigned in another code.

 

conversion method
An algorithm used during conversion. It includes the logic needed to separate the input code point string into appropriate substrings, convert the substrings, and assemble the resulting substrings, for a particular set of criteria to be used during conversion. A conversion method may use associated conversion tables as resources during the conversion.

 

conversion table
A resource used with a conversion method to perform conversion. Typically, a conversion table contains a set of input code point values corresponding to a given set of output code point values. Its structure and contents are designed to suit the conversion algorithm with which it is to be used.
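A minimal sketch of a conversion table and a conversion method that uses it, assuming single-byte code pages that Python's standard codecs support (cp500 and latin-1 here). The function names and the choice of X'1A' as the substitute for unmappable code points are assumptions for illustration, not CDRA definitions.

```python
def build_conversion_table(source, target, substitute=0x1A):
    """Return a 256-byte table mapping source code points to target ones.

    Assumes both encodings are single-byte. Code points without a
    single-byte counterpart in the target map to `substitute`.
    """
    table = bytearray()
    for code in range(256):
        try:
            out = bytes([code]).decode(source).encode(target)
        except UnicodeError:
            out = b""
        table.append(out[0] if len(out) == 1 else substitute)
    return bytes(table)


def convert(data, table):
    """Conversion method: replace each input code point through the table."""
    return data.translate(table)


def is_identity_map(table):
    """True when every input code point equals its output code point."""
    return table == bytes(range(256))


ebcdic_to_latin1 = build_conversion_table("cp500", "latin-1")
print(convert(b"\xc8\xc5\xd3\xd3\xd6", ebcdic_to_latin1))  # b'HELLO'
```

When `is_identity_map` returns True for a table, the conversion step can be skipped altogether.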

 

country extended code page (CECP)
A single-byte EBCDIC code page in the IBM corporate registry that contains the 190 characters found in character set 00697. While each CECP contains the same set of characters (allowing for conversion of data without loss), the code point allocation of the characters is not identical. For example, all CECPs contain the backslash character; however, in code page 500 it is located at code point X'E0', and in code page 280 it is located at code point X'48'.

 

CPGID
See code page global identifier.

 

D

 

database (DB)
A collection of interrelated or independent data items that are stored together to serve one or more applications.

 

data stream
The commands, control codes, data, or structured fields that are transmitted between an application program and a device such as a printer or nonprogrammable display station.

 

DB
See database.

 

DBCS
See double-byte character set.

 

DCF
See Document Composition Facility.

 

Distributed Relational Database Architecture (DRDA)
The architecture that defines formats and protocols for providing transparent access to remote data. DRDA defines two types of functions: the application requester function and the application server function.

 

Document Composition Facility (DCF)
An IBM licensed program used to format input to a printer.

 

double-byte character set (DBCS)
A set of characters in which each character is represented by 2 bytes. These character sets are commonly used by national languages, such as Japanese and Chinese, that have more symbols than can be represented by a single byte.

 

double-wide character
A character, such as a Kanji ideogram, that requires twice the nominal width of other characters, such as the letter A, for the character to be legible on a display screen or a printer.

 

DRDA
See Distributed Relational Database Architecture.

 

E

 

even parity bit
A check bit that is usually generated or included in a parity-checking algorithm to make the total number of 1 bits in a bit pattern, including the check bit, an even number. See also odd parity bit.
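A minimal sketch of how the two parity bits can be computed by counting the 1 bits in a pattern. The function names are hypothetical.

```python
def even_parity_bit(value):
    """Check bit that makes the count of 1 bits (data plus check bit) even."""
    return bin(value).count("1") % 2


def odd_parity_bit(value):
    """Check bit that makes the count of 1 bits (data plus check bit) odd."""
    return 1 - even_parity_bit(value)


# 7-bit ASCII 'A' is 1000001: two 1 bits, so the even parity bit is 0.
print(even_parity_bit(0b1000001), odd_parity_bit(0b1000001))  # 0 1
```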

 

F

 

folding
The substitution of one graphic character for another. Folding generally maps a larger character set into a subset, and may result in loss of information. Folding allows the presentation of uppercase graphic characters when lowercase characters are not available. See also mono-casing.
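The difference between folding and mono-casing can be sketched as follows. The function names and the "?" fallback are assumptions for this example; a real implementation would follow the folding rules defined for the character sets involved.

```python
def mono_case(text):
    """Translate alphabetic characters to their uppercase equivalents."""
    return text.upper()


def fold(text, available, fallback="?"):
    """Fold `text` into the `available` character subset.

    A character is kept if available; otherwise its uppercase form is
    used if that is available; otherwise `fallback` is substituted,
    which is where information can be lost.
    """
    out = []
    for ch in text:
        if ch in available:
            out.append(ch)
        elif ch.upper() in available:
            out.append(ch.upper())
        else:
            out.append(fallback)
    return "".join(out)


uppercase_only = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ ")
print(fold("Code page", uppercase_only))  # CODE PAGE
```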

 

full character set
The maximal character set of a code page such that there are no more unassigned graphic code points remaining in the associated encoding scheme. No other larger character set can be represented in that code page. For example, CS 697 (the maximal character set of CP 500 in encoding scheme ES 1100), contains 190 graphic characters and is assigned all the 190 available graphic code points in ES 1100. See also maximal character set, subset character set.

 

G

 

GCCASN
See graphic character conversion alternative selection number.

 

GCCST
See graphic character conversion selection table.

 

GCSGID
See graphic character set global identifier.

 

graphic character
A graphic symbol, such as a numeric, alphabetic, or special character (see C-S 3-3220-019 Corporate Standard).

 

graphic character conversion alternative selection number (GCCASN)
A parameter of a function call to a graphic character data conversion process that facilitates selecting a specific conversion method and associated conversion tables from different alternatives.

 

graphic character conversion selection table (GCCST)
A table used in the graphic character data conversion process to manage the access to the various conversion methods and associated conversion tables under its sphere of control.

 

graphic character set
A defined set of graphic characters treated as an entity. No coded representation is assumed.

 

graphic character set global identifier (GCSGID)
A unique five-digit decimal number assigned to a graphic character set in IBM standards. The range of GCSGID values is 00001 to 65534 (X'0001' to X'FFFE'); see C-S 3-3220-019 Corporate Standard.

 

H

 

hardcoded
Pertaining to software instructions that are statically encoded and not intended to be altered.

 

high-level language (HLL)
A programming language that provides some level of abstraction from assembler language and independence from a particular type of machine.

 

Hindi numeral
Any of the set of numerals used in many Arabic countries instead of, or in addition to, the Arabic numerals. The Hindi numeral shapes are ٠, ١, ٢, ٣, ٤, ٥, ٦, ٧, ٨, and ٩, which correspond to the Arabic numeral shapes 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, respectively. See also Arabic numeral.
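As an illustration of the correspondence, the following sketch replaces Hindi numeral shapes with Arabic numeral shapes. It assumes the text is held as Unicode, where the Hindi numeral shapes are the Arabic-Indic digits U+0660 to U+0669; the function name is hypothetical.

```python
import unicodedata


def to_arabic_numerals(text):
    """Replace every decimal digit in `text` with its 0-9 equivalent."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None for non-digits
        out.append(ch if value is None else str(value))
    return "".join(out)


print(to_arabic_numerals("\u0665\u0660\u0660"))  # 500
```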

 

HLL
See high-level language.

 

I

 

identity map
A special case of code point conversion in which all input code points are equal to the output code points, thus eliminating the need for a conversion. When converting data from one CCSID to another using the round trip criterion, if the CCSIDs share the same CPGID, an identity mapping condition exists.

 

IEC
See International Electrotechnical Commission.

 

International Electrotechnical Commission (IEC)
The international standards-setting organization responsible for electrical and electrotechnical issues. IEC often cooperates with ISO via technical committees on the definition of standards.

 

International Organization for Standardization (ISO)
An international body charged with creating standards to facilitate the exchange of goods and services as well as cooperation in intellectual, scientific, technological, and economic activity.

 

International Telegraphic Alphabet Number 2 (ITA-2)
A CCITT-defined coded character set used in the international Telex communication services, worldwide.

 

invariant character set
A set of characters, such as the syntactic character set, having the same code point assignments in all coded character sets or code pages using a given encoding scheme. See also code page, coded character set, syntactic character set.
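A minimal sketch of how an invariant set could be derived by comparing code point assignments across code pages. It assumes single-byte code pages available as Python codecs (cp037 and cp500 are used only as examples) and approximates "graphic character" with `str.isprintable`; the function name is hypothetical.

```python
def invariant_characters(code_pages):
    """Return the graphic characters that have the same code point
    in every one of the given single-byte code pages."""
    common = None
    for code_page in code_pages:
        pairs = set()
        for code in range(256):
            try:
                ch = bytes([code]).decode(code_page)
            except UnicodeDecodeError:
                continue  # unassigned code point in this code page
            if ch.isprintable():
                pairs.add((ch, code))
        common = pairs if common is None else common & pairs
    return {ch for ch, _ in common} if common else set()


invariant = invariant_characters(["cp037", "cp500"])
print("A" in invariant, "!" in invariant)  # True False
```

The exclamation point is excluded because the two code pages assign it to different code points, whereas the letter A is at the same code point in both.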

 

ISO
See International Organization for Standardization.

 

ISO environment
A coding structure defined in ISO 2022 that uses single (or multiple) septet(s) (7-bit) or octet(s) (8-bit) per code point, with or without code extension controls.

 

ITA-2
See International Telegraphic Alphabet Number 2.

 

K

 

Katakana
A Japanese phonetic syllabary used primarily for foreign names and place names and words of foreign origin.

 

L

 

Latin alphabet
An alphabet composed of the letters a - z and A - Z with or without accents and ligatures. See also non-Latin-based alphabet.

 

Latin alphabet number 1
The 190 characters used in most of Western Europe, North America, Central and South America. There are other Latin alphabets such as Latin-2 and Latin-3 that correspond to some of the other ISO/IEC 8859 character sets. The numbering scheme is neither rational nor orderly.

 

lowercase
Pertaining to the small alphabetic characters, whether accented or not, as distinguished from the capital alphabetic characters. The concept of case also applies to alphabets such as Cyrillic and Greek, but not to Arabic, Hebrew, Thai, Japanese, Chinese, Korean, and many other scripts. Examples of lowercase letters are a, b, and c.

 

M

 

machine-readable information (MRI)
All textual information contained in a program such as a system control program, an application program, or microcode. MRI includes all information that is presented to or received from a user interacting with a system. This includes messages, dialog boxes, online manuals, audio output, animations, windows, help text, tutorials, diagnostics, clip art, icons, and any presentation control that is necessary to convey information to users.

 

maximal character set
The largest registered character set that is assigned to a registered code page following a particular encoding scheme. See also full character set.

 

mono-casing
The translation of alphabetic characters from one case (usually the lowercase) to their equivalents in another case (usually the uppercase). See also folding.

 

MRI
See machine-readable information.

 

N

 

national use graphics
Graphic characters in a coded character set that are not part of the invariant character set.

 

nibble
A bit-pattern consisting of four bits.

 

non-Latin-based alphabet
An alphabet comprising letters other than the Latin-based ones, such as those used in Greek and Arabic.

 

normalization support CCSID table (NSCT)
A table containing a default CCSID value associated with a pair of CCSIDs, which will be used to normalize two strings (that are coded in two different CCSIDs), before a string operation such as concatenation, comparison, or others is performed with the two strings.

 

NSCT
See normalization support CCSID table.

 

O

 

octet
A byte composed of eight binary elements.

 

odd parity bit
A check bit that is usually generated or included in a parity-checking algorithm to make the total number of 1 bits in a bit pattern, including the check bit, an odd number. See also even parity bit.

 

R

 

related default CCSID table
A table containing a default CCSID associated with another CCSID and an ESID. This default CCSID is considered to be the nearest equivalent of its associated CCSID based on some relationship between the two.

 

Revisable-Form-Text Document Content Architecture (RFTDCA)
The architectural specification for the information interchange of documents whose text is in a revisable format. A Revisable-Form Text Document Content Architecture document consists of structured fields, controls, and graphic characters that represent the format and meaning of the document.

 

RFTDCA
See Revisable-Form-Text Document Content Architecture.

 

S

 

septet
A 7-bit byte.

 

session
A logical or virtual connection between two stations, software programs, or devices on a network that allows the two elements to communicate and exchange data for the duration of the session.

 

special character
A graphic character that is not a letter, a digit, or a space character and not an ideogram.

 

subset character set
A set of characters that is completely contained in another larger set of characters. See also full character set.

 

syntactic character set
A set of 81 graphic characters that are registered in the IBM registry as character set 00640. This set is used for syntactic purposes maximizing portability and interchangeability across systems and country or region boundaries. It is contained in most of the primary registered character sets, with a few exceptions. See also invariant character set.

 

system
A set of individual components, such as people, machines, or methods, that work together to perform a function.

 

T

 

tag
A mechanism used to identify certain attributes having some bearing on handling of character data. Some examples are character set identifier, code page identifier, language identifier, country identifier, and encoding scheme identifier.

 

U

 

UDC
See user-defined character.

 

uppercase
Pertaining to the capital alphabetic characters, as distinguished from the small alphabetic characters. The concept of case also applies to alphabets such as Cyrillic and Greek, but not to Arabic, Hebrew, Thai, Japanese, Chinese, Korean, and many other scripts. Examples of capital letters are A, B, and C. See also lowercase.

 

user
Any individual, organization, process, device, program, protocol, or system that uses the services of a computing system.

 

user-defined character (UDC)
A character that is defined by an individual user or organization for assignment in one or more code pages. These characters are often ideographic characters, symbols, or logos. Some standards, including Unicode, reserve coding space for user-defined characters. The meaning of a user-defined character can be assured only within the closed environment of the defining organization or by private agreement among cooperating users.

 

W

 

ward
A section of a double-byte character set (DBCS) where the first byte of each DBCS code point belonging to that section is the same value.
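A minimal sketch of grouping double-byte code points into wards by their first byte. The function name and the sample code points are hypothetical.

```python
from collections import defaultdict


def group_by_ward(code_points):
    """Group 2-byte DBCS code points by their first byte (the ward)."""
    wards = defaultdict(list)
    for code_point in code_points:
        if len(code_point) != 2:
            raise ValueError("a DBCS code point must be exactly 2 bytes")
        wards[code_point[0]].append(code_point)
    return dict(wards)


sample = [b"\x42\xc1", b"\x42\xc2", b"\x43\x81"]
for ward, members in group_by_ward(sample).items():
    print(f"ward X'{ward:02X}': {len(members)} code point(s)")
```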
