The importance of coded character sets

One encounters textual information every day. For those who work with computers, text appears in e-mails, documents, Web pages, the forms you fill out when making online transactions, and so forth. Each onscreen character that was entered from a keyboard is represented in the computer, in databases, and in communication protocols as a number in a pre-specified form (often called a code point, bit pattern, or character code) taken from a coded character set.
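To make the idea of a code point concrete, the following Python sketch (Python is used only for illustration) prints the Unicode code point of each character in a short string, then the bytes the same string occupies in two encodings, UTF-8 and Windows code page 1252. The helper names `code_points` and `encoded_bytes` are illustrative and do not come from any particular library.

```python
# Sketch: the same characters viewed as code points and as encoded bytes.
sample = "A\u00f1\u20ac"  # "Añ€": LATIN CAPITAL A, n with tilde, EURO SIGN

def code_points(text: str) -> list[int]:
    """Return the Unicode code point (an integer) of each character."""
    return [ord(ch) for ch in text]

def encoded_bytes(text: str, encoding: str) -> bytes:
    """Return the byte sequence that text occupies in the given encoding."""
    return text.encode(encoding)

print([f"U+{cp:04X}" for cp in code_points(sample)])  # ['U+0041', 'U+00F1', 'U+20AC']
print(encoded_bytes(sample, "utf-8").hex(" "))         # 41 c3 b1 e2 82 ac
print(encoded_bytes(sample, "cp1252").hex(" "))        # 41 f1 80
```

The same three characters take six bytes in UTF-8 but only three in code page 1252, which is why a program must know which coded character set a byte stream uses before it can interpret it.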

If you are writing an application using a set of system-provided services, you will be concerned only with using the correct interfaces (APIs) and with supplying the appropriate coded character set-related parameters to them. At this level, you don't need to know the detailed definition of the coded character sets used. However, you will need to understand the factors that influence the results returned from the invoked functions or methods, some of which are related to the nature of the coded character sets.
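As a small illustration of why the coded character set parameter matters, the sketch below uses Python's `bytes.decode` as a stand-in for a system-provided API that takes such a parameter. The same four bytes produce different text, or an error, depending only on the charset name supplied. The wrapper `decode_with` is a hypothetical name.

```python
raw = b"caf\xe9"  # the bytes produced by encoding "café" in ISO 8859-1 (Latin-1)

def decode_with(data: bytes, charset: str) -> str:
    """Decode data using the charset parameter supplied by the caller."""
    return data.decode(charset)

print(decode_with(raw, "latin-1"))  # café
print(decode_with(raw, "cp437"))    # cafΘ  (0xE9 is GREEK CAPITAL LETTER THETA in code page 437)

try:
    decode_with(raw, "utf-8")
except UnicodeDecodeError as err:
    # In UTF-8, 0xE9 starts a three-byte sequence that never completes.
    print("utf-8 rejects these bytes:", err.reason)
```

The caller never needs the full code tables, but it must pass the charset that actually produced the bytes; otherwise the API returns wrong text or fails.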

If you are writing rendering or visualization software, you need to know the number assigned to a particular character in order to display or print that character as the user expects. If you are writing software to sort a list of names, you need to know the numbers assigned to each of the letters that can appear in those names, so that the list can be sorted in the order the user expects. In both cases, you need to know the details of the coded character set: the set of numbers and the character assigned to each of them.
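The sorting problem can be seen in a few lines of Python. Sorting strings directly orders them by code point, so every uppercase Latin letter comes before every lowercase one, and accented letters land after "z". The key function `naive_key` below is a hypothetical, deliberately simplified approximation (strip accents, ignore case); it is not a real collation algorithm.

```python
import unicodedata

names = ["Zo\u00eb", "zebra", "\u00c9mile", "Eva", "adam"]  # Zoë, zebra, Émile, Eva, adam

def naive_key(name: str) -> str:
    """Approximate a user-friendly order by ignoring accents and letter case."""
    # NFD splits a precomposed letter such as É into E + COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", name)
    # Drop the combining marks, keeping only the base letters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

print(sorted(names))                 # ['Eva', 'Zoë', 'adam', 'zebra', 'Émile']
print(sorted(names, key=naive_key))  # ['adam', 'Émile', 'Eva', 'zebra', 'Zoë']
```

User expectations differ by language (in Swedish, for example, å, ä and ö sort after z), so production code would normally rely on locale-aware collation such as the Unicode Collation Algorithm rather than a key like this one.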

Hardware and software components interact with each other either by using the same number for a given character everywhere or by converting to and from another number understood by each particular component. If an application uses a different coded character set from the software component it relies on for rendering, a conversion is required to change each code point from the application into the corresponding code point in the rendering software's coded character set. Whoever writes the converter needs to know the details of both coded character sets.
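A minimal conversion sketch, assuming (for illustration only) that the application stores text in EBCDIC code page 037 and the rendering component expects ASCII. Python's codecs convert by pivoting through Unicode: decode the source bytes to code points, then encode them in the target set. A hand-written converter would instead hold a direct table from one set to the other. `convert` is a hypothetical helper name.

```python
def convert(data: bytes, source_charset: str, target_charset: str) -> bytes:
    """Re-encode data from source_charset into target_charset.

    Pivots through Unicode. Raises UnicodeEncodeError if a character
    has no code point in the target coded character set.
    """
    return data.decode(source_charset).encode(target_charset)

ebcdic = "HELLO".encode("cp037")            # EBCDIC (US/Canada) code page 037
print(ebcdic.hex(" "))                       # c8 c5 d3 d3 d6
ascii_bytes = convert(ebcdic, "cp037", "ascii")
print(ascii_bytes.hex(" "))                  # 48 45 4c 4c 4f

# Show the per-character mapping the converter applied.
for src, dst in zip(ebcdic, ascii_bytes):
    print(f"0x{src:02X} -> 0x{dst:02X}")
```

The letter H is 0xC8 on one side and 0x48 on the other; neither component could interpret the other's bytes without this step.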

If you are writing a Web server that sends textual data to a Web client in a coded character set the client understands, you need to know the details of the coded character set used in the source text and convert that text to code points in the client's coded character set when preparing the buffer to be sent to the client.
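The server-side steps can be sketched as follows, assuming the stored text is in Windows code page 1252 and the client's charset is already known. How a real server determines the client's charset (request headers, configuration, and so on) is not shown, and `prepare_response` is a hypothetical name. Characters missing from the client's set are emitted as HTML numeric character references, a choice that only makes sense for HTML or XML content.

```python
def prepare_response(source: bytes, source_charset: str,
                     client_charset: str) -> tuple[dict[str, str], bytes]:
    """Convert stored text into the client's coded character set."""
    text = source.decode(source_charset)  # source bytes -> Unicode code points
    # Code points -> client bytes; unrepresentable characters become &#NNNN;
    body = text.encode(client_charset, errors="xmlcharrefreplace")
    # Tell the client which coded character set the body uses.
    headers = {"Content-Type": f"text/html; charset={client_charset}"}
    return headers, body

stored = "Price: 10 \u20ac".encode("cp1252")  # text kept in code page 1252 on the server
headers, body = prepare_response(stored, "cp1252", "iso-8859-1")
print(headers["Content-Type"])  # text/html; charset=iso-8859-1
print(body)                     # b'Price: 10 &#8364;'
```

ISO 8859-1 has no euro sign, so the converter substitutes a character reference; a UTF-8 client would instead receive the bytes E2 82 AC.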

In the global e-business world, data can be generated anywhere and consumed anywhere. In this scenario, you will encounter several coded character sets used to represent text. At a minimum, conversions between Unicode and non-Unicode encoded text have to be considered.

An understanding of what a coded character set is, and of the elements behind its definition, will enable you to work with textual data in whatever coded character sets you encounter and to deal with textual integrity problems effectively.
