Whenever you deal with textual data, the coded character set is the link between the characters as perceived by the user and the numbers assigned to them in the computer. Once you understand the definitions behind the coded character sets, you can preserve character boundaries throughout your processing. When text is converted to another coded character set environment, for example when sending information in UTF-8 on the Web, you should know which attributes of the coded character sets come into play and design the converter accordingly. If you use an existing converter, you should be aware of the information that has to be passed to it, of the factors that affect items such as buffer allocation, and of what to expect in the results of conversion. You should also understand how to deal with characters that are not in a given coded character set definition. Hopefully, this article has helped you in this regard.
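The three converter concerns summarized above (buffer sizing, characters missing from the target set, and character boundaries) can be illustrated with a minimal sketch that uses only Python's standard codecs. The helper names `encoded_sizes`, `convert`, and `decode_chunks` and the sample string are hypothetical, chosen for illustration; they are not part of any IBM or ICU API.

```python
import codecs


def encoded_sizes(text, encodings):
    """Return the byte length of the same text in each encoding.

    The same characters need different amounts of storage, which is
    why output buffers cannot simply be sized from the input length.
    """
    return {name: len(text.encode(name)) for name in encodings}


def convert(text, target, on_missing="strict"):
    """Encode text into the target coded character set.

    on_missing is passed to Python's codec error handling:
    "strict" raises UnicodeEncodeError, "replace" substitutes "?".
    """
    return text.encode(target, errors=on_missing)


def decode_chunks(chunks, encoding):
    """Decode a byte stream that arrives in arbitrary pieces.

    An incremental decoder holds back an incomplete multi-byte
    sequence until the rest arrives, so character boundaries
    survive even when a chunk ends in the middle of a character.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


if __name__ == "__main__":
    sample = "Grüße, 世界"  # hypothetical sample text

    # Buffer sizing: one string, three different byte counts.
    print(encoded_sizes(sample, ["utf-8", "utf-16-le", "utf-32-le"]))

    # Characters absent from the target set: substitute them.
    print(convert(sample, "latin-1", on_missing="replace"))

    # Character boundaries: split in the middle of a two-byte character.
    data = sample.encode("utf-8")
    print(decode_chunks([data[:3], data[3:]], "utf-8") == sample)
```

Whether to fail, substitute, or escape a character that the target set lacks is an application decision. The sketch exposes it as a parameter rather than fixing one policy.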
IBM's globalization strategy is to use Unicode for text representation. Non-Unicode coded character sets will nevertheless continue to be encountered on the Web and in the systems and databases connected to it.
Over the years, IBM experts have participated in and contributed to international standardization projects on coded character sets, either directly in the working groups of JTC1/SC2 or at the national and regional levels, as well as in industry consortia such as the Unicode Consortium. IBM also has a corporate standards program to ensure that IBM-defined coded character sets are consistent with the various national and international standards.
IBM is also helping to further the use of Unicode everywhere through ICU (International Components for Unicode), which is available as Java and C libraries. These libraries include extensive support for conversion between Unicode and many other coded character sets.