Unicode transformation formats

Introduction

Once you have decided to use the Unicode standard to encode character data within an application, the next step is to decide which of the Unicode encoding schemes you will use to store that data. Although Unicode is a single unified standard for encoding the wide range of characters used in languages today, there are three widely accepted schemes, or Unicode transformation formats (UTFs), that you might use when processing Unicode data: UTF-8, UTF-16, and UTF-32. Each has inherent advantages and disadvantages, depending on the types of characters you intend to handle and on whether memory or disk space is plentiful. In this article, we will take a closer look at the structure of Unicode and at each of the transformation formats.
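To make the space trade-off concrete, the following minimal Python sketch reports how many bytes a single character occupies in each of the three formats. The function name encoded_sizes and the sample characters are illustrative choices, not part of any standard API. The little-endian codec variants are used so that Python does not prepend a byte order mark, which would otherwise inflate the UTF-16 and UTF-32 counts.

```python
# Illustrative sketch: byte counts of one character in each UTF.
# "encoded_sizes" and the sample characters are hypothetical names and data
# chosen for this example only.

# Little-endian codecs are used so no byte order mark is added to the output.
CODECS = (("UTF-8", "utf-8"), ("UTF-16", "utf-16-le"), ("UTF-32", "utf-32-le"))


def encoded_sizes(text):
    """Return the encoded length of text, in bytes, for each format."""
    return {name: len(text.encode(codec)) for name, codec in CODECS}


if __name__ == "__main__":
    samples = {
        "Latin letter A (U+0041)": "A",
        "e with acute (U+00E9)": "\u00e9",
        "CJK ideograph (U+4E2D)": "\u4e2d",
        "Emoji (U+1F600)": "\U0001F600",
    }
    for label, char in samples.items():
        print(label, encoded_sizes(char))
```

In this sketch the Latin letter takes 1, 2, and 4 bytes in UTF-8, UTF-16, and UTF-32 respectively, the CJK ideograph takes 3, 2, and 4, and the emoji takes 4 bytes in all three. Which format is most compact therefore depends on the characters an application mostly handles.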


Topic Contents