International Components for Unicode (ICU) is a mature, portable set of C/C++ and Java libraries for Unicode support, software internationalization (I18N), and globalization (G11N). It gives applications the same results on all platforms.
ICU Features
As computing environments become more heterogeneous, software portability becomes increasingly important. ICU provides robust, full-featured Unicode services on a wide variety of platforms without sacrificing performance. It is an open source project sponsored, supported, and used by IBM, and it delivers commercial-quality Unicode-based technologies. ICU supports the most current version of the Unicode standard, including the supplementary characters needed for the repertoires of GB 18030, HKSCS, and JIS X 0213, and it offers great flexibility to extend and customize the supplied services, including:
- Text: Unicode text handling, full character properties, and character set conversions (500+ code pages)
- Analysis: Unicode regular expressions; full Unicode sets; character, word and line boundaries
- Comparison: language-sensitive collation and searching
- Transformations: normalization, upper/lowercase conversion, script transliterations (50+ pairs)
- Locales: comprehensive locale data (230+ locales) and a resource bundle architecture
- Complex Text Layout: Arabic, Hebrew, Indic and Thai
- Formatting and Parsing: multi-calendar and time zone support; dates, times, numbers, currencies, messages
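To make a few of the services listed above concrete, here is a minimal sketch of normalization, language-sensitive collation, and word-boundary analysis. It is an illustration under stated assumptions, not ICU's own sample code: so that it runs without any extra dependency, it uses the JDK's `java.text` classes rather than ICU itself. ICU4J offers similarly named classes in the `com.ibm.icu.text` package, but results and available options may differ. The class name `UnicodeServicesSketch` and its helper methods are hypothetical.

```java
import java.text.BreakIterator;
import java.text.Collator;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class. It uses the JDK's java.text services as a
// stand-in for the ICU services described in the list (assumption).
class UnicodeServicesSketch {

    // Transformations: normalize to NFC, so that a base letter followed by
    // a combining mark is composed into one code point where possible.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Comparison: sort with a locale-sensitive collator instead of
    // raw code point order. The input list is left unmodified.
    static List<String> sortForLocale(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator);
        return sorted;
    }

    // Analysis: split text at word boundaries and keep only the segments
    // that contain at least one letter or digit (drops spaces, punctuation).
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toNfc("e\u0301").length());
        System.out.println(sortForLocale(Arrays.asList("zebra", "Apple", "apple"), Locale.ENGLISH));
        System.out.println(words("Hello, world!", Locale.ENGLISH));
    }
}
```

Porting the sketch to ICU4J would mostly mean switching the imports to the `com.ibm.icu.text` equivalents and adding the ICU4J library to the classpath; check the ICU documentation for the exact classes and factory methods before relying on that.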
Getting started with ICU
Find more information about ICU, or see the ICU mailing list of contacts (links reside outside of ibm.com).

