Unicode provides a foundation for communicating textual data. However, the locale-dependent data that drives features such as collation and date/time formatting may be incorrect or inconsistent between systems. This not only makes for an irritating user experience but can also prevent accurate data transfer.
The Unicode Consortium's Common Locale Data Repository (CLDR) is the most extensive repository of standardized locale data. Many companies use this data to globalize their software and to support the world's languages and cultural conventions.
The latest version of CLDR, along with detailed information and download links, can be found on the CLDR web site at cldr.unicode.org.
Traditionally, the data associated with locales supports formatting and parsing of dates, times, numbers, and currencies; default units of currency; measurement units; and collation (sorting). It also provides translated names for time zones, languages, countries, and scripts. In addition to these, CLDR supplies locale data for a wide variety of other types of information, including rules for determining text boundaries (character, word, line, and sentence), text transformations (including transliterations), rule-based number formatting (number spellout), and many others.
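As a rough illustration of how locale data drives formatting, the Python sketch below formats the same number for two locales. The `NUMBER_SYMBOLS` table and the `format_decimal` helper are hypothetical names introduced here. The symbol values are hand-copied for English and German, and the code assumes groups of three digits, which does not hold for every locale. A real application would normally obtain this data from CLDR through a library such as ICU rather than hard-coding it.

```python
# Hypothetical, hand-copied subset of locale data for illustration only.
# Real applications would load these values from CLDR via a library.
NUMBER_SYMBOLS = {
    "en": {"decimal": ".", "group": ","},
    "de": {"decimal": ",", "group": "."},
}


def format_decimal(value: float, locale_id: str, fraction_digits: int = 2) -> str:
    """Format a number using the decimal and grouping symbols of a locale.

    Assumes groups of three digits, which is not true for all locales.
    """
    symbols = NUMBER_SYMBOLS[locale_id]
    # Start from Python's built-in convention: "," for groups, "." for decimals.
    ascii_text = f"{value:,.{fraction_digits}f}"
    # Swap both symbols in a single pass so one replacement cannot clobber the other.
    table = str.maketrans({",": symbols["group"], ".": symbols["decimal"]})
    return ascii_text.translate(table)


english = format_decimal(1234567.891, "en")  # "1,234,567.89"
german = format_decimal(1234567.891, "de")   # "1.234.567,89"
```

The same value yields two different strings, which is why two systems with inconsistent locale data can misread each other's output.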
Locale Data Markup Language (LDML) is the XML format used to specify locale data in the repository and to interchange structured locale data. See Unicode Technical Standard #35, Unicode Locale Data Markup Language (LDML), for a detailed description. LDML is designed as a format for data interchange, and application libraries such as ICU commonly restructure the data in ways that provide better performance at run time.
IBM was one of the founding members of the workgroup that developed LDML and version 1.0 of the CLDR. In 2004 the CLDR project moved to the Unicode Consortium, and IBM continues to play a significant role in the project today.
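To make the earlier point about restructuring LDML for run-time use more concrete, here is a minimal Python sketch that parses a small LDML-style fragment and flattens it into a dictionary for direct lookup. The fragment is a simplified, hand-trimmed excerpt in the shape of the German number symbols, not a verbatim CLDR file. The `flatten_ldml` function and its path-style keys are assumptions made for this example, and they do not reflect how ICU or any other library actually stores its data.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-trimmed LDML-style fragment (illustrative, not a verbatim CLDR file).
LDML_FRAGMENT = """\
<ldml>
  <identity>
    <language type="de"/>
  </identity>
  <numbers>
    <symbols numberSystem="latn">
      <decimal>,</decimal>
      <group>.</group>
    </symbols>
  </numbers>
</ldml>
"""


def flatten_ldml(xml_text: str) -> dict[str, str]:
    """Flatten leaf elements that carry text into a {path: text} lookup table.

    Attributes (such as numberSystem) are ignored to keep the sketch short.
    """
    root = ET.fromstring(xml_text)
    table: dict[str, str] = {}

    def walk(element: ET.Element, path: str) -> None:
        children = list(element)
        if not children:
            text = (element.text or "").strip()
            if text:
                table[path] = text
            return
        for child in children:
            walk(child, f"{path}/{child.tag}")

    walk(root, root.tag)
    return table


symbols = flatten_ldml(LDML_FRAGMENT)
decimal_separator = symbols["ldml/numbers/symbols/decimal"]
```

After the one-time flattening step, each lookup is a single dictionary access instead of a walk through the XML tree. This is the kind of trade-off a library can make once the data has left its interchange format.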