IBM Support

Tivoli Storage Manager client's behavior with Unicode IVS/IVD symbols

Question & Answer


Question

Does Tivoli Storage Manager client handle Unicode IVS/IVD symbols adequately?

Cause

Unicode IVS/IVD (Ideographic Variation Sequence / Ideographic Variation Database) is a new standard defined in Unicode Technical Standard #37. It lets software programs specify the exact glyph variant of a character, and it is supported by some Windows applications (for example, Windows 2008 Explorer).
However, it is not described in any Tivoli Storage Manager documentation.

Answer

Functionally, yes. However, there are a few display-only limitations and one additional consideration.

[Example]
Some Japanese users may use Unicode IVS in their Windows environment as follows:
- Install one or more font sets (e.g. IPAmj-Mincho font http://mojikiban.ipa.go.jp/1300.html )
- Use some Unicode IVS symbols in file names.
For example:
6FA4 E0102: Hanyo-Denshi; JTB4EE & Moji_Joho; MJ016004
6FA4 E0103: Hanyo-Denshi; JTB4C9 & Moji_Joho; MJ016003
(Please refer to http://www.unicode.org/ivd/data/2014-05-16/IVD_Stats.txt )
In this case, Windows 2008 Explorer displays these symbols correctly.
However, the Tivoli Storage Manager backup-archive client UIs (command line console, GUI, and Web GUI) cannot display those characters correctly. They display garbage characters in the file name.


Backup and restore operations for such files with the Windows client should not have any problems.

The symbols above
6FA4 E0102: Hanyo-Denshi; JTB4EE & Moji_Joho; MJ016004
6FA4 E0103: Hanyo-Denshi; JTB4C9 & Moji_Joho; MJ016003

are encoded in UTF-16 with the help of a so-called "surrogate pair".
The actual encoding of 6FA4_E0102 is 0x6FA4DB40DD02, which takes 3 wchar_t symbols. The first wchar_t, 0x6FA4, represents the base ideograph, which is part of the Basic Multilingual Plane (BMP). The remaining two, 0xDB40 and 0xDD02, form the surrogate pair that represents the variation selector character.
Likewise, 6FA4_E0103 is encoded as 0x6FA4DB40DD03.
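
The encoding described above can be checked with a short Python sketch. It is only an illustration and not part of the Tivoli Storage Manager client; the helper names utf16_code_units and surrogate_pair are made up for this example.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its high and low surrogates."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# Base ideograph U+6FA4 followed by variation selector U+E0102 or U+E0103.
ivs_102 = "\u6FA4\U000E0102"
ivs_103 = "\u6FA4\U000E0103"

print([f"{u:04X}" for u in utf16_code_units(ivs_102)])  # ['6FA4', 'DB40', 'DD02']
print([f"{u:04X}" for u in utf16_code_units(ivs_103)])  # ['6FA4', 'DB40', 'DD03']
print([f"{u:04X}" for u in surrogate_pair(0xE0102)])    # ['DB40', 'DD02']
```

Each IVS symbol is therefore one BMP code unit plus one surrogate pair, three 16-bit units in total.
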
Facts:

  1. Because the Windows API scan functions (FindFirstFileW/FindNextFileW) return file names in UTF-16, the Tivoli Storage Manager client gets the correct file names.
  2. The names are sent to the server in UTF-8, and this does not cause any problem either, because the conversion between UTF-16 and UTF-8 is a one-to-one correspondence.
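
The one-to-one correspondence in fact 2 can be illustrated with a small Python round trip. This is a sketch of the conversion only, under the assumption that the name is converted as a whole string; it does not reproduce the client's actual code path, and the variable names are made up for this example.

```python
# UTF-16 (big-endian) bytes of the IVS symbol 6FA4_E0102.
name_utf16 = bytes.fromhex("6FA4DB40DD02")

# Convert UTF-16 to UTF-8, as when the name is sent to the server.
name = name_utf16.decode("utf-16-be")
name_utf8 = name.encode("utf-8")

# Convert back from UTF-8 to UTF-16, as when the name is restored.
round_trip = name_utf8.decode("utf-8").encode("utf-16-be")

print(name_utf8.hex(" ").upper())  # E6 BE A4 F3 A0 84 82
print(round_trip == name_utf16)    # True
```

The variation selector survives the conversion in both directions, so nothing is lost between client and server.
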


There is a display-only problem. It is a limitation because it cannot be fixed inside the Tivoli Storage Manager client.

Windows console output is MBCS-based. If the user locale is Japanese_Japan.932, the console uses code page 932 for output. Converting either 6FA4_E0102 or 6FA4_E0103 to code page 932 produces the byte string
"0xE0 0x56 0x3F 0x3F". The value 0x3F means that a symbol cannot be converted to MBCS. The Tivoli Storage Manager client displays such symbols as an underscore sign "_". This is why the file name appears with underscores in the console.
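
The following Python sketch imitates this conversion. It assumes that each UTF-16 code unit that has no code page 932 equivalent is replaced separately by 0x3F, which is consistent with the two 0x3F bytes quoted above. The functions to_code_page_per_unit and console_view are hypothetical helpers, not Windows or Tivoli Storage Manager APIs, and Python's cp932 codec stands in for the Windows conversion.

```python
def to_code_page_per_unit(name, code_page="cp932"):
    """Convert name one UTF-16 code unit at a time (assumed behavior),
    writing '?' (0x3F) for every unit the code page cannot represent."""
    utf16 = name.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        if 0xD800 <= unit <= 0xDFFF:
            # A surrogate half has no MBCS equivalent on its own.
            out += b"?"
        else:
            out += chr(unit).encode(code_page, errors="replace")
    return bytes(out)


def console_view(converted, code_page="cp932"):
    """Show unconvertible symbols (0x3F) as underscores, as the client does."""
    return converted.replace(b"?", b"_").decode(code_page)


converted_102 = to_code_page_per_unit("\u6FA4\U000E0102")
converted_103 = to_code_page_per_unit("\u6FA4\U000E0103")

print(converted_102.hex(" ").upper())   # compare with the bytes quoted above
print(converted_102 == converted_103)   # both variants give the same bytes
print(console_view(converted_102))      # base ideograph followed by "__"
```

Because both variation selectors collapse to the same replacement bytes, the two file names cannot be told apart in the console output.
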

Java GUI: The Java trace confirms that the Java GUI gets the correct UTF-16 encoding for the symbols in question.
Apparently, Java does not recognize these surrogate pairs when displaying them, even though Java works with Unicode symbols. That is why Java displays extra squares (REPLACEMENT CHARACTER).

Both the Java GUI (dsm) and the Web GUI use the same JRE and therefore have the same limitation.



There is an additional consideration about file and path name length in characters.

The Tivoli Storage Manager client has limits on file and path name length in characters. For Windows, the limits are 255 characters for a file name and 6000 characters for a full path. If a file name exceeds the limit, the file is not backed up. The length is the value returned by wcslen(). Tests show that for a symbol that includes a supplementary character, such as the IVS symbols above, wcslen() returns 3 for one symbol (instead of 1). So, if a name consists only of such "long" symbols, the effective limit is one third of the normal limit: 85 characters.

This problem occurs when Unicode IVS Kanji symbols are used. Ordinary Kanji symbols do not have this problem. For example, a file name can contain up to 255 "0x6FA4" symbols.
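
The arithmetic can be checked with the Python sketch below. It assumes that the value wcslen() returns on Windows equals the number of 16-bit UTF-16 code units in the name. The constant MAX_FILE_NAME_UNITS and the helpers wide_length and fits_file_name_limit are made up for this illustration and are not part of the client.

```python
MAX_FILE_NAME_UNITS = 255  # Windows file name limit quoted in the text above


def wide_length(name):
    """Number of UTF-16 code units, which is what wcslen() counts on Windows."""
    return len(name.encode("utf-16-le")) // 2


def fits_file_name_limit(name):
    """True if the name is within the 255-unit file name limit."""
    return wide_length(name) <= MAX_FILE_NAME_UNITS


ordinary = "\u6FA4" * 255                 # 255 ordinary Kanji symbols
ivs_ok = "\u6FA4\U000E0102" * 85          # 85 IVS symbols, 3 units each
ivs_too_long = "\u6FA4\U000E0102" * 86    # one IVS symbol more

print(wide_length(ordinary), fits_file_name_limit(ordinary))          # 255 True
print(wide_length(ivs_ok), fits_file_name_limit(ivs_ok))              # 255 True
print(wide_length(ivs_too_long), fits_file_name_limit(ivs_too_long))  # 258 False
```

A name that mixes ordinary and IVS symbols falls between these two cases, so the count of code units, not the count of visible symbols, is what has to stay within the limit.
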


Document Information

Modified date:
17 June 2018

UID

swg21677307