Troubleshooting
Problem
Data that appears fine when viewed in the database through tools such as Toad causes errors when it is processed in DataStage.
Symptom
Invalid character(s) ([x2013]) found converting Unicode string (code point(s): [x2013]) to codepage ASCL_ISO8859-1, substituting.
The character(s) may display in a log file as the character "?".
Cause
The Windows character set and encoding MS1252 is not fully compatible with ISO-8859-1: MS1252 defines printable characters at byte values 0x80 through 0x9F, where ISO-8859-1 has none. As a result, there are characters available in Windows that cannot be represented by default in Linux/Unix. The following table lists these characters. Some of them, such as the en dash and em dash, may appear as glyphs that are almost indistinguishable from characters that are in ISO-8859-1, such as the hyphen-minus, and this can cause confusion.
Character | Character Name | MS1252 Hex | MS1252 Decimal | Unicode Code Point |
€ | Euro Sign | 80 | 128 | 20AC |
‚ | Single Low-9 Quotation Mark | 82 | 130 | 201A |
ƒ | Latin Small Letter F with Hook | 83 | 131 | 0192 |
„ | Double Low-9 Quotation Mark | 84 | 132 | 201E |
… | Horizontal Ellipsis | 85 | 133 | 2026 |
† | Dagger | 86 | 134 | 2020 |
‡ | Double Dagger | 87 | 135 | 2021 |
ˆ | Modifier Letter Circumflex Accent | 88 | 136 | 02C6 |
‰ | Per Mille Sign | 89 | 137 | 2030 |
Š | Latin Capital Letter S with Caron | 8A | 138 | 0160 |
‹ | Single Left-Pointing Angle Quotation Mark | 8B | 139 | 2039 |
Œ | Latin Capital Ligature OE | 8C | 140 | 0152 |
Ž | Latin Capital Letter Z with Caron | 8E | 142 | 017D |
‘ | Left Single Quotation Mark | 91 | 145 | 2018 |
’ | Right Single Quotation Mark | 92 | 146 | 2019 |
“ | Left Double Quotation Mark | 93 | 147 | 201C |
” | Right Double Quotation Mark | 94 | 148 | 201D |
• | Bullet | 95 | 149 | 2022 |
– | En Dash | 96 | 150 | 2013 |
— | Em Dash | 97 | 151 | 2014 |
˜ | Small Tilde | 98 | 152 | 02DC |
™ | Trade Mark Sign | 99 | 153 | 2122 |
š | Latin Small Letter S with Caron | 9A | 154 | 0161 |
› | Single Right-Pointing Angle Quotation Mark | 9B | 155 | 203A |
œ | Latin Small Ligature OE | 9C | 156 | 0153 |
ž | Latin Small Letter Z with Caron | 9E | 158 | 017E |
Ÿ | Latin Capital Letter Y with Diaeresis | 9F | 159 | 0178 |
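The mismatch described above can be reproduced outside DataStage. The following is a minimal Python sketch, not DataStage behavior itself: it assumes Python's cp1252 and iso-8859-1 codecs as stand-ins for the MS1252 and ISO-8859-1 maps, and the sample bytes and the helper name fits_in_latin1 are hypothetical.

```python
def fits_in_latin1(text: str) -> bool:
    """Return True if every character can be encoded as ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample: byte 0x96 is the en dash in MS1252 (see the table above).
raw = b"2017\x962018"
text = raw.decode("cp1252")

print(f"U+{ord(text[4]):04X}")      # code point of the decoded byte
print(fits_in_latin1(text))         # the en dash has no ISO-8859-1 byte
print(fits_in_latin1("2017-2018"))  # hyphen-minus (0x2D) encodes without error
# Encoding with substitution replaces the unmappable character with "?"
print(text.encode("iso-8859-1", errors="replace"))
```

The last line mirrors the "?" substitution that the log message describes.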
Environment
Most likely to be seen when processing data originally from a Windows machine on a Unix based system.
Diagnosing The Problem
Look at the data in hex to ascertain the encoding of the character in question. If the data comes from a text file, use od to view it. If it comes from a database, use a relevant function, such as CONVERT to varbinary in SQL Server.
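As an alternative to reading a hex dump by eye, the check can be scripted. The sketch below is an illustration only and assumes the input is single-byte encoded text (in UTF-8 data, bytes in this range are part of multi-byte sequences and mean something else). The function name find_ms1252_bytes and the sample bytes are hypothetical.

```python
import unicodedata


def find_ms1252_bytes(data: bytes):
    """List (offset, byte hex, code point, name) for each byte in 0x80-0x9F.

    Bytes in this range are the ones where MS1252 and ISO-8859-1 differ.
    """
    hits = []
    for offset, byte in enumerate(data):
        if 0x80 <= byte <= 0x9F:
            # Bytes that MS1252 leaves undefined (for example 0x81) decode
            # to U+FFFD REPLACEMENT CHARACTER instead of raising an error.
            char = bytes([byte]).decode("cp1252", errors="replace")
            hits.append(
                (offset, f"{byte:02X}", f"U+{ord(char):04X}",
                 unicodedata.name(char, "UNKNOWN"))
            )
    return hits


# Hypothetical sample containing an en dash and curly double quotes.
sample = b"pages 10\x9620 \x93quoted\x94"
for hit in find_ms1252_bytes(sample):
    print(hit)
```

For a real file, the bytes could be read with open(path, "rb").read() and passed to the same function. Any hit suggests the data was produced with MS1252 rather than ISO-8859-1.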
Resolving The Problem
Use the NLS map for MS1252, such as windows-1252 or ASCL_MS1252.
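To illustrate why the choice of map matters, the following Python sketch decodes the same bytes under both interpretations. It is an analogy only: Python's codec names iso-8859-1 and cp1252 are assumed here as stand-ins for the corresponding NLS maps, and the sample bytes are hypothetical. It does not show DataStage configuration.

```python
# Hypothetical sample: 0xE9 is "é" in both encodings, while 0x96 and 0x80
# are the en dash and euro sign in MS1252 only.
raw = b"caf\xe9 \x96 \x80 5"

as_latin1 = raw.decode("iso-8859-1")  # 0x96 and 0x80 become C1 control codes
as_cp1252 = raw.decode("cp1252")      # 0x96 and 0x80 become U+2013 and U+20AC

for label, text in (("iso-8859-1", as_latin1), ("cp1252", as_cp1252)):
    print(label, [f"U+{ord(ch):04X}" for ch in text if ord(ch) > 0x7F])
```

Characters shared by both encodings decode identically, so only the bytes in the 0x80 to 0x9F range change when the MS1252 map is selected.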
Document Information
Modified date:
16 June 2018
UID
swg21984257