Hidden characters (Byte Order Mark, BOM) in exported CSV text output
When I export data in CSV format from IBM SPSS Statistics and try to use it with a third-party application, that application reports that the file is corrupt, yet the file appears fine when I open it in Notepad.
The issue is caused by extra bytes written at the start of the file to mark its Unicode (UTF-8) encoding. These bytes are called the Byte Order Mark, or BOM.
Diagnosing the problem
If you open the file in a hex editor, you will see three extra bytes at the beginning of the text that was exported from IBM SPSS Statistics. Their hex values are:
ef bb bf
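If no hex editor is at hand, the same check can be scripted. The following is a minimal Python sketch, not part of the IBM SPSS Statistics product; the function name has_utf8_bom and the sample bytes are assumptions made for illustration.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (ef bb bf)."""
    return data.startswith(codecs.BOM_UTF8)

# Hypothetical sample standing in for the first bytes of an exported CSV file.
sample = b"\xef\xbb\xbfid,name\r\n1,Ann\r\n"

print(has_utf8_bom(sample))   # True
print(sample[:3].hex(" "))    # ef bb bf
```

For a real file, read it in binary mode (for example open(path, "rb").read(3)) and pass the result to the function, so that no decoding step hides the BOM.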
Resolving the problem
A quick workaround is to switch encoding modes using one of the following methods, taken from the online help topic on changing between encoding modes:
Starting with version 21, IBM SPSS Statistics operates in Unicode mode by default. If you prefer to work in code page mode, you can change the default:
- In the dialog interface: Edit>Options>Language. In the Character Encoding for Data and Syntax group, select Locale's writing system. See the topic Language options for more information.
- In command syntax:
Data files saved in Unicode encoding cannot be read by versions of IBM SPSS Statistics prior to 16.0.
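If changing the encoding mode is not practical, another possible approach is to remove the BOM from the exported file before passing it to the third-party application. This is a sketch only and is not taken from the IBM help system; the function name strip_utf8_bom and the sample bytes are hypothetical.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM; other input is unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Hypothetical sample standing in for the contents of an exported CSV file.
exported = b"\xef\xbb\xbfid,name\r\n1,Ann\r\n"
cleaned = strip_utf8_bom(exported)

print(cleaned[:7])  # b'id,name'
```

To apply this to a file, read it in binary mode, pass the bytes through the function, and write the result to a new file. When the file only needs to be read as text in Python, opening it with encoding="utf-8-sig" skips the BOM automatically.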