PM87684: PROBLEMS TO READ UTF-8 FIXED WIDTH TEXT FILE WITH DATA LIST AND WITH GET DATA WHEN UNICODE IS ON

APAR status

  • Closed as user error.

Error description

  • You work with IBM SPSS Statistics 21.0.0.1 and want to import a
    UTF-8 text file with the DATA LIST command. The text file is in
    fixed-width format, with no delimiter between the variables.
    It contains a string variable in which some rows have special
    (extended) characters and other rows do not.
    For example, your command looks like this:
    
    DATA LIST  FILE = 'C:\Temp\yourfile.txt'
      ENCODING = 'UTF-8'
      /Id 1 town 2-7 (A) code 8-11.
    EXECUTE.
    LIST.
    
    Town is the string variable, which can contain special characters
    such as umlauts. In 21.0.0.1 the Viewer shows warnings that an
    invalid numeric field has been found. The LIST output shows that
    rows with no special characters in town are imported correctly,
    whereas rows with special characters get a system-missing value
    (sysmis).
    
    When you use the GET DATA syntax below while working in local
    encoding (code page) mode rather than Unicode mode, the import
    works and all variables are read correctly:
    
    GET DATA
      /TYPE=TXT
      /FILE="C:\Temp\yourfile.txt"
      /FIXCASE=1
      /ARRANGEMENT=FIXED
      /FIRSTCASE=1
      /IMPORTCASE=ALL
      /VARIABLES=
      /1 id 0-0 F1.0
      town 1-6 A6
      code 7-10 F4.0.
    CACHE.
    EXECUTE.
    

Local fix

  • This is not a defect; the product is working as designed.
    In UTF-8, extended characters take two or more bytes. The column
    specifications are in bytes, so the string column definition
    cannot work for rows that contain special characters: each such
    character takes at least two bytes, whereas the characters in
    the other rows need only one byte each. In the example file, the
    numeric value actually starts in column 9 in every row except
    the last, which contains no extended characters.
    Either use a delimited UTF-8 text file when the data contain a
    mixture of single-byte and multi-byte characters, or import the
    fixed-width text file in local encoding (code page) mode rather
    than in Unicode mode.
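
    The byte-versus-character mismatch described above can be
    illustrated outside SPSS. The following Python sketch is not part
    of the APAR and does not reproduce SPSS behaviour; the sample rows,
    the helper names (slice_by_bytes, slice_by_chars, to_delimited) and
    the ";" separator are assumptions made for illustration only. It
    cuts the columns 1, 2-7 and 8-11 once by bytes and once by
    characters, and then writes a delimited line that does not depend
    on byte positions.

```python
# Hypothetical sample rows, laid out in CHARACTER columns:
# id = column 1, town = columns 2-7, code = columns 8-11.
ROWS = ["1Zürich1234", "2Berlin5678"]

# (start, end) pairs, 1-based and inclusive, as in the DATA LIST example.
SPECS = [(1, 1), (2, 7), (8, 11)]


def slice_by_bytes(line, start, end):
    """Cut columns counted in bytes of the UTF-8 encoded line."""
    raw = line.encode("utf-8")
    # errors="replace" guards against a cut that lands inside a character.
    return raw[start - 1:end].decode("utf-8", errors="replace")


def slice_by_chars(line, start, end):
    """Cut columns counted in characters."""
    return line[start - 1:end]


def to_delimited(line, specs, sep=";"):
    """Re-emit a fixed-width line as a delimited line (assumed separator)."""
    return sep.join(slice_by_chars(line, s, e).strip() for s, e in specs)


if __name__ == "__main__":
    for row in ROWS:
        by_bytes = [slice_by_bytes(row, s, e) for s, e in SPECS]
        by_chars = [slice_by_chars(row, s, e) for s, e in SPECS]
        print("bytes:", by_bytes)   # row 1: ['1', 'Züric', 'h123']
        print("chars:", by_chars)   # row 1: ['1', 'Zürich', '1234']
        print("delimited:", to_delimited(row, SPECS))
```

    In the first sample row the "ü" occupies two bytes, so the byte
    columns 8-11 contain "h123" and the digits start at byte 9; in the
    second row bytes and characters coincide. The delimited output
    assumes that the separator never occurs inside a field.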
    

Problem summary

Problem conclusion

Temporary fix

Comments

    

APAR Information

  • APAR number

    PM87684

  • Reported component name

    SPSS STATISTICS

  • Reported component ID

    5725A54ST

  • Reported release

    L00

  • Status

    CLOSED USE

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2013-04-24

  • Closed date

    2013-05-02

  • Last modified date

    2013-05-02

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels



Document information


More support for:

SPSS Statistics
Statistics Desktop

Software version:

21.0

Reference #:

PM87684

Modified date:

2013-05-02
