Column width of string variables changes after importing text variable from Oracle database

Technote (FAQ)


Question

I have IBM SPSS Statistics 21 installed on a Windows platform with Oracle Wire Protocol driver from the last SPSS Data Access Pack (SDAP 6.1 SP3). I try to get the data from an Oracle database using this driver.
In the database I have some text variables (character) with a certain width.
However, when I import the character string variables into Statistics 21 I notice that the width of the string variables are often tripled or doubled than the original width in the database. Why is this?

Cause

This is functioning as designed. As explained in Technote #1502287, starting with version 21, IBM SPSS Statistics operates in Unicode mode by default.

Translating characters from non-Unicode ('locale') to Unicode results in as much as a tripling of the amount of information required to record string data.


Answer

When opening code page SPSS Statistics data files in Unicode mode or saving SPSS Statistics data files in Unicode encoding in code page mode, defined string widths are automatically tripled. Performing either of these actions repeatedly will triple the defined string widths each time. You can use ALTER TYPE to set the width of string variables to the longest value in the dataset for each string variable. This is valid for SAV file types and database import.


To avoid this you can do the following:
- On the Define Variables dialog from Database Wizard check the option 'Minimize string widths based on observed values'.
- Paste the GET DATA /TYPE ODBC syntax from the wizard.
- Add the ALTER TYPE ALL(A=AMIN) command below the GET DATA syntax.

Syntax example:
GET DATA
/TYPE=ODBC
/CONNECT='DSN=Oraclexxx;UID=SC;PWD=/X+M-z,I!\;Host=xxx;Port=1521;SID=xxx'
/SQL='SELECT "id", "stringvar1", "stringvar2" FROM SC"table1"'
/ASSUMEDSTRWIDTH=255.
CACHE.
EXECUTE.
ALTER TYPE ALL(A=AMIN).

You can also use ALTER TYPE to define the length of each variable. Syntax example:
GET DATA
/TYPE=ODBC
/CONNECT='DSN=Oraclexxx;UID=SC;PWD=/X+M-z,I!\;Host=xxx;Port=1521;SID=xxx'
/SQL='SELECT "id", "stringvar1", "stringvar2" FROM SC"table1"'
/ASSUMEDSTRWIDTH=255.
CACHE.
EXECUTE.
ALTER TYPE stringvar1 (A4) stringvar2 (A6).

However, the above solution is difficult to use when you import databases or data portions from other systems where the string length differs periodically and you need to merge all this data portions in one SAV file in which the variable formats for the string variables are strictly defined.
This procedure can only work based upon the condition that the formats of data taken from different systems are unchanged in time.
Applying the command ALTER TYPE ALL(A=AMIN) in the syntax does not guarantee the unchanging of the text variable formats.
Because the same variable can have different values in different periods.
For example, in one week the longest value of the variable has 6 characters, in other week it can be only 5.
In such cases the ALTER TYPE command will set different width for the same text variable and the data from different weeks will not be joined.

Therefore an Enhancement Request was submitted to not have to check 'Minimize string widths based on observed values'. Instead an option 'Minimize string widths based on defined length in database' should be implemented (Reference ID: ECM00186826)


Rate this page:

(0 users)Average rating

Add comments

Document information


More support for:

SPSS Statistics
Statistics Desktop

Software version:

21.0

Operating system(s):

Platform Independent

Reference #:

1623193

Modified date:

2013-01-23

Translate my page

Machine Translation

Content navigation