The CASESTOVARS command occasionally fails to "expand" certain variables in the resulting restructured dataset.
I'm using IBM SPSS Statistics and have received some puzzling results when restructuring a file with the CASESTOVARS command (Data->Restructure). The original variables to be restructured are occasionally only copied to a single variable in the new file. This has occurred when there is only one valid value across the records within each unique ID value, and also when there is a combination of one valid value and system-missing. For example, I ran the following commands:
data list free / Id rec x y .
begin data
11 1 1 2
11 2 . 3
11 4 . 1
12 1 . 4
12 3 . 4
12 4 2 2
12 5 2 3
13 1 . 2
14 1 . 3
end data.
formats rec x y (f4).
* I then ran CASESTOVARS to restructure the file.
What I expected to see in the LIST output was:
Id x1 x2 x3 x4 x5 y1 y2 y3 y4 y5
11 1 . . . . 2 3 . 1 .
12 . . . 2 2 4 . 4 2 3
13 . . . . . 2 . . . .
14 . . . . . 3 . . . .
What was actually printed in the LIST output was:
Id x y1 y2 y3 y4 y5
11 1 2 3 . 1 .
12 2 4 . 4 2 3
13 . 2 . . . .
14 . 3 . . . .
There is only a single X variable in the restructured file, although Y was restructured into Y1 to Y5, as expected. If I rerun the commands, but first recode sysmis to 0 for X and Y before the CASESTOVARS command, then X is also restructured into X1 to X5.
It appears that the combination of a single valid value plus system-missing values within each ID is treated as a within-ID constant by CASESTOVARS. Is this interpretation correct? Is the treatment of system-missing values by CASESTOVARS explained in IBM SPSS Statistics documentation? Is there a workaround to force a variable such as X in this example to be restructured into a set of variables as determined by the value of the record variable?
Resolving the problem
This behavior is expected; the CASESTOVARS command is functioning as designed. It results from the default setting, /AUTOFIX=YES.
The Command Syntax Reference provides the following notes for the /AUTOFIX subcommand:
The AUTOFIX subcommand evaluates candidate variables and classifies them as either fixed or as the source of a variable group.
A candidate variable is a variable in the original data that does not appear on the SPLIT command or on the ID, INDEX, and DROP subcommands.
An original variable that does not vary within any row group is classified as a fixed variable and is copied into a single variable in the new data file.
An original variable that has only a single valid value plus the system-missing value within a row group is classified as a fixed variable and is copied into a single variable in the new data file.
An original variable that does vary within the row group is classified as the source of a variable group. It becomes a variable group in the new data file.
Use AUTOFIX=NO to overrule the default behavior and expand all variables not marked as ID or fixed or record into a variable group.
A row group is a set of cases that share the same value of the ID variable. The cases in a row group become a single case in the restructured file.
In the example above, ID 11 had a single valid value (1) plus system-missing values for X. ID 12 had a single valid value (2) plus system-missing values for X. The remaining IDs, 13 and 14, had only system-missing values for X. By the rules above, X is treated as constant within ID groups, i.e. as a fixed variable, so X becomes a single variable in the restructured file. When you recoded system-missing to 0 before running CASESTOVARS, you introduced within-ID variation in the valid values of X (0 and 1 for ID 11; 0 and 2 for ID 12), so X became a variable group in the restructured file.
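The recode approach described above might look like the following sketch. The ID and INDEX specifications are inferred from the example data. The empty SEPARATOR is an assumption, made so that the new names match the x1 to x5 pattern in the listings (the default separator is a period, which would give names such as x.1). Because the recode turns system-missing into a valid 0, it is only suitable when 0 is not a legitimate value of X or Y.

```spss
* Replace system-missing with 0 so that X varies within ID groups.
recode x y (sysmis = 0).
* Id and rec come from the example data. The separator is an assumption.
casestovars
 /id = Id
 /index = rec
 /separator = "" .
list.
```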
As an alternative workaround that does not require any recoding, add the subcommand /AUTOFIX=NO to the CASESTOVARS command:
/autofix = no .
You'll get a warning that X does not vary within id groups, but X will be restructured into multiple variables as you wished.
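For the example data, the complete command with this subcommand might look like the sketch below. The ID and INDEX specifications are inferred from the example. The SORT CASES step and the empty SEPARATOR (which yields names such as x1 rather than x.1) are assumptions rather than part of the original commands.

```spss
* Sketch only. CASESTOVARS expects the file to be sorted by the ID variable.
sort cases by Id rec.
casestovars
 /id = Id
 /index = rec
 /autofix = no
 /separator = "" .
list.
```

With AUTOFIX=NO, every candidate variable is expanded, so any variable that should remain a single column would need to be named on the FIXED subcommand.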
When AUTOFIX=YES (the default), CASESTOVARS empirically identifies the variables that vary within row groups, ignoring system-missing values, and treats the remaining variables as dependent on the ID, i.e. constant within each ID group.
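To check in advance whether a variable is likely to be classified as fixed, one option is to compare its smallest and largest valid value within each ID on the original file. The following is a rough diagnostic sketch, not something taken from the CASESTOVARS documentation; the names x_min, x_max, and x_varies are hypothetical. The MIN and MAX functions of AGGREGATE skip system-missing values, which mirrors the rule described above.

```spss
* Run on the original file, before CASESTOVARS.
aggregate
 /outfile = * mode = addvariables
 /break = Id
 /x_min = min(x)
 /x_max = max(x).
* x_varies is 1 when X has more than one distinct valid value in the ID group, 0 when it has exactly one, and system-missing when it has none.
compute x_varies = (x_min ne x_max).
descriptives variables = x_varies /statistics = max.
```

If the reported maximum of x_varies is 0, no ID group contains more than one distinct valid value of X, so X would be expected to be copied into a single variable under AUTOFIX=YES.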