IBM Support

Restructure Q-sort responses to proximity matrix in SPSS

Troubleshooting


Problem

I have an SPSS data file with responses from a Q-sort study and want to reorganize the data into a matrix of co-occurrence frequencies for analysis by CLUSTER or ALSCAL. Each respondent was given 21 stimuli and asked to sort them into piles of similar stimuli. Each respondent is a case in my data file, and variables ITEM1 to ITEM21 record the pile into which the respective stimulus was placed. For example, if a respondent placed the 5th stimulus into his or her 8th pile, ITEM5 holds an 8 for that respondent. My target data structure is a 21x21 matrix in which the element in row i and column j is the number of respondents who placed items i and j in the same pile. The numerical ordering of the piles is irrelevant (although this is not necessarily true for all Q-sort studies): items i and j may share the 4th pile for one respondent and the 9th pile for another, and both observations add 1 to the (i,j) and (j,i) elements of the matrix.
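As a cross-check on the counting rule just described, here is a minimal Python sketch; it is not part of the SPSS solution. It assumes each respondent's sort is given as a list of pile numbers, one per stimulus, and it treats a missing pile (None) as matching nothing, which mirrors how an SPSS comparison involving a missing value yields system-missing. The function name co_occurrence is hypothetical.

```python
from typing import List, Optional, Sequence


def co_occurrence(sorts: Sequence[Sequence[Optional[int]]], n_items: int) -> List[List[int]]:
    """Count how many respondents placed items i and j in the same pile.

    sorts[r][i] is the pile number respondent r gave to item i (0-based).
    Pile numbers are only compared for equality, so their numeric order is ignored.
    A missing pile (None) matches nothing, not even itself.
    """
    counts = [[0] * n_items for _ in range(n_items)]
    for piles in sorts:
        if len(piles) != n_items:
            raise ValueError("each respondent needs exactly n_items pile numbers")
        for i in range(n_items):
            for j in range(n_items):
                if piles[i] is not None and piles[i] == piles[j]:
                    counts[i][j] += 1
    return counts


# Two respondents, four items: the first groups items 1-2 and 3-4,
# the second puts items 1 and 4 in pile 4 and items 2 and 3 in pile 9.
print(co_occurrence([[1, 1, 2, 2], [4, 9, 9, 4]], 4))
# [[2, 1, 0, 1], [1, 2, 1, 0], [0, 1, 2, 1], [1, 0, 1, 2]]
```

Because pile numbers are only tested for equality, relabeling a respondent's piles (for example, 4 and 9 to 1 and 2) leaves the counts unchanged.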

Resolving The Problem

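The SPSS syntax below restructures the data as requested. It assumes that you have already read the data file into SPSS and that, as described above, cases are individual respondents and variables ITEM1 to ITEM21 hold the pile numbers into which each respondent sorted the respective items. It also assumes a respondent identifier named ID, which the XSAVE command keeps in the intermediate file hits.sav; if your file has no such variable, create one (for example, COMPUTE id = $CASENUM.) or remove id from the /KEEP list.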

**********************************.
* First, create a file in which each respondent contributes 21 cases,
* one per row item, that together form a 21x21 matrix.
* In the case for row item j, hit(k) = 1 if the respondent sorted
* items j and k into the same pile and 0 otherwise, so hit(j) = 1
* in row j and hit(k) in row j equals hit(j) in row k.
* If either pile number is missing, hit(k) is system-missing.
vector it = item1 to item21.
vector hit (21, F4).
loop #j = 1 to 21.
+ compute item = #j.
+ loop #k = 1 to 21.
+ compute hit(#k) = (it(#j) = it(#k)).
+ end loop.
+ xsave outfile = 'hits.sav' /keep = id item hit1 to hit21 .
end loop.
execute.

* Build a file that holds a single 21x21 matrix, aggregated
* across all respondents. Each value is the number of respondents
* who placed the row and column stimuli in the same pile.
* Then add the ROWTYPE_ and VARNAME_ variables that
* ALSCAL, CLUSTER, and PROXIMITIES can read as a matrix.
get file = 'hits.sav' .

* See the end of this note for alternate code that uses
* more descriptive variable names for the final matrix .
aggregate outfile = *
/break = item
/item1 to item21 = sum(hit1 to hit21).

* ROWTYPE_ and VARNAME_ identify each row of the matrix for CLUSTER and the other procedures below.
string ROWTYPE_ VARNAME_ (A8).
COMPUTE ROWTYPE_ = 'PROX' .
compute VARNAME_ = concat('ITEM',ltrim(string(item,f2))) .
* Be sure that PROX and the variable name stem (ITEM in this
* example) are in capital letters.
VALUE LABELS rowtype_ 'PROX' 'SIMILARITY'.

save outfile = 'qsmat.sav'
/ keep = ROWTYPE_ VARNAME_ item1 to item21.
get file = 'qsmat.sav' .

* Analyze the data with ALSCAL (which performs Multidimensional Scaling, or MDS)
* and CLUSTER (hierarchical clustering), which are both in the Statistics Base module.

* LEVEL=ORDINAL(SIMILAR) in the ALSCAL command below tells
* ALSCAL that the data are similarities, rather than the
* dissimilarities (distances) it assumes by default .
ALSCAL VARIABLES= item1 to item21
/SHAPE=SYMMETRIC
/LEVEL=ORDINAL (SIMILAR)
/CONDITION=MATRIX
/MODEL=EUCLID
/CRITERIA=CONVERGE(.001) STRESSMIN(.005) ITER(30)
CUTOFF(0) DIMENS(2,2)
/PLOT=DEFAULT
/PRINT=DATA HEADER .

* CLUSTER needs the /MATRIX IN subcommand to read the data as a proximity matrix.
* The value label "SIMILARITY" for ROWTYPE_ = 'PROX' tells CLUSTER that the
* proximities are similarities .

CLUSTER
/MATRIX IN (*)
/METHOD BAVERAGE
/PRINT SCHEDULE CLUSTER(3,7)
/PLOTS DENDROGRAM .

* Analyze the data with PROXSCAL, an MDS procedure in the Categories module.

PROXSCAL VARIABLES=item1 to item21
/SHAPE=BOTH
/INITIAL=SIMPLEX
/TRANSFORMATION=INTERVAL
/PROXIMITIES=SIMILARITIES
/ACCELERATION=NONE
/CRITERIA=DIMENSIONS(2,2) MAXITER(100) DIFFSTRESS(.0001) MINSTRESS(.0001)
/PRINT=COMMON DISTANCES STRESS DECOMPOSITION
/PLOT=COMMON.
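The two SPSS steps above (one stacked case per respondent and row item, then a sum by item) can be mirrored in Python to check the restructured file on a small example. This is a hypothetical sketch rather than IBM's method: the names stack_hits, aggregate_by_item and matrix_rows are invented, respondents are keyed by an ID value as the XSAVE /KEEP list assumes, and None plays the role of SPSS system-missing.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (respondent id, row item number, [hit1 .. hitN]), like one case written by XSAVE.
HitRecord = Tuple[int, int, List[Optional[int]]]


def stack_hits(sorts: Dict[int, Sequence[Optional[int]]]) -> List[HitRecord]:
    """Mirror the VECTOR/LOOP/XSAVE step: N records per respondent, one per row item."""
    records: List[HitRecord] = []
    for resp_id, piles in sorts.items():
        n = len(piles)
        for j in range(n):
            # hit(k) = (it(j) = it(k)); the comparison is missing if either pile is missing.
            hits: List[Optional[int]] = [
                None if piles[j] is None or piles[k] is None else int(piles[j] == piles[k])
                for k in range(n)
            ]
            records.append((resp_id, j + 1, hits))
    return records


def aggregate_by_item(records: Sequence[HitRecord]) -> Dict[int, List[Optional[int]]]:
    """Mirror AGGREGATE /BREAK=item with SUM: missing hits are skipped, and a column
    with no valid hits stays missing (None), as SPSS returns system-missing."""
    sums: Dict[int, List[Optional[int]]] = {}
    for _, item, hits in records:
        row = sums.setdefault(item, [None] * len(hits))
        for k, h in enumerate(hits):
            if h is not None:
                row[k] = h if row[k] is None else row[k] + h
    return dict(sorted(sums.items()))  # AGGREGATE output is ordered by the break variable


def matrix_rows(sums: Dict[int, List[Optional[int]]], stem: str = "ITEM") -> List[Tuple[str, str, List[Optional[int]]]]:
    """Attach ROWTYPE_ = 'PROX' and VARNAME_ = stem + item number, as the COMPUTE commands do."""
    return [("PROX", f"{stem}{item}", row) for item, row in sums.items()]


rows = matrix_rows(aggregate_by_item(stack_hits({101: [1, 1, 2, 2], 102: [4, 9, 9, 4]})))
for rowtype, varname, values in rows:
    print(rowtype, varname, values)
# PROX ITEM1 [2, 1, 0, 1]
# PROX ITEM2 [1, 2, 1, 0]
# PROX ITEM3 [0, 1, 2, 1]
# PROX ITEM4 [1, 0, 1, 2]
```

Stacking first and summing second is more work than counting pairs directly, but it follows the shape of the SPSS syntax, which has no single command that builds the matrix in one pass.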


*****************************************.
Suppose that you want to use variable names that are more descriptive than ITEM1 to ITEM21. In this example, the 21 items are 21 recreational activities. The following syntax is similar to the solution above, except that the meaningful names are introduced in the AGGREGATE command. Note that if your original data file already uses the meaningful names as variable names, rather than item1 to item21, then replacing the command:

vector it = item1 to item21.

with the command:

vector it = archery to track .

is perfectly valid. When a vector is defined over existing variables, those variables do not need similar names; they only need to be adjacent in the file. You would still need to reintroduce the meaningful names in the AGGREGATE command, listing each name as shown below.
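The point about adjacency can be illustrated with a small Python model of how a "first TO last" variable list is resolved: by position in the file, not by name. This is an assumption-level sketch of that behavior, not SPSS code; the function name vector_range and the example file order are hypothetical.

```python
from typing import List, Sequence


def vector_range(file_order: Sequence[str], first: str, last: str) -> List[str]:
    """Return the variables from first through last in file order, as "first TO last" does.

    The names matter only for locating the two endpoints, so the variables in
    between need not share a stem; they only need to be adjacent in the file.
    """
    lowered = [name.lower() for name in file_order]  # SPSS variable names are not case-sensitive
    start, end = lowered.index(first.lower()), lowered.index(last.lower())
    if start > end:
        raise ValueError(f"{first} must come before {last} in the file")
    return list(file_order[start:end + 1])


# Hypothetical file order: an ID, some activity variables, then an unrelated variable.
order = ["id", "archery", "badmin", "baseball", "track", "age"]
print(vector_range(order, "archery", "track"))  # ['archery', 'badmin', 'baseball', 'track']
```

If an unrelated variable sat between archery and track in the file, it would be swept into the range, which is why the activity variables must be adjacent.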


*************************************************.

vector it = item1 to item21.
vector hit (21, F4).
loop #j = 1 to 21.
+ compute item = #j.
+ loop #k = 1 to 21.
+ compute hit(#k) = (it(#j) = it(#k)).
+ end loop.
+ xsave outfile = 'hits.sav' /keep = id item hit1 to hit21 .
end loop.
execute.
get file = 'hits.sav' .

aggregate outfile = *
/break = item
/archery badmin baseball basktbl bowling canoe curling
bike dance football hike lacrosse hockey rockclmb sail
soccer surf swim tennis triath track = sum(hit1 to hit21).


string ROWTYPE_ (A8).
COMPUTE ROWTYPE_ = 'PROX' .
save outfile = 'qsmat.sav'
/keep = ROWTYPE_ archery to track .

FLIP
VARIABLES= archery to track .
rename variables (case_lbl = varname_).
match files /file = * /file = 'qsmat.sav' / drop = var001 to var021.
execute.
* The EXECUTE forces the match to run now. SAVE would also run it,
* but you may want to examine the active dataset before saving .
VALUE LABELS rowtype_ 'PROX' 'SIMILARITY'.
save outfile = 'qsmat.sav'
/keep = ROWTYPE_ VARNAME_ archery to track .

get file = 'qsmat.sav' .

* You can now analyze the co-occurrence matrix.
ALSCAL VARIABLES= archery to track
/SHAPE=SYMMETRIC
/LEVEL=ORDINAL (SIMILAR)
/CONDITION=MATRIX
/MODEL=EUCLID
/CRITERIA=CONVERGE(.001) STRESSMIN(.005) ITER(30)
CUTOFF(0) DIMENS(2,2)
/PLOT=DEFAULT
/PRINT=DATA HEADER .
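Whichever naming scheme is used, the finished matrix has properties that follow from how it was built, and checking them before running ALSCAL, CLUSTER or PROXSCAL can catch a restructuring mistake. The Python sketch here (hypothetical helpers label_rows and check_cooccurrence, not part of the SPSS solution) pairs each row with a name in order, as the FLIP and parallel MATCH FILES steps do, and lists violations of three properties: the matrix is symmetric, each diagonal element (the number of respondents who sorted that item) is at most the number of respondents, and no pair of items can share a pile more often than either item was sorted.

```python
from typing import List, Sequence, Tuple


def label_rows(names: Sequence[str], matrix: Sequence[Sequence[int]]) -> List[Tuple[str, str, List[int]]]:
    """Pair row i with the i-th name, as FLIP plus a parallel MATCH FILES does."""
    if len(names) != len(matrix) or any(len(row) != len(names) for row in matrix):
        raise ValueError("need a square matrix with one name per row")
    return [("PROX", name, list(row)) for name, row in zip(names, matrix)]


def check_cooccurrence(matrix: Sequence[Sequence[int]], n_respondents: int) -> List[str]:
    """Return a description of every property violation; an empty list means none were found."""
    problems: List[str] = []
    n = len(matrix)
    for i in range(n):
        if not 0 <= matrix[i][i] <= n_respondents:
            problems.append(f"diagonal ({i}, {i}) outside 0..{n_respondents}")
        for j in range(n):
            if matrix[i][j] != matrix[j][i]:
                problems.append(f"not symmetric at ({i}, {j})")
            # Items i and j share a pile only when both were sorted, so the
            # count cannot exceed either diagonal element.
            if not 0 <= matrix[i][j] <= min(matrix[i][i], matrix[j][j]):
                problems.append(f"count at ({i}, {j}) outside 0..min of its diagonals")
    return problems


m = [[2, 1, 0, 1], [1, 2, 1, 0], [0, 1, 2, 1], [1, 0, 1, 2]]
print(label_rows(["archery", "badmin", "baseball", "basktbl"], m)[0])  # ('PROX', 'archery', [2, 1, 0, 1])
print(check_cooccurrence(m, n_respondents=2))  # []
```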

Product:
IBM SPSS Statistics

Historical Number

16651

Document Information

Modified date:
16 June 2018

UID

swg21488410