Stratified random sampling in SPSS: equal percentage or equal count from each stratum

Troubleshooting


Problem

Is it possible to have SPSS select a stratified random sample from a data set? For example, I have a data set that includes students from 100 schools, and I want to select 20% of the students from each school. Alternatively, how would I sample a fixed number of students from each school? I do not have the Complex Samples module of SPSS Statistics.

Resolving The Problem

The following commands will select 20% of the cases from each school. In this example, the school identifier variable is called SCHOOL. A random number is generated for each case, and the RANK command ranks these random numbers in ascending order within each school. Cases whose random numbers fall within the lowest 20% for their school are selected.

COMPUTE ran1 = uniform(1).
RANK VARIABLES=ran1 (A) BY school
/PERCENT into schpct
/PRINT=NO .
SELECT IF (schpct <= 20).
EXECUTE .
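
Which cases are selected depends on the state of the random number generator when the commands run, and SELECT IF permanently drops the unselected cases from the active dataset. The sketch below is one possible variant, not part of the original solution: it sets the generator's starting point so the same draw can be repeated, and it uses FILTER in place of SELECT IF so that unselected cases stay in the file but are excluded from analyses. The starting value 20200416 and the variable name insample are arbitrary choices made for illustration.

```spss
* Sketch only: the starting value and the name insample are illustrative.
SET RNG=MT MTINDEX=20200416.
COMPUTE ran1 = uniform(1).
RANK VARIABLES=ran1 (A) BY school
  /PERCENT INTO schpct
  /PRINT=NO.
* Flag the sampled cases (1 = sampled, 0 = not sampled) instead of deleting the rest.
COMPUTE insample = (schpct <= 20).
FILTER BY insample.
EXECUTE.
```

Running FILTER OFF. afterwards makes all cases available to analyses again.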

The above commands can all be performed in the SPSS graphical user interface (GUI). The COMPUTE and RANK commands are available in the Transform menu of the Data Editor. The SELECT IF command is available in the 'Data->Select Cases' menu of the Data Editor.
In the Compute dialog, type 'ran1' in the 'Target Variable' box, type 'uniform(1)' in the 'Numeric Expression' box, and then click OK. The uniform(1) function generates a random number from a uniform distribution from 0 to 1, i.e., every real number between 0 and 1, exclusive, is equally likely to be drawn.
In the Rank Cases dialog, paste RAN1 into the 'Variable(s)' box and paste SCHOOL into the 'by' box. Click the 'Rank Types' button. In the 'Rank Cases: Types' dialog, uncheck 'Rank' and check 'Fractional rank as %' and then click Continue and OK. The GUI for Rank does not allow you to input a name for the new rank variable (as in the syntax subcommand '/PERCENT into schpct', which stores the fractional % rank into SCHPCT). When RANK is run from the GUI, it will assign a name such as PRAN1 to a new fractional % rank variable, with an informative variable label such as "Percent of RAN1 by SCHOOL". You could rename this variable to SCHPCT in the Variable View if you wish.
In the 'Select Cases' dialog, click the 'If condition is satisfied' radio button and click the IF button underneath that line. In the 'Select Cases: If' dialog, paste the SCHPCT variable (or the GUI-assigned variable, such as PRAN1, if you did not rename it) into the box on the right, type ' <= 20' (without the quotation marks), then click Continue. If you want the file to contain only the sampled cases, click the 'Deleted' radio button under 'Unselected Cases Are' and click OK.
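
If the fractional rank variable was created through the Rank Cases dialog, the renaming mentioned above can also be done with one line of syntax instead of in the Variable View. The line below assumes the dialog assigned the name PRAN1; check the actual name in the Data Editor first, since it may differ.

```spss
* Assumes the GUI-created fractional rank variable is named PRAN1.
RENAME VARIABLES (PRAN1 = schpct).
```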

Suppose that you wanted to select 100 students from each school. The commands above can be easily modified to sample a fixed count from each stratum, as follows:


COMPUTE ran1 = uniform(1).
RANK VARIABLES=ran1 (A) BY school
/RANK into schN
/PRINT=NO .
SELECT IF (schN <= 100).
EXECUTE .

The /PERCENT subcommand of the RANK command was replaced by the /RANK subcommand, and the output variable was named schN rather than schpct. The /RANK subcommand saves the rank number (from 1 to the number of cases in the school) in the output variable (schN). /RANK is actually the default rank type for the RANK command. If you are using the GUI, just make sure that the Rank box is checked in the 'Rank Cases: Types' dialog. The other steps are the same as those described above for sampling a fixed percentage, except that the selection condition compares the rank variable with 100.
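
One way to confirm that the procedure will draw the intended number of cases is to compare each school's total with its sampled count before running SELECT IF. The sketch below does this for the fixed-count case; the names insample, n_total and n_sampled are illustrative and not part of the original commands. A school with fewer than 100 students would show n_sampled equal to n_total, because every one of its students has a rank of 100 or less.

```spss
* Sketch only: insample, n_total and n_sampled are illustrative names.
COMPUTE ran1 = uniform(1).
RANK VARIABLES=ran1 (A) BY school
  /RANK INTO schN
  /PRINT=NO.
* Flag the cases that SELECT IF would keep.
COMPUTE insample = (schN <= 100).
* Add each school's total and sampled counts to every case in that school.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=school
  /n_total = N
  /n_sampled = SUM(insample).
```

For the percentage version, the flag would be computed as (schpct <= 20) instead. Assuming unweighted data and no tied random numbers, the percentage is the within-school rank divided by the school's case count, so n_sampled should equal 20% of n_total rounded down, which is zero for schools with fewer than five students.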

See Technote 1624273 for steps to perform stratified sampling with a minimum count from each stratum and a minimum percentage overall.

Although stratified sampling can be performed without the Complex Samples module, note that the procedures in most SPSS modules assume simple random sampling, so their standard errors of estimates do not reflect complex sampling designs. In addition to performing the stratified sampling, the Complex Samples module allows you to account for the sampling design in a wide range of analyses, including general linear models, logistic regression and cross-tabulations.

Historical Number

32438

Document Information

Modified date:
16 April 2020

UID

swg21477266