
Raking or Rim Weighting in SPSS Statistics

Troubleshooting


Problem

Does SPSS Statistics offer raking (also known as rim weighting)?

Resolving The Problem

SPSS Statistics does not currently have a standard procedure for raking or rim weighting, but it can be done either with an available extension or manually with a loglinear modeling procedure such as GENLOG (either approach requires the Advanced Statistics module).

The extension for raking or rim weighting is SPSSINC_RAKE, available from developerWorks or, beginning with Version 22, installable directly from the program via Utilities>Extension Bundles>Download and Install Extension Bundles. It requires the Python Essentials. Once installed, SPSSINC_RAKE provides a dialog box interface, accessed in the menus via Data>Rake Weights. The extension provides an automated way to obtain the results given by the manual solution below.

The general steps of the manual process are as follows (a syntax sketch illustrating them appears after the list):

1) Identify the variables that will define the weighting scheme and the desired marginal counts for each one. The desired marginal totals for each variable should sum to the same number, which is the desired total for the entire table (this may be the original weighted or unweighted sample size, a population value, or some other standard value).

2) Assuming the data are not aggregated to begin with, produce an aggregated file, breaking on these variables and saving the Ns for the different combinations of the variables (the cells in the table produced by aggregating on them, or what you would see if you crosstabulated them).

3) If there are any missing combinations, add these to the aggregated file, with counts of 1e-8 (or some similar very small number).

4) Compute a desired marginal count variable for each of the weighting variables. The values of each marginal count variable should be the target count for the corresponding level of that variable.

5) Compute the expected value for each cell of the table under an independence model. This is the product of all the marginal count variables divided by the desired table total (the sum of any one of the marginal variables) raised to the power of one less than the number of marginal variables (i.e., for two variables divide by the total, for three by the total squared, and so on).

6) Weight the data by the newly created expected values variable.

7) Run the GENLOG procedure with the variables that define the weighting structure as factors in a main-effects-only (independence) model, using the observed count variable as a cell structure variable, and saving the expected or predicted counts from this model back to the data. Make sure the estimation converges. Tightening the convergence criterion and increasing the maximum number of iterations can, within reason, only make the results more precise.

8) The saved predicted counts are the desired weighted values of the cells. If you began with aggregated table data, you simply weight the file by these counts and you're done (the rim weights have been applied).

9) Assuming that you began with individual data, compute the rim weights by dividing the saved predicted values by the observed counts, after first deleting any cases you added for empty cells.

10) Compute a matching id variable from the weighting variables that has a unique value for each combination of those variables. Some sort of concatenation of values will suffice (for example, if they are all integer-valued, begin with 1, and each has fewer than 10 values, you can compute the id as the value of one variable plus ten times the value of another plus 100 times the value of a third, and so on).

11) Sort the file on this variable. Save the file.

12) Get the original file.

13) Repeat the creation of the matching id variable and sort the data by it.

14) Match the files, using the previously saved file as a table, matching on the match id variable and keeping the original data and the rim_weight variable.
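
The following syntax sketch illustrates the steps above for a hypothetical case with two weighting variables, sex (coded 1, 2) and agegrp (coded 1, 2, 3), raked to a total of 1000 with targets of 480/520 for sex and 300/400/300 for agegrp. All variable names, target values, and file names here are illustrative assumptions rather than part of any shipped procedure; adapt them to your own data.

* Hypothetical example: two weighting variables, individual-level data.
* Step 2: aggregate to one case per cell, saving the observed counts.
DATASET DECLARE cells.
AGGREGATE
  /OUTFILE=cells
  /BREAK=sex agegrp
  /obscount=N.
DATASET ACTIVATE cells.

* Step 3: if any sex by agegrp combination is missing, add a case for it
* with obscount = 1E-8 before continuing.

* Step 4: target marginal counts (each set must sum to the same total, 1000 here).
IF (sex = 1) sextarg = 480.
IF (sex = 2) sextarg = 520.
IF (agegrp = 1) agetarg = 300.
IF (agegrp = 2) agetarg = 400.
IF (agegrp = 3) agetarg = 300.

* Step 5: expected cell counts under independence of the target margins.
* With two weighting variables, divide the product of the targets by the total once.
COMPUTE expcount = sextarg * agetarg / 1000.
EXECUTE.

* Step 6: weight by the independence expected counts.
WEIGHT BY expcount.

* Step 7: main-effects (independence) loglinear model with the observed counts
* as the cell structure variable; save the predicted cell counts.
GENLOG sex agegrp
  /CSTRUCTURE=obscount
  /MODEL=POISSON
  /CRITERIA=CONVERGE(.0000001) ITERATE(100)
  /SAVE=PRED(predcount)
  /DESIGN=sex agegrp.

* Steps 8-9: the predicted counts are the raked cell counts; convert them to
* per-case rim weights (delete any cases added in step 3 before this point).
COMPUTE rimwgt = predcount / obscount.

* Step 10: matching key, unique for each sex by agegrp combination.
COMPUTE matchid = sex + 10 * agegrp.

* Step 11: sort and save a lookup table of weights.
SORT CASES BY matchid.
WEIGHT OFF.
SAVE OUTFILE='rimweights.sav' /KEEP=matchid rimwgt.

* Steps 12-14: return to the original individual-level file, build the same key,
* and attach the rim weights as a table lookup.
GET FILE='original.sav'.
COMPUTE matchid = sex + 10 * agegrp.
SORT CASES BY matchid.
MATCH FILES /FILE=* /TABLE='rimweights.sav' /BY matchid.
EXECUTE.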

When all is said and done, you have rim-weighted data. That is, the data have been weighted so that the desired marginal counts are produced. Note that this is a form of poststratification weighting, and the resulting weighted data can only be used for descriptive purposes in SPSS. Even with the SPSS Complex Samples module, SPSS currently cannot produce valid inferential statistics for poststratification-weighted data.
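
Continuing the hypothetical example above, a quick check is to weight by the new variable and tabulate the weighting variables; the weighted counts should match the target margins (up to the convergence tolerance of the GENLOG step):

* Verify that the raked margins match the targets.
WEIGHT BY rimwgt.
FREQUENCIES VARIABLES=sex agegrp.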

[{"Product":{"code":"SSLVMB","label":"IBM SPSS Statistics"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Component":"Not Applicable","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"Not Applicable","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Historical Number

31840

Document Information

Modified date:
16 April 2020

UID

swg21479973