Using a variable name as the second argument of a lead or lag function
I wish to compute new variables in SPSS Statistics with the LAG and LEAD functions. (LAG(X) uses the value of X from the previous case, while LEAD(X) uses the value of X from the following case.) I would like to specify a variable name (e.g. CUSTLAG), rather than an integer, as the span of cases to lag or lead, so that a different span size is used for each case. In the CREATE and SHIFT VALUES commands, the second argument of a LAG() or LEAD() function must be an integer. The same is true for the LAG function in the COMPUTE command. How can I apply such a function in SPSS Statistics? Would a VECTOR and LOOP or DO REPEAT structure be required?
The VECTOR-LOOP approach is problematic because the LAG() function requires its first argument to be a variable name, not an expression. A vector element reference such as lag_span(#i-1) is an expression, so a command such as the following returns an error message.
if (#i > 1) lag_span(#i) = lag(lag_span(#i-1)).
However, a similar construct with the DO REPEAT structure does work. The second argument to the LAG() function must normally be a literal integer, but inside a DO REPEAT structure a stand-in variable can take its place. The stand-in is replaced by an integer (a different one in each iteration) before the command runs, so LAG() is accepted even though its second argument is written as a name.
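To make the substitution concrete, the sketch below imitates it in plain Python (not SPSS syntax). It takes the two-line loop body used in this note and prints the commands that each iteration would produce. The function name expand_do_repeat and the regex-based replacement are illustrative assumptions; this is a rough model of the idea, not a description of how SPSS Statistics parses DO REPEAT.

```python
import re


def expand_do_repeat(body, stand_ins):
    """Return the body lines repeated once per iteration, with each
    stand-in name replaced by that iteration's value (hypothetical helper)."""
    n_iterations = len(next(iter(stand_ins.values())))
    generated = []
    for k in range(n_iterations):
        for line in body:
            for name, values in stand_ins.items():
                # \b keeps "i" from matching inside words such as "if".
                line = re.sub(rf"\b{re.escape(name)}\b", str(values[k]), line)
            generated.append(line)
    return generated


# Loop body and stand-in lists taken from the DO REPEAT example in this note.
body = ["compute v = lag(x,i).", "if (i = custlag) x_lag = v."]
stand_ins = {
    "i": [1, 2, 3, 4, 5],
    "v": [f"lag_span{n}" for n in range(1, 6)],
}

commands = expand_do_repeat(body, stand_ins)
for command in commands:
    print(command)
```

The first generated line is compute lag_span1 = lag(x,1)., which has a literal integer as the second argument, so it is a legal LAG call. Within SPSS Statistics itself, adding the PRINT keyword to END REPEAT should list the generated commands in the output.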
In the following commands, the lag of X is copied to the new variable X_lag. The number of cases spanned by the lag function, i.e. the second argument to the lag function, is contained in the variable custlag. So the following commands perform the equivalent of the command:
compute X_lag = lag(X, custlag).
which is not legal syntax.
* Find the lag of variable X, with the span of lag set by variable custlag.
* Custlag has integers 1 to 5.
* If your maximum lag span is greater than 5, increase the maximum value of i in the DO REPEAT command
* and change the name of the last variable created in the
* /v subcommand to create the number of variables required.
do repeat i = 1 to 5
  / v = lag_span1 to lag_span5.
compute v = lag(x,i).
if (i = custlag) x_lag = v.
end repeat.
* drop the intermediate variables.
match files / file = * / drop = lag_span1 to lag_span5 .
execute.
An alternate approach for the LAG function would involve a set of conditional transformations based on the value of CUSTLAG. Only one of the conditional expressions would be true and the corresponding LAG span would be used to compute X_LAG.
if (custlag = 1) x_lag = lag(x,1).
if (custlag = 2) x_lag = lag(x,2).
if (custlag = 3) x_lag = lag(x,3).
if (custlag = 4) x_lag = lag(x,4).
if (custlag = 5) x_lag = lag(x,5).
You would require an IF command for every observed value of CUSTLAG to implement this approach.
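As a cross-check on what X_LAG should contain, here is a small reference model in plain Python (not SPSS syntax). It mirrors the two-step idea of the DO REPEAT approach: build every fixed-span lag first, then pick one per case. The function names, the sample data, and the use of None to stand in for system-missing are assumptions made for illustration only.

```python
def lag(values, span):
    """Fixed-span lag: the first `span` cases have no earlier case, so they get None."""
    n_cases = len(values)
    return [None] * min(span, n_cases) + values[: max(n_cases - span, 0)]


def variable_lag(values, spans, max_span=5):
    """Pick, for each case, the lag column whose span matches that case's span."""
    # One intermediate column per span, like lag_span1 to lag_span5.
    columns = {k: lag(values, k) for k in range(1, max_span + 1)}
    # A span outside 1..max_span matches no column, so the result stays None.
    return [columns[k][n] if k in columns else None for n, k in enumerate(spans)]


# Hypothetical data: six cases of x and a per-case lag span.
x = [10, 20, 30, 40, 50, 60]
custlag = [1, 1, 2, 1, 3, 5]

x_lag = variable_lag(x, custlag)
print(x_lag)  # [None, 10, 10, 30, 20, 10]
```

The first case has nothing before it, so its result is missing; the last case reaches back five cases to the first value of x.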
You can use the DO REPEAT approach with the LEAD function, but you need to create all the lead variables before starting the DO REPEAT structure. This is because LEAD is only available in the CREATE and SHIFT VALUES commands, which cannot be included in a DO REPEAT structure. Here, CUSTLEAD holds the desired LEAD span for each case, and the new variable is saved as X_LEAD.
* Find the lead of variable X, with the span of lead set by variable custlead.
* Custlead has integers 1 to 5.
CREATE
  /lead_span1=LEAD(x 1)
  /lead_span2=LEAD(x 2)
  /lead_span3=LEAD(x 3)
  /lead_span4=LEAD(x 4)
  /lead_span5=LEAD(x 5) .
do repeat j = 1 to 5
  / w = lead_span1 to lead_span5.
if (j = custlead) x_lead = w.
end repeat.
* drop the intermediate variables.
match files / file = * / drop = lead_span1 to lead_span5 .
execute.
It should be quite feasible to write an extension command that applies a variable LAG or LEAD span. Extension commands are routines written in Python and/or R to perform special functions. They can be installed in SPSS Statistics to act like built-in procedures, including menu access. Several extension commands are available for download. The Programmability plug-ins for Python or R (depending on the extension command) must be installed to run these commands.
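The core logic of such an extension could be quite small. The sketch below is plain Python and deliberately leaves out everything specific to the Programmability plug-in (reading the active dataset, writing the new variable, defining the command syntax). The function name shift_by_case, its direction parameter, and the sample data are hypothetical, and None again stands in for system-missing.

```python
def shift_by_case(values, spans, direction):
    """Return values shifted by a per-case span.

    direction = -1 looks back (LAG-like); direction = +1 looks ahead (LEAD-like).
    Cases whose span is missing, not a positive integer, or points outside
    the data get None.
    """
    if direction not in (-1, 1):
        raise ValueError("direction must be -1 (lag) or +1 (lead)")
    n_cases = len(values)
    shifted = []
    for n, span in enumerate(spans):
        if isinstance(span, int) and span >= 1:
            source = n + direction * span
            shifted.append(values[source] if 0 <= source < n_cases else None)
        else:
            shifted.append(None)
    return shifted


# Hypothetical data: six cases of x and a per-case lead span.
x = [10, 20, 30, 40, 50, 60]
custlead = [1, 2, 1, 2, 1, 1]

x_lead = shift_by_case(x, custlead, direction=+1)
print(x_lead)  # [20, 40, 40, 60, 60, None]
```

Indexing directly by case position avoids the intermediate lag_span or lead_span variables, so no maximum span has to be fixed in advance.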