Including schema variables in an interface schema

Many operators pass an entire record unchanged from the input data set to the output data set. Such operators can add new result fields to the output record, but they still propagate the original record through to the output data set.

This topic provides the background information required to use schema variables, including a further discussion of the concept of schema variables and an explanation of the transfer mechanism. Two examples follow: one shows how to use schema variables with a one-input operator, and the other shows how to create a two-input operator.

Background

The following figure shows an operator containing schema variables in both its input and its output interface schemas:

Figure 1. Input schema variables

This figure shows the same operator with an input data set:

Figure 2. Output schema variables

By default, a schema variable in an input interface schema corresponds to an entire record of the input data set, including the record schema. In this example:

"inRec:*" ≡ "fName:string; lName:string; age:int32;"

Performing an operation on inRec corresponds to performing an operation on an entire record and the record schema of the input data set.
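The equivalence above can be sketched as a small, self-contained C++ model. This is an illustration only, not the operator framework's API: the names Field, Schema, resolveInterface, and toString are hypothetical, and the model assumes the interface schema contains only the schema variable. It does not cover how a schema variable interacts with explicitly named interface fields.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model (assumption): a schema is an ordered list of (name, type) pairs.
// None of these names come from the real operator framework.
using Field = std::pair<std::string, std::string>;
using Schema = std::vector<Field>;

// Resolve an interface schema against the record schema of a data set.
// A field whose type is "*" is treated as a schema variable and stands for
// every field of the data set's record schema.
Schema resolveInterface(const Schema& interfaceSchema, const Schema& dataSetSchema) {
    Schema resolved;
    for (const Field& f : interfaceSchema) {
        assert(!f.first.empty());  // every interface field needs a name
        if (f.second == "*") {
            // Schema variable: substitute the whole record schema.
            resolved.insert(resolved.end(), dataSetSchema.begin(), dataSetSchema.end());
        } else {
            // Simplification: named fields are kept exactly as declared.
            resolved.push_back(f);
        }
    }
    return resolved;
}

// Render a schema in the "name:type;" notation used in this topic.
std::string toString(const Schema& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i > 0) out += " ";
        out += s[i].first + ":" + s[i].second + ";";
    }
    return out;
}
```

In this model, resolving an interface schema that holds only inRec:* against a data set with the fields fName, lName, and age yields the data set's own record schema.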

When a transfer is declared, an output schema variable, by default, assumes the record schema of the corresponding schema variable in the input interface schema. As the previous figure shows, outRec assumes the schema of inRec, which corresponds to the record schema of the input data set. Therefore, the output interface schema of the operator is:

"outRec:*" = "inRec:*" ≡ "fName:string; lName:string; age:int32;"

By using a schema variable to reference a record, an operator can copy, or transfer, an entire record from an input data set to an output data set. Copying records then becomes much simpler because you do not have to define any field accessors.
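The benefit of a transfer can be illustrated with a second toy model in C++. Again, this is a sketch under stated assumptions rather than the framework's real interface: Record, transfer, processRecord, and the result field fieldCount are hypothetical names. The point it shows is that the copy step never names an individual field, so it works for any record schema.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy model (assumption): a record is an ordered list of (field name, value) pairs.
using Record = std::vector<std::pair<std::string, std::string>>;

// Copy every field of the input record to the output record in one step,
// in the spirit of a declared transfer from inRec to outRec. No per-field
// accessor is involved.
void transfer(const Record& inRec, Record& outRec) {
    outRec.insert(outRec.end(), inRec.begin(), inRec.end());
}

// Hypothetical one-input operator: it writes one result field of its own,
// then propagates the whole input record unchanged.
Record processRecord(const Record& in) {
    Record out;
    out.emplace_back("fieldCount", std::to_string(in.size()));  // new result field
    transfer(in, out);                                          // whole-record copy
    assert(out.size() == in.size() + 1);
    return out;
}
```

Calling processRecord on a record with the fields fName, lName, and age returns a four-field record: the new fieldCount field followed by the three original fields in their original order.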