Partial schemas

Some parallel job stages allow you to use a partial schema. This means that you only need define column definitions for those columns that you are actually going to operate on.

The stages that allow you to do this are file stages that have a Format tab. These are:

  • Sequential File stage
  • File Set stage
  • External Source stage
  • External Target stage
  • Column Import stage

You specify a partial schema using the Intact property on the Format tab of the stage together with the Schema File property on the corresponding Properties tab. To use this facility, you need to turn Runtime Column Propagation on, and provide enough information about the columns being passed through to enable InfoSphere DataStage to skip over them as necessary.

In the file defining the partial schema, you need to describe the record and the individual columns. Describe the record as follows:

  • intact. This property specifies that the schema being defined is a partial one. You can optionally specify a name for the intact schema here as well, which you can then reference from the Intact property of the Format tab.
  • record_length. The length of the record, including record delimiter characters.
  • record_delim_string. String giving the record delimiter as an ASCII string in single quotes. (For a single character delimiter, use record_delim and supply a single ASCII character in single quotes).

Describe the columns as follows:

  • position. The position of the starting character within the record.
  • delim. The column trailing delimiter, can be any of the following:
    • ws to skip all standard whitespace characters (space, tab, and newline) trailing after a field.
    • end to specify that the last field in the record is composed of all remaining bytes until the end of the record.
    • none to specify that fields have no delimiter.
    • null to specify that the delimiter is the ASCII null character.
    • ASCII_char specifies a single ASCII delimiter. Enclose ASCII_char in single quotation marks. (To specify multiple ASCII characters, use delim_string followed by the string in single quotes.)
  • text specifies the data representation type of a field as being text rather than binary. Data is formatted as text by default. (Specify binary if data is binary.)

Columns that are being passed through intact only need to be described in enough detail to allow InfoSphere® DataStage® to skip them and locate the columns that are to be operated on.

For example, say you have a sequential file defining rows comprising six fixed width columns, and you are in interested in the last two. You know that the first four columns together contain 80 characters. Your partial schema definition might appear as follows:


record { intact=details, record_delim_string = '\r\n' }
    (    colstoignore: string [80]
         name: string [20] { delim=none };
         income: uint32 {delim = ",", text };

Your stage would not be able to alter anything in a row other than the name and income columns (it could also add a new column to either the beginning or the end of a row).