z/OS DFSMS Using Data Sets


Key Compression

SC23-6855-00

Index entries are variable in length within an index record because VSAM compresses keys. That is, it eliminates redundant or unnecessary characters from the front and back of a key to save space. The number of characters that can be eliminated from a key depends on the relationship between that key and the preceding and following keys.

Note: VSAM index key compression is not related to MVS data compression or compressed format data sets.

For front compression, VSAM compares a key in the index with the preceding key in the index and eliminates from the key those leading characters that are the same as the leading characters in the preceding key. For example, if key 12356 follows key 12345, the characters 123 are eliminated from 12356 because they are equal to the first three characters in the preceding key. The lowest key in an index record has no front compression; there is no preceding key in the index record.
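The front-compression rule can be illustrated with a minimal sketch. The function name front_compression_count is hypothetical and keys are modeled as plain character strings; this illustrates the comparison described above and is not VSAM's internal code.

```python
def front_compression_count(key: str, preceding_key: str) -> int:
    """Count the leading characters of key that equal those of preceding_key."""
    count = 0
    for current_char, preceding_char in zip(key, preceding_key):
        if current_char != preceding_char:
            break
        count += 1
    return count


# Example from the text: key 12356 follows key 12345, so 123 is eliminated.
eliminated = front_compression_count("12356", "12345")
print(eliminated, "12356"[eliminated:])
```

The sketch does not model the lowest key in an index record, which has no preceding key and therefore no front compression.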

There is an exception for the highest key in a section. For front compression, it is compared with the highest key in the preceding section, rather than with the preceding key. The highest key in the rightmost section of an index record has no front compression; there is no preceding section in the index record.

What is called “rear compression” of keys is actually the process of eliminating the insignificant values from the end of a key in the index. The values eliminated can be represented by X'FF'. VSAM compares a key in the index with the following key in the data and eliminates from the key those characters to the right of the first character that is unequal to the corresponding character in the following key. For example, if the key 12345 (in the index) precedes key 12356 (in the data), the character 5 is eliminated from 12345 because the fourth character in the two keys is the first unequal pair.
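A minimal sketch of the rear-compression rule follows. The function name rear_compression_count is hypothetical and keys are again modeled as plain character strings of equal length.

```python
def rear_compression_count(key: str, following_key: str) -> int:
    """Count the characters to the right of the first character of key
    that differs from the corresponding character of following_key."""
    for position, (current_char, following_char) in enumerate(zip(key, following_key)):
        if current_char != following_char:
            # Everything after the first unequal character is insignificant.
            return len(key) - (position + 1)
    return 0  # No unequal character found; nothing is eliminated in this sketch.


# Example from the text: 12345 in the index precedes 12356 in the data.
eliminated = rear_compression_count("12345", "12356")
print(eliminated, "12345"[:len("12345") - eliminated])
```

Applied to the example above, the fourth character is the first unequal pair, so only the trailing 5 is dropped and 1234 remains.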

The first of the control information fields gives the number of characters eliminated from the front of the key, and the second field gives the number of characters that remain. When the sum of these two numbers is subtracted from the full key length (available from the catalog when the index is opened), the result is the number of characters eliminated from the rear. The third field indicates the control interval that contains a record with the key.
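The subtraction described above can be written out as a short sketch. The function and parameter names are hypothetical. The sample values are inferred from the examples in this topic for 5-character keys: 12356 (3 characters eliminated from the front, 2 remaining) and 12359 (4 eliminated from the front, 0 remaining).

```python
def rear_eliminated(full_key_length: int, front_eliminated: int, remaining: int) -> int:
    """Characters eliminated from the rear =
    full key length - (characters eliminated from the front + characters remaining)."""
    result = full_key_length - (front_eliminated + remaining)
    if result < 0:
        raise ValueError("control fields exceed the full key length")
    return result


print(rear_eliminated(5, 3, 2))  # entry for high key 12356
print(rear_eliminated(5, 4, 0))  # entry for high key 12359
```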

The example in Figure 1 gives a list of full keys and shows the contents of the index entries corresponding to the keys that get into the index (the highest key in each data control interval). A sequence-set record is assumed, with vertical pointers 1 byte long. The index entries shown in the figure from top to bottom are arranged from right to left in the assumed index record.

In Figure 1, the first column (Full Key of Data Record) has all the keys of the data records in the data control intervals (CIs). The second column (Index Entry) shows the contents of the entries in a sequence-set index CI, each entry in the index CI representing one data CI. The compressed keys under Index Entry are generated by front and rear compressions. The front compression is done by comparing the highest key of a current data CI against the highest key of its preceding data CI; the rear compression by comparing a high key with the low key of its next data CI.

In Figure 1, data CI high key 12345 has no front compression because it is the first key in the index record. Data CI high key 12356 has no rear compression because, in the comparison between 12356 and 12357 (the low key of the next CI), there are no characters following 6, which is the first character that is unequal to the corresponding character in the following key. For high key 12359, comparing it against high key 12356 results in front compression of 1235; comparing with 12370 results in rear compression of 9.
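The two comparisons can be combined into a sketch that derives the compressed key and the first two control fields for a data CI high key. The names index_entry and common_prefix_length are hypothetical. The sketch ignores the section exception described earlier and is an illustration only, not VSAM's implementation.

```python
from typing import Optional, Tuple


def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    length = 0
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            break
        length += 1
    return length


def index_entry(high_key: str,
                previous_high_key: Optional[str],
                next_low_key: str) -> Tuple[str, int, int]:
    """Return (compressed key, characters eliminated from front, characters remaining)."""
    # Front compression: compare with the high key of the preceding data CI.
    front = 0 if previous_high_key is None else common_prefix_length(high_key, previous_high_key)
    # Rear compression: keep up to and including the first character that
    # differs from the low key of the next data CI.
    significant = min(common_prefix_length(high_key, next_low_key) + 1, len(high_key))
    kept = high_key[front:significant]
    return kept, front, len(kept)


print(index_entry("12356", "12345", "12357"))
print(index_entry("12359", "12356", "12370"))
```

For high key 12359 the sketch leaves no key characters in the entry, because 1235 is removed from the front and 9 from the rear.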

You can always figure out what characters have been eliminated from the front of a key. You cannot figure out the ones eliminated from the rear. Rear compression, in effect, establishes the key in the entry as a boundary value instead of an exact high key. That is, an entry does not give the exact value of the highest key in a control interval, but gives only enough of the key to distinguish it from the lowest key in the next control interval. For example, in Figure 1 the last three index keys are 12401, 124, and 134 after rear compression. Data records with key field between:

  • 12402 and 124FF are associated with index key 124.
  • 12500 and 134FF are associated with index key 134.
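The boundary-value behavior can be sketched by padding each rear-compressed index key with X'FF' and finding the first boundary that is not lower than the search key. The function names are hypothetical, and the keys are treated as 5-byte strings with front compression already undone, which is a simplifying assumption. An actual index search works on the compressed entries rather than on expanded keys.

```python
from typing import List, Optional


def boundary(index_key: bytes, key_length: int) -> bytes:
    """Represent the characters eliminated from the rear as X'FF'."""
    return index_key + b"\xff" * (key_length - len(index_key))


def find_index_key(search_key: bytes, index_keys: List[bytes], key_length: int) -> Optional[bytes]:
    """Return the first index key whose boundary value is >= search_key."""
    for index_key in index_keys:
        if search_key <= boundary(index_key, key_length):
            return index_key
    return None  # Search key is higher than every boundary in this list.


# The last three index keys from the example, after rear compression.
index_keys = [b"12401", b"124", b"134"]
print(find_index_key(b"12402", index_keys, 5))
print(find_index_key(b"12500", index_keys, 5))
```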

If the last data record in a control interval is deleted, and if the control interval does not contain the high key for the control area, then the space is reclaimed as free space. Space reclamation can be suppressed by setting the RPLNOCIR bit, which has an equated value of X'20', at offset 43 into the RPL.

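The last index entry in an index level indicates the highest possible key value. The convention for expressing this value is to give none of its characters and indicate that no characters have been eliminated from the front. The last index entry in the last record in the sequence set therefore contains no key characters and has 0 in both of the first two control information fields (the number of characters eliminated from the front and the number of characters that remain), along with the pointer field.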

In a search, the two 0s signify the highest possible key value in this way:
  • The fact that 0 characters have been eliminated from the front implies that the first character in the key is greater than the first character in the preceding key.
  • A length of 0 indicates that no character comparison is required to determine if the search is successful. That is, when a search finds the last index entry, a hit has been made.
Figure 1. Example of Key Compression





Copyright IBM Corporation 1990, 2014