Converting between code pages

Archives are often used to move files between UNIX systems. When an archive contains text files, those files frequently must be converted from the source system's default code page to the target system's code page. You can do this by running the iconv utility on each file before storing it in an archive or after restoring it from an archive. The pax utility, however, provides an inline code page translation option, -o, that can simplify this task. For example:

To convert component files from EBCDIC (IBM-1047) to ASCII (ISO8859-1) when storing them in an archive:
```
pax -o to=iso8859-1 -wzvf /tmp/project.pax.Z  ./
```
To convert component files from ASCII (ISO8859-1) to EBCDIC (IBM-1047) when extracting them from an archive:
```
pax -o from=iso8859-1 -pe -rzvf /tmp/project.pax.Z
```

Note:

The -o option allows both a "from" and a "to" code page to be specified on the same command. If a "from" or "to" code page is not specified, pax assumes it to be EBCDIC (IBM-1047).
For more information about the code sets supported for this command, see the Coded Character Set Conversion Table in z/OS C/C++ Programming Guide.
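
For comparison, the per-file iconv approach mentioned at the start of this topic might look like the following. This is a minimal sketch, not part of the pax utility: convert_files is a hypothetical helper name, the file names in the usage comment are placeholders, and the code page names must be ones that your system's iconv recognizes.

```shell
# convert_files FROM TO FILE...
# Hypothetical helper: convert each FILE from code page FROM to code page TO,
# replacing the file in place. Stops at the first file that fails to convert.
convert_files() {
    from=$1
    to=$2
    shift 2
    for f in "$@"; do
        # Write to a temporary file first so a failed conversion
        # does not destroy the original.
        iconv -f "$from" -t "$to" "$f" > "$f.tmp" && mv "$f.tmp" "$f" || return 1
    done
}

# Example usage (placeholder file names):
#   convert_files IBM-1047 ISO8859-1 index.html docs/about.html
```

Because the files to convert are named explicitly, non-text files are left alone simply by not listing them.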

Converting archives that contain text and non-text component files

Archives often contain both text and non-text files. Examples of non-text files are image files, such as JPGs and GIFs, and other pax/tar archives. When the -o option is specified, pax converts all files, regardless of type. This corrupts non-text files. The general approach for overcoming this limitation is to run pax two or more times against the same archive, extracting component files in groups of text and non-text types. Whether it is easier to identify (by file name) text files or non-text files will determine how you approach this.

For example, suppose you wish to restore the archive mywebsite.pax into the directory /u/website. The archive consists of HTML files (text files) and JPG files (JPEGs, non-text image files) and was created on a system whose default code page is ASCII (ISO8859-1). Assume that the majority of the files are HTML files and that the archived files represent several levels of subdirectories.

First, restore the entire archive using the -o option:

pax -rvf mywebsite.pax -o from=ISO8859-1,to=IBM-1047

This extracts and converts all component files. The extracted non-text JPEG files are corrupted because they were also converted. The next step is to re-extract the JPG files without the -o option. The pax command allows you to specify a "pattern" so that only those files that match the pattern are extracted. However, because of the multiple subdirectories, there is no way to create a pattern that would match every JPG in each subdirectory. Instead, a list of file names to be extracted must first be created and then used as the pattern for the pax command to extract the files. Issuing the following command in the z/OS® shell would accomplish this:

pax -rvf mywebsite.pax  $( pax -f mywebsite.pax | grep -i JPG$ )

This command consists of two parts:

pax -rvf mywebsite.pax  $(    )

and

pax -f mywebsite.pax | grep -i JPG$

The first part is simply the regular pax command for extracting files from an archive. The $( ) expression says to first run the command between the parentheses and substitute the results in place. The second part is the command that generates a list of file names in the archive that end in "JPG" (or any mixed-case variation).
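
The filtering step can be tried without an archive. The sketch below is illustrative only: the file names are invented, and a shell variable stands in for the listing that pax -f would print. It shows what the command substitution hands to the outer pax command.

```shell
# Simulated archive listing; these file names are hypothetical and stand in
# for the output of: pax -f mywebsite.pax
listing='index.html
images/logo.jpg
images/photos/beach.JPG
docs/about.html
docs/jpg-notes.html'

# Same filter as in the example: keep names ending in "JPG", ignoring case.
jpg_files=$(printf '%s\n' "$listing" | grep -i 'JPG$')

# These are the names that would be substituted into the outer pax command.
printf '%s\n' "$jpg_files"
```

Only the two image files are selected; docs/jpg-notes.html is skipped because the expression is anchored to the end of the name. Note that the unquoted $( ) substitution splits its result on white space, so this technique assumes that the archived file names contain no blanks.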

The previous example shows one approach. In general, for any archive, the breakdown of text to non-text files and the uniqueness of the names that identify each type dictate the manner and order in which the files are extracted. For example, we could have reversed the process by first extracting all files without using the -o option, and then re-extracting the HTML files in the second command using the -o option to convert the files.
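
The reversed order described above could be sketched as follows. This is an illustration under stated assumptions, not a definitive procedure: restore_reverse is a hypothetical function name, the text files are assumed to be identifiable by names ending in "html" (in any case), the archive is assumed to be in ISO8859-1, and file names are assumed to contain no white space.

```shell
# restore_reverse ARCHIVE
# Hypothetical helper: extract everything unconverted, then re-extract only
# the HTML files with code page conversion, overwriting the first copies.
restore_reverse() {
    archive=$1
    # Pass 1: extract all component files as-is (non-text files stay intact).
    pax -rvf "$archive" &&
    # Pass 2: re-extract the HTML files, converting ASCII to EBCDIC.
    pax -rvf "$archive" -o from=ISO8859-1,to=IBM-1047 \
        $( pax -f "$archive" | grep -i 'html$' )
}

# Example usage:
#   cd /u/website && restore_reverse mywebsite.pax
```

This order is the better fit when the text files are the easier group to identify by name.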