Converting between code pages
- To convert component files from EBCDIC (IBM-1047) to ASCII (ISO8859-1)
when storing them in an archive:
pax -o to=iso8859-1 -wzvf /tmp/project.pax.Z ./
- To convert component files from ASCII (ISO8859-1) to EBCDIC (IBM-1047)
when extracting them from an archive:
pax -o from=iso8859-1 -pe -rzvf /tmp/project.pax.Z
- The -o option allows both a "from" and a "to" code page to be specified on the same command. If a "from" or "to" code page is not specified, pax assumes it to be EBCDIC (IBM-1047).
- For more information about the code sets supported for this command, see the Coded Character Set Conversion Table in z/OS C/C++ Programming Guide.
Converting archives that contain text and non-text component files
Archives often contain both text and non-text files. Examples of non-text files are image files, such as JPGs and GIFs, and other pax/tar archives. When the -o option is specified, pax converts all files, regardless of type. This corrupts non-text files. The general approach for overcoming this limitation is to run pax two or more times against the same archive, extracting component files in groups of text and non-text types. Whether it is easier to identify (by file name) text files or non-text files will determine how you approach this.
For example, suppose you wish to restore the archive mywebsite.pax into the directory /u/website. The archive consists of HTML files (text files) and JPG files (JPEG images, which are non-text files) and was created on a system whose default code page is ASCII (ISO8859-1). Assume that the majority of the files are HTML files and that the archived files span several levels of subdirectories.
pax -rvf mywebsite.pax -o from=ISO8859-1,to=IBM-1047
pax -rvf mywebsite.pax $( pax -f mywebsite.pax | grep -i JPG$ )
The first command extracts every file and converts it from ASCII to EBCDIC, which leaves the JPG files corrupted. The second command re-extracts only the JPG files, this time without conversion, replacing the corrupted copies.
The second command consists of two parts:
pax -rvf mywebsite.pax $( )
and
pax -f mywebsite.pax | grep -i JPG$
The first part is simply the regular pax command for extracting files from an archive. The $( ) expression says to first run the command between the parentheses and substitute the results in place. The second part is the command that generates a list of file names in the archive that end in "JPG" (or any mixed-case variation).
The previous example shows one approach. In general, for any archive, the breakdown of text to non-text files and the uniqueness of the names that identify each type dictate the manner and order in which the files are extracted. For example, we could have reversed the process by first extracting all files without using the -o option, and then re-extracting the HTML files on the second command using the -o option to convert the files.
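The reversed order can be sketched as follows. This is only an illustration under stated assumptions: the sample listing and its file names are hypothetical stand-ins for the output of pax -f mywebsite.pax, and the HTML files are assumed to end in ".html" or ".htm" (the extensions are not stated above). The script shows only how the name filter splits the listing into the two groups; the pax commands that would use those groups appear as comments because they require the archive itself.

```shell
#!/bin/sh
# Hypothetical listing standing in for the output of: pax -f mywebsite.pax
listing='index.html
images/logo.JPG
docs/about.HTML
images/photos/team.jpg
docs/guide/intro.htm'

# Text files: names ending in .html or .htm, in any mix of case (assumed extensions).
text_files=$(printf '%s\n' "$listing" | grep -i -E '\.html?$')

# Non-text files: names ending in "jpg", in any mix of case.
binary_files=$(printf '%s\n' "$listing" | grep -i 'jpg$')

printf 'text files:\n%s\n' "$text_files"
printf 'non-text files:\n%s\n' "$binary_files"

# With the real archive, the reversed two-pass extraction would be (not run here):
#   pax -rvf mywebsite.pax
#   pax -rvf mywebsite.pax -o from=ISO8859-1 $( pax -f mywebsite.pax | grep -i -E '\.html?$' )
```

Because the $( ) result is split on white space, this approach assumes that no file name in the archive contains blanks.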