comm — Show and select or reject lines common to two files

Format

comm [-B123] [-W option[,option]...] file1 file2

Description

comm locates identical lines in two files that are sorted in the same collating sequence and produces three columns of output: the first contains lines found only in the first file, the second contains lines found only in the second file, and the third contains lines found in both files. If you specify - in place of either file1 or file2, comm reads that input from the standard input (stdin).
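
The three-column layout can be illustrated with a minimal sketch. The file names (fruits1.txt, fruits2.txt) and the scratch directory are hypothetical, and the sketch assumes both files are already sorted in the C locale. As POSIX specifies, lines in column 2 are preceded by one tab and lines in column 3 by two tabs.

```shell
# Work in a scratch directory; directory and file names are hypothetical.
tmpdir=${TMPDIR:-/tmp}/comm_demo.$$
mkdir -p "$tmpdir" && cd "$tmpdir" || exit 1

# Both inputs must already be sorted in the same collating sequence.
printf '%s\n' apple banana cherry > fruits1.txt
printf '%s\n' banana cherry date > fruits2.txt

# LC_ALL=C pins the collating sequence so that it matches the input order.
three_cols=$(LC_ALL=C comm fruits1.txt fruits2.txt)
printf '%s\n' "$three_cols"

# Passing - in place of file1 makes comm read that input from stdin.
from_stdin=$(LC_ALL=C comm - fruits2.txt < fruits1.txt)
```

In this sketch, apple appears in column 1, date in column 2 (one leading tab), and banana and cherry in column 3 (two leading tabs). Both invocations produce the same output.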

Options

-1
Suppresses lines that appear only in file1
-2
Suppresses lines that appear only in file2
-3
Suppresses lines that appear in both file1 and file2
-B
Disables the automatic conversion of tagged files. This option is ignored if the filecodeset or pgmcodeset options (-W option) are specified.
-W option[,option]...
Specifies z/OS-specific options. The option keywords are case-sensitive. Possible options are:
filecodeset=codeset
Performs text conversion from one code set to another when reading from the file. The coded character set of the file is codeset. codeset can be a code set name known to the system or a numeric coded character set identifier (CCSID). Note that the command iconv -l lists existing CCSIDs along with their corresponding code set names. The filecodeset and pgmcodeset options can be used on files with any file tag.

If pgmcodeset is specified but filecodeset is omitted, then the default file code set is ISO8859-1 even if the file is tagged with a different code set. If neither filecodeset nor pgmcodeset is specified, text conversion will not occur unless automatic conversion is enabled or the _TEXT_CONV environment variable indicates text conversion. For more information about text conversion, see Controlling text conversion for z/OS UNIX shell commands.

If filecodeset or pgmcodeset is specified, then automatic conversion is disabled for this command invocation and the -B option is ignored if it is also specified. See z/OS UNIX System Services Planning for more information about automatic conversion.

When specifying values for filecodeset, use the values that Unicode Services supports. For more information about supported code sets, see z/OS Unicode Services User's Guide and Reference.

pgmcodeset=codeset
Performs text conversion from one code set to another when reading from the file. The coded character set of the program (command) is codeset. codeset can be a code set name known to the system or a numeric coded character set identifier (CCSID). Note that the command iconv -l lists existing CCSIDs along with their corresponding code set names. The filecodeset and pgmcodeset options can be used on files with any file tag.

If filecodeset is specified but pgmcodeset is omitted, then the default program code set is IBM-1047. If neither filecodeset nor pgmcodeset is specified, text conversion will not occur unless automatic conversion is enabled or the _TEXT_CONV environment variable indicates text conversion. For more information about text conversion, see Controlling text conversion for z/OS UNIX shell commands.

If filecodeset or pgmcodeset is specified, then automatic conversion is disabled for this command invocation and the -B option is ignored if it is also specified. See z/OS UNIX System Services Planning for more information about automatic conversion.

Restriction: The only supported values for pgmcodeset are IBM-1047 and 1047.

The -1, -2, and -3 options suppress individual columns. Thus, to list only the lines common to both files, use:
comm -12 file1 file2
To find lines unique to one file or the other, use:
comm -3 file1 file2

Observe that comm -123 displays nothing.
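
The column-suppression combinations can be sketched as follows. The file names (old.txt, new.txt) and the scratch directory are hypothetical, and the inputs are assumed to be sorted in the C locale. When a column is suppressed, the remaining columns are written without the leading tabs that the suppressed column would have required.

```shell
# Work in a scratch directory; directory and file names are hypothetical.
tmpdir=${TMPDIR:-/tmp}/comm_cols.$$
mkdir -p "$tmpdir" && cd "$tmpdir" || exit 1

printf '%s\n' apple banana cherry > old.txt
printf '%s\n' banana cherry date > new.txt

common=$(LC_ALL=C comm -12 old.txt new.txt)     # lines in both files
only_old=$(LC_ALL=C comm -23 old.txt new.txt)   # lines only in old.txt
only_new=$(LC_ALL=C comm -13 old.txt new.txt)   # lines only in new.txt
nothing=$(LC_ALL=C comm -123 old.txt new.txt)   # all three columns suppressed

printf 'common:   %s\n' $common
printf 'only old: %s\n' "$only_old"
printf 'only new: %s\n' "$only_new"
```

A common use of this pattern is comparing two sorted lists, such as file names before and after a change, to see what was added, removed, or kept.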

Examples

  1. To display the lines that are unique to each text file and the lines that are common to both text files:
    comm myFile01 myFile02
  2. To display the lines that are unique to the first of two text files containing UTF-8 characters, assuming that
    • The text files are untagged and you do not want to tag them or enable automatic conversion, and
    • You cannot alter the tag (for example, you are comparing untagged public text files or read-only text files)
    then issue:
    comm -23 -W filecodeset=UTF-8,pgmcodeset=IBM-1047 myUtf8File01 myUtf8File02
  3. To display the lines that are common to both text files containing EBCDIC characters, assuming that automatic conversion has been enabled but the files are incorrectly tagged as ASCII:
    comm -12 -B myMisTaggedFile01 myMisTaggedFile02

Localization

comm uses the following localization environment variables:
  • LANG
  • LC_ALL
  • LC_COLLATE
  • LC_CTYPE
  • LC_MESSAGES
  • NLSPATH

See Localization for more information.
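
Because comm expects both inputs to be sorted in the same collating sequence, one defensive pattern is to run sort and comm under the same locale setting. The following is a minimal sketch under that assumption; the file names (list_a.txt, list_b.txt) and the scratch directory are hypothetical, and LC_ALL=C is chosen only for illustration.

```shell
# Work in a scratch directory; directory and file names are hypothetical.
tmpdir=${TMPDIR:-/tmp}/comm_locale.$$
mkdir -p "$tmpdir" && cd "$tmpdir" || exit 1

# Unsorted inputs.
printf '%s\n' pear apple fig > list_a.txt
printf '%s\n' fig apple kiwi > list_b.txt

# Use one locale for both sort and comm so that the orderings agree.
LC_ALL=C sort list_a.txt > list_a.sorted
LC_ALL=C sort list_b.txt > list_b.sorted

shared=$(LC_ALL=C comm -12 list_a.sorted list_b.sorted)
printf '%s\n' "$shared"
```

Setting LC_ALL overrides LC_COLLATE and the other locale categories for that command, which is why the sketch sets it on each invocation rather than relying on the caller's environment.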

Environment variables

comm uses the following environment variable:
_TEXT_CONV
Contains text conversion information for the command. The text conversion information is not used when either the -B option or the filecodeset or pgmcodeset option (-W option) is specified. For more information about text conversion, see Controlling text conversion for z/OS UNIX shell commands.

Exit values

0
Successful completion
1
Failure due to any of the following:
  • The code set is not valid
  • Could not turn off automatic conversion
  • Could not perform requested text conversion
2
Failure that generated a usage message, such as naming only one input file
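
A script can branch on these exit values. The following is a minimal sketch; the file names (left.txt, right.txt, common.txt), the scratch directory, and the message strings are hypothetical, and the mapping of values 1 and 2 follows the list above, which other comm implementations might not share.

```shell
# Work in a scratch directory; directory and file names are hypothetical.
tmpdir=${TMPDIR:-/tmp}/comm_exit.$$
mkdir -p "$tmpdir" && cd "$tmpdir" || exit 1

printf '%s\n' a b > left.txt
printf '%s\n' b c > right.txt

LC_ALL=C comm -12 left.txt right.txt > common.txt
status=$?

# Map the exit value to a message; the wording is illustrative only.
case $status in
  0) result="ok" ;;
  1) result="code set or text conversion failure" ;;
  2) result="usage error" ;;
  *) result="unexpected exit value $status" ;;
esac
printf '%s\n' "$result"
```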

Portability

POSIX.2, X/Open Portability Guide, UNIX systems.

The -B and -W options are extensions of the POSIX standard.

Related information

cmp, diff, sort, uniq