Matching records from different data sets

SPLICE can be used for match operations as well as join operations. Suppose you want to look at the courses in SORT.SAMPIN and SORT.SAMPADD and produce three output data sets as follows:

COURSE.MATCH to show the courses that appear in both SORT.SAMPIN and SORT.SAMPADD
COURSE.INONLY to show the courses that appear only in SORT.SAMPIN
COURSE.ADDONLY to show the courses that appear only in SORT.SAMPADD

The following JCL and ICETOOL statements create these three data sets:

//S1    EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//F1IN     DD DSN=A123456.SORT.SAMPIN,DISP=SHR
//F2IN     DD DSN=A123456.SORT.SAMPADD,DISP=SHR
//T1       DD DSN=&&&T1,DISP=(MOD,PASS),UNIT=SYSDA,SPACE=(CYL,(5,5),RLSE)
//MATCH DD DSN=COURSE.MATCH,DISP=(NEW,CATLG,DELETE),
//  SPACE=(CYL,(5,5)),UNIT=3390
//F1ONLY DD DSN=COURSE.INONLY,DISP=(NEW,CATLG,DELETE),
//  SPACE=(CYL,(5,5)),UNIT=3390
//F2ONLY DD DSN=COURSE.ADDONLY,DISP=(NEW,CATLG,DELETE),
//  SPACE=(CYL,(5,5)),UNIT=3390
//TOOLIN   DD *
* Copy needed File1 fields to T1 and add '11' id
  COPY FROM(F1IN) TO(T1) USING(MATA)
* Copy needed File2 fields to T1 and add '22' id
  COPY FROM(F2IN) TO(T1) USING(MATB)
* Splice second id character from overlay to base for
* matching course numbers.  Use resulting spliced or
* unspliced id to write records to MATCH, F1ONLY
* or F2ONLY as appropriate.
  SPLICE FROM(T1) TO(MATCH) ON(1,5,CH) -
    WITH(32,1) KEEPNODUPS USING(MATC)
/*
//MATACNTL DD *
* Extract needed File1 fields and add '11' id
  OUTREC FIELDS=(1:115,5,     Course number
                 6:120,25,    Course name
                31:C'11')     '11' id for File1 records
/*
//MATBCNTL DD *
* Extract needed File2 fields and add '22' id
  OUTREC FIELDS=(1:115,5,     Course number
                 6:120,25,    Course name
                31:C'22')     '22' id for File2 records
/*
//MATCCNTL DD *
* A '12' id indicates a match between File1 and File2.
* Remove the '12' id and write the records to MATCH.
  OUTFIL FNAMES=MATCH,INCLUDE=(31,2,CH,EQ,C'12'),OUTREC=(1,30)
* A '11' id indicates a record in File1 only.
* Remove the '11' id and write the records to F1ONLY.
  OUTFIL FNAMES=F1ONLY,INCLUDE=(31,2,CH,EQ,C'11'),OUTREC=(1,30)
* A '22' id indicates a record in File2 only.
* Remove the '22' id and write the records to F2ONLY.
  OUTFIL FNAMES=F2ONLY,INCLUDE=(31,2,CH,EQ,C'22'),OUTREC=(1,30)
/*

The first COPY operator writes records to temporary data set T1 consisting of the course number and course name fields from SORT.SAMPIN, and an id of '11', as shown in Table 1.

Table 1. T1 data set fields from SORT.SAMPIN
Course Number	Course Name	'11'
1 5	6 30	31 32

The second COPY operator writes records at the end (MOD) of temporary data set T1 consisting of the course number and course name fields from SORT.SAMPADD, and an id of '22', as shown in Table 2.

Table 2. T1 data set fields from SORT.SAMPADD
Course Number	Course Name	'22'
1 5	6 30	31 32

The SPLICE operator matches the course numbers (ON field). When a match is found, the second id character is spliced into the base record. Because WITHALL is not specified, only the last record for each match is spliced with the first record for each match. KEEPNODUPS tells DFSORT to keep records that are not spliced. Without KEEPNODUPS, records for course numbers that appear once in one data set, but not in the other data set, would be deleted instead of being written to COURSE.INONLY or COURSE.ADDONLY.

Here's what happens for all of the possible types of matches and non-matches:

For a course number that appears one or more times with id '11' and one or more times with id '22', the second '2' from the last '22' record is spliced into the first '11' record to get an id of '12'. Thus, an id of '12' represents a course number that appears in both SORT.SAMPIN and SORT.SAMPADD
For a course number that appears only once with id '11' (and not with id '22') the id of '11' is not changed. For a course number that appears more than once with id '11' (and not with id '22') the second '1' from the last '11' record is spliced into the first '11' record to get an id of '11'. Thus, an id of '11' represents a course number that only appears in SORTIN.SAMPIN.
For a course number that appears only once with an id of '22' (and not with an id of '11') the id of '22' is not changed. For a course number that appears more than once in with an id of '22' (and not with an id of '11') the second '2' from the last '22' record is spliced into the first '22' record to get an id of '22'. Thus, an id of '22' represents a course number that only appears in SORTIN.SAMPADD.

The output created for COURSE.MATCH containing the information for course numbers that appear in both SORT.SAMPIN and SORT.SAMPADD is shown in Table 3.

Table 3. COURSE.MATCH output
Course Number	Course Name
1 5	6 30
00103 00205 10054 30016 30975 50420 50521 50632 70124 70251	DATA MANAGEMENT VIDEO GAMES FICTION WRITING PSYCHOLOGY I PSYCHOANALYSIS WORLD HISTORY WORLD HISTORY EUROPEAN HISTORY ADVANCED MARKETING MARKETING

The output created for COURSE.INONLY containing the information for course numbers that appear only in SORT.SAMPIN is shown in Table 4.

Table 4. COURSE.INONLY output
Course Number	Course Name
1 5	6 30
00032 10347 10856 80521	INTRO TO COMPUTERS TECHNICAL EDITING MODERN POETRY BIOLOGY I

The output created for COURSE.ADDONLY containing the information for course numbers that appear only in SORT.SAMPADD is shown in Table 5.

Table 5. COURSE.ADDONLY output
Course Number	Course Name
1 5	6 30
70255 80522 80523	BUSINESS THEORY BIOLOGY II INTRO TO GENETICS