IBM Support

TWSz Customization Best Practices

Question & Answer


Question

What are the suggested best practices for configuring/tuning TWS z/OS (COMPID 5697WSZ01)?

Cause

customization

Answer

****Revision date November 3, 2014 ****

Tivoli Workload Scheduler for z/OS is a product with many files, tasks and parameters. The manuals (particularly Planning and Installation, and Customization and Tuning) and the sample members supplied via SEQQSAMP and tailored via the EQQJOBS CLIST will produce a working system, but one that still needs specific tailoring depending on the features to be implemented and the relative size of the installation (in terms of number of jobs executed per day and the number of non-MVS agents).

This technote is structured as a "TOP TEN" list, broken into two main sections: customization
for all customers, and customization for optional features that may not be deployed by all customers.

For each item, a few hints/tips will be given and if more detail is needed a pointer will be
made to a separate technote or to documentation in a particular TWSz manual.

NOTE: This technote will be updated frequently as additional information becomes available or new items are added. The REVISION DATE at the top of the technote will be modified and new items will be marked with a vertical bar ( | ).

The "TOP TEN" list to be covered will include:

ITEMS OF INTEREST for ALL TWSz configurations:

(1) Performance considerations (controller, tracker, dialog use, batch jobs)
(2) TWSz file sizes
(3) start up/ shut down issues
(4) issues with messages
(5) Files for debugging purposes
(6) Daily planning jobs

ITEMS OF INTEREST for SOME TWSz configurations:

(7) Performance configuration for datastore, server, output collector
(8) High availability configuration
(9) Configuration parameters for optional features
(10) Security (optional)

Note: For issues related to a migration to a different TWSz release see technote 1608030 (see link below)


ITEMS OF INTEREST for ALL TWSz configurations:

(1) Performance considerations for controller, tracker, dialog use, batch

(a) parameters

CONTROLLER: see technote 1247273 for ways to prevent high CPU usage by the controller.

(i) CPDTLIM, CPBPLIM - these OPCOPTS parameters can be used to get more efficient LSR BUFFERING of the CURRENT PLAN (CP) -- see APAR PK37805 for additional information

The correct setting of the CPDTLIM/CPBPLIM parameters is a little
tricky. Start from the REGION size you are now using for the
controller and the EQQN018I message, which indicates how many LSR
buffers are requested and used:
.
for example:
EQQN018I VSAM LSR BUFFERS HAVE BEEN SUCCESSFULLY ALLOCATED FOR VSAM FILE
EQQCP2DS
EQQN018I NUMBER OF INDEX BUFFERS ARE 000006 WITH SIZE 000512
EQQN018I NUMBER OF DATA BUFFERS ARE 000010 WITH SIZE 032768 (REQUESTED
000010)
.
You then check a LISTCAT of CP1/CP2 to determine the total number
of 32K buffers that would be needed to hold all the data:
.
Take the HI-U-RBA value in the DATA portion, divide by 32768.
This gives you the target number of buffers needed to have CPDTLIM(100).
.
Next, subtract the number of buffers already shown in the EQQN018I
message for CP1 or CP2 from the value obtained from the HI-U-RBA of
the LISTCAT. The result is the number of additional buffers needed. To
find how much additional memory the controller REGION needs to hold
them, multiply the number of additional buffers by 32768, then divide
by 1048576 to get an approximation of how many megabytes to add to the
controller region size.
.
The CPBPLIM parameter can be used to prevent, in extreme cases, almost
all of the controller REGION from being used for CP buffers.
For example, CPBPLIM(75) ensures that 25% of the REGION is NOT used
by CP buffers.
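
The buffer arithmetic described above can be sketched as a small calculation. This is only an illustration: the HI-U-RBA value is a made-up figure, the function and variable names are hypothetical, and the conversion uses 1,048,576 bytes per megabyte.

```python
import math

BUFFER_SIZE = 32768           # CP data buffer size in bytes, as in the EQQN018I example
BYTES_PER_MB = 1024 * 1024    # conversion used for the REGION estimate


def cp_buffer_estimate(hi_u_rba: int, buffers_allocated: int) -> dict:
    """Estimate the buffers and extra REGION needed to hold the whole CP.

    hi_u_rba          -- HI-U-RBA of the DATA component from a LISTCAT of CP1/CP2
    buffers_allocated -- number of data buffers reported by EQQN018I
    """
    # Target number of 32K buffers needed to hold all the data (CPDTLIM(100)).
    target = math.ceil(hi_u_rba / BUFFER_SIZE)
    # Buffers still missing compared with what EQQN018I reports today.
    additional = max(target - buffers_allocated, 0)
    # Approximate megabytes to add to the controller REGION.
    extra_mb = additional * BUFFER_SIZE / BYTES_PER_MB
    return {
        "target_buffers": target,
        "additional_buffers": additional,
        "extra_region_mb": extra_mb,
    }


# Hypothetical LISTCAT value; 10 data buffers as in the EQQN018I example.
est = cp_buffer_estimate(hi_u_rba=163_840_000, buffers_allocated=10)
print(est)
```

The result is only a starting point for the REGION increase; the CPBPLIM value still caps how much of the REGION the CP buffers may use.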

(ii) BACKUP and MAXJSFILE - these JTOPTS parameters determine the frequency of CP backups and JS backups.

JTOPTS BACKUP
The TWSz CONTROLLER automatic backup of the CURRENT PLAN dataset (indicated
in the CONTROLLER EQQMLOG by message EQQN051I with text TRIGGER WAS: BACKUP
LIM) is performed based on the number of "events" processed by the TWSz
CONTROLLER. The frequency of this backup/copy is controlled by the BACKUP()
keyword of the JTOPTS initialization statement, which should be adjusted by the
user so that the backup occurs roughly once per hour during the busiest time of
the TWSz production schedule.
More frequent backups unnecessarily increase CONTROLLER overhead and may
reduce job-scheduling throughput. Because the backup process holds a lock on
the CURRENT PLAN, it also prevents access to the plan by all other TWSz
subtasks while the copy is in progress.
On the other hand, if CP backups are taken too infrequently, there will be an
excessive delay while the CONTROLLER does recovery processing when restarted
after a system outage, after being CANCELLED (so it is unable to do normal
shutdown processing), or following an ABEND.
Users should monitor the frequency of MSG EQQN051I (with text TRIGGER WAS:
BACKUP LIM) in the CONTROLLER EQQMLOG, and adjust the JTOPTS BACKUP() keyword
parameter so that the message is issued at the desired interval.
For instance, if you find the BACKUP LIM message issued every five minutes, and
your JTOPTS has BACKUP(400), increasing this to BACKUP(4000) would change the
interval (assuming the same workload) to 50 minutes.
Since workload is subject to increase over time, the EQQMLOG should be checked
periodically and the parameter adjusted as needed.
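
The proportional adjustment in that example can be written as a short calculation. This is a sketch that assumes events arrive at a roughly steady rate during the measured period; the function name is hypothetical and not part of TWSz.

```python
import math


def scaled_backup_value(current_backup: int,
                        observed_minutes: float,
                        target_minutes: float = 60.0) -> int:
    """Scale JTOPTS BACKUP() so that EQQN051I (TRIGGER WAS: BACKUP LIM)
    is issued about every target_minutes, assuming the same workload."""
    if current_backup <= 0 or observed_minutes <= 0 or target_minutes <= 0:
        raise ValueError("all inputs must be positive")
    # The backup is event-driven, so the interval scales linearly with BACKUP().
    return math.ceil(current_backup * target_minutes / observed_minutes)


# The example from the text: BACKUP(400) gives a backup every 5 minutes.
print(scaled_backup_value(400, 5, 50))   # value for a 50-minute interval
print(scaled_backup_value(400, 5))       # value for roughly once per hour
```

Measure the observed interval during the busiest part of the schedule, since that is when the backup frequency matters most.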

Another way to handle this situation is to set JTOPTS BACKUP(NO) or to specify
a very large value so that the automatic backup does not occur at all, then
create a TWSz application to run a job every hour to invoke EQQEVPGM with sysin
"BACKUP RESDS(CP) SUBSYS(tracker)", where "tracker" is the TWSz TRACKER started
task/subsystem on the LPAR where the job executes. This will cause the TRACKER
to create a special event record requesting the CONTROLLER to perform a CP
BACKUP. In this manner, the frequency of CURRENT PLAN backups can be
controlled independently of workload, and without adjusting the JTOPTS BACKUP()
parameter.

MAXJSFILE

The frequency of the CONTROLLER JSFILE BACKUPs is controlled by the JTOPTS
MAXJSFILE() keyword. Check the CUSTOMIZATION AND TUNING manual for
documentation on this initialization parameter.

MAXJSFILE can have one of three different values.

0 -- means do a backup whenever the active JSFILE becomes two cylinders larger
than it was when the CONTROLLER was started.

NO - means disable the automatic JSFILE backup process and do the BACKUP/RE-ORG
processing **ONLY** ON COMMAND (When an EQQEVPGM job is run with sysin BACKUP
RESDS(JS)). Check APPENDIX A of the PLANNING AND SCHEDULING THE WORKLOAD
manual for info on the various invocations of
EQQEVPGM.

nnnnn - where nnnnn is a number which is multiplied by 1,000,000 to find the
maximum HIGH USED RBA of the active JSFile (in bytes). The CONTROLLER submits
50 jobs, then checks the size of the active file. If it exceeds the MAXJSFILE
value, a backup is done. Then another 50 jobs are submitted and the filesize
is checked again. If the REORG/COPY does not bring the size of the active
JSFILE below the specified value, you will end up doing the REORG/COPY after
every 50 jobs.
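
The effect of a MAXJSFILE value that is too small can be illustrated with a simplified model of the check described above. This is a sketch, not the controller's actual logic: the multiplier of 1,000,000 bytes is taken from the description above, the file sizes and growth per job are invented figures, and the function name is hypothetical.

```python
JOBS_PER_CHECK = 50  # the controller checks the file size after every 50 jobs


def count_js_backups(jobs_submitted: int,
                     size_after_reorg: int,
                     growth_per_job: int,
                     maxjsfile: int,
                     unit_bytes: int = 1_000_000) -> int:
    """Count how many JS backups a numeric MAXJSFILE value would trigger.

    size_after_reorg -- HIGH USED RBA (bytes) right after a REORG/COPY
    growth_per_job   -- assumed average growth of the active JS file per job
    """
    limit = maxjsfile * unit_bytes
    size = size_after_reorg
    backups = 0
    for job in range(1, jobs_submitted + 1):
        size += growth_per_job
        # Size is only examined at every 50th submitted job.
        if job % JOBS_PER_CHECK == 0 and size > limit:
            backups += 1
            size = size_after_reorg  # the REORG/COPY compacts the file again
    return backups


# Limit below the compacted size: a backup after every 50 jobs.
print(count_js_backups(500, 12_000_000, 1_000, maxjsfile=10))
# Limit comfortably above the compacted size: no automatic backups.
print(count_js_backups(500, 12_000_000, 1_000, maxjsfile=20))
```

In the first call the compacted file is already larger than the limit, so every check triggers another backup, which is the situation the text warns about.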

Most customers elect to SCHEDULE an EQQEVPGM BACKUP job once or twice per day
during the times when OPC/TWSz is LEAST BUSY, and to set MAXJSFILE() either to
a very large number, or set it to NO.

NOTES:
If MAXJSFILE(NO) is set, then a BACKUP job *MUST* be explicitly scheduled.

As documented in the manual, the SUBSYS parameter in the SYSIN for this
job must name the TRACKER on the LPAR where the job executes.

A "rule of thumb" is to have a CP switch done at MOST once per hour during the busiest
time of the day for controller processing (peak batch in most cases)

Note that at TWSz 8.6 and above, MAXJSFILE is in megabytes (MB) rather than kilobytes (KB) as
was the case in previous releases.

(iii) ETTGENSEARCH (JTOPTS)

When a potential ETT TRIGGER is received by the CONTROLLER, it is first checked
for an EXACT match against the ETT table. This processing is VERY fast and
efficient.
But if an exact match cannot be made and GENERIC matching is enabled, then the
table is searched a second time for a generic match. This is very CPU
intensive, and if the ETT table is large, or if the number of potential
triggers is great, there may be a noticeable impact on TWSz performance.
So in environments where there is a large component of non-TWSz work, it may be
useful to set ETTGENSEARCH(NO) in the JTOPTS init statement.
If ETT generic trigger names are validly used, an alternative is to use the
TRACKER EVENT FILTERING USER EXIT, EQQUX004, to prevent creation of job
tracking events for jobs which are of no interest to TWSz. Of course, this is
possible only if the jobs TWSz *does* care about, or the jobs it *does not*
care about, can be identified by jobname.
EVERY SPECIAL RESOURCE STATUS CHANGE EVENT (event type SY) is always a
potential ETT TRIGGER, regardless of whether the resource is defined to TWSz,
or its current status. But generic ETT matching of Special Resource events is
not normally a problem unless some application (such as file transfer program
NDM) has an active exit which issues an SRSTAT for EVERY DATASET processed.
If S/R generic ETT processing does prove excessive, and dataset naming
standards allow, EQQUX004 can be used to filter the SY events as well.
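
The cost difference between the two matching passes can be illustrated with a small model. This is not the product's implementation: the table layout, the use of "*" wildcards via Python's fnmatch module, and all trigger and application names are assumptions made for the sketch.

```python
from fnmatch import fnmatchcase


def find_ett_trigger(name, exact_table, generic_patterns, generic_search=True):
    """Return the application for a potential trigger, or None.

    exact_table      -- dict of exact trigger name -> application (hypothetical)
    generic_patterns -- list of (pattern, application) pairs (hypothetical)
    generic_search   -- models ETTGENSEARCH(YES) when True, (NO) when False
    """
    # First pass: exact match, a single keyed lookup regardless of table size.
    if name in exact_table:
        return exact_table[name]
    if not generic_search:
        return None
    # Second pass: every generic pattern is tested in turn, so the cost
    # grows with the number of patterns and the number of potential triggers.
    for pattern, application in generic_patterns:
        if fnmatchcase(name, pattern):
            return application
    return None


exact = {"PAYROLL1": "APPL_PAY"}
generic = [("BKUP*", "APPL_BKUP")]

print(find_ett_trigger("PAYROLL1", exact, generic))    # exact hit
print(find_ett_trigger("BKUPDAILY", exact, generic))   # generic hit
print(find_ett_trigger("TESTJOB", exact, generic))     # no match, full scan
```

The third call is the expensive case: a name of no interest to the scheduler still forces a scan of every generic pattern, which is why filtering such events before they reach the controller helps.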

(iv) LISTLOGGING (AUTHDEF) - LISTLOGGING(FIRST) is recommended

(v) SUBRESOURCES (AUTHDEF) - Activate only those SUBRESOURCES you actually create security profiles for

(vi) AUDIT statement - recommendation is to NEVER use ACCESS(READ), and to use AMOUNT(DATA) only if the information is very important to you (or your auditors)

| (vii) BUILDSSX and SSCMNAME (OPCOPTS) - see also technote 1265106. Using
| BUILDSSX(REBUILD) for a tracker may cause tracking records to be lost, which means jobs
| will not track correctly. The REBUILD option is intended only for TWSz migrations and as an
| ACTION HOLD for some PTFs. After the tracker is started once with the BUILDSSX(REBUILD)
| parameter, the parameter should be removed to prevent loss of job tracking records. If the SSCMNAME
| parameter is also specified, the PERMANENT option is recommended so that the tracker does
| not revert to an older level when it is shut down.

| The proper shutdown procedure for avoiding loss of tracking records is discussed in section "How to
| make sure that events are not lost" in the Managing the Workload manual, chapter "Overview of job
| tracking on z/OS" (in the INDEX of this manual look for "losing events at shutdown" for a pointer to
| this section).


(b) z/OS related configuration

(i) - use ZFS rather than HFS for E2E work directory (WRKDIR) and BINDIR

| (ii) - For restart and cleanup (datastore) -- to avoid a CLNP Error on non-cataloged tape,
| apply HIPER z/OS APAR OA45876 for z/OS

| (iii) Since recovery of the current plan requires the NCP dataset and the JT files (JTARC plus
| JT01-JTxx), be extremely careful not to compromise these files, for example by restoring an earlier
| version of them. If these files are not available in the correct state for recovery purposes, it may not be
| possible to recover the current plan.

(2) TWSz file sizes- database files, JT files, event datasets etc

The default sizes in the samples (EQQPCS01, EQQPCS02, etc.) are often too small for most users.

The biggest issues causing problems with data set sizing are
1) JTARC overflow
2) JSFile MAXRECL
3) EQQADDS MAXRECL (AD and JCL VARTAB maximum sizes)
4) VSAM SHAREOPTIONs - see technote 1198958 (see link below)


(3) start up/ shut down issues -
(a) start up of TWSz -- recommendation is to start TWSz after JES2, TCPIP and OMVS


(4) issues with messages
(a) automation of messages including edit of SEQQMSG0 members-- see chapter 9 of Customization and Tuning manual

(b) use of DIAGNOSE statement to produce useful messages -- see technote 1104886 (see link below)

| (c) An "E" level message with text ending with "IS NOT DEFINED", for example
| EQQE129E MESSAGE IS NOT DEFINED, usually indicates that a PTF which added a message
| was installed, but the SEQQMSG0 member containing the new message definition was NOT
| copied into the running TWSz environment along with the load module(s) in the SEQQLMD0
| library. Any time PTFs are applied to TWSz, the contents of all the SEQQ* datasets should be
| copied into the executable TWSz instance.



(5) Files for debugging purposes - keeping EQQMLOG, EQQTROUT information

(i) Output of daily planning jobs
(ii) use of EQQDUMP
(iii) how to capture dumps when needed,
avoiding products which suppress dumps or do not produce SYSMDUMPS
(iv) tracklog (EQQTROUT) see technote 1226811 (see link below)
(v) EQQAUDIT - see technotes 1050087 and 1608054 (see links below)


(6) Daily planning jobs - LT, CP, trial plan etc .

(i) REPLAN - pros/cons of running REPLAN

Only cons to REPLAN: 1) Completed work is removed / becomes invisible
2) EQQTROUT contents (see technote 1226811, link below)

| (ii) To extend the current plan to a specific time of day (rather than for a number of hours),
| technote 1252805 shows how to do this with JCL variable substitution. The
| example shows an extend to 0800, but you can easily change this to
| any time that you want (see link for technote below).




ITEMS OF INTEREST for SOME TWSz configurations:

(7) Performance configuration for datastore, server, output collector

(i) E2E server -
Check if SMF type 92 records are being collected - this will cause high CPU
overhead for the E2E server task.

SMF type 92 records are not required or even recommended for TWS. These records
deal with RACF recording of access to an HFS by a userid.

The USS Planning guide recommends that type 92 records *not* be collected
(unless there is a specific need for them). See manual SA22-7684-03,
chapter 2, section 2.3 "Auditing for z/OS UNIX system services".

Also check APAR PK23181 - running E2E with the WRKDIR (work directory)
mounted on a different LPAR from the one where the E2E server task is running can cause excessive CPU use, because XCF is needed to do I/O to the WRKDIR.

Customization of messages (to suppress or write as WTO) is done via the TWSCCLog.properties file for E2E server

| If the USS WRKDIR is filling up (this can cause message EQQPH07E THE SERVER STARTER
| PROCESS ABENDED), avoid the use of this DIAGNOSE statement:
| DIAGNOSE TPLGYFLAGS(X'10000000')
| which will cause huge amounts of data to be written to E2EMERGE.
| See technote 1104886 (see link below) for a general discussion of the DIAGNOSE statement.
| Reducing the value for TRCDAYS in the TOPOLOGY statement of the E2E server task is a quick way
| to reduce the size of the WRKDIR (since fewer log files will be kept).

| (ii) Datastore (restart and cleanup)

| The RCLPASS and DUMMYLASTSTEP parameters are important and documented in:
| (see links below)

| (a) Technote 1670125
| (b) DOC APAR PM60019:
| (c) INFO APAR II14492:


| To summarize all this documentation-- unless you are using
| EWTROPTS parameter RETCODE(LAST), your best bet is to
| code DUMMYLASTSTEP() and allow RCLPASS to default to NO.

| If you need to have RETCODE(LAST), then RCLPASS(YES) should
| be coded instead of DUMMYLASTSTEP()

| Both these parameters have unfortunate side effects.
| If you code RCLPASS(YES) then ALL your TEMPORARY DATASETS
| will be permanently KEPT instead of being deleted at the
| end of the job as you would expect- you would then need to
| have some kind of clean up routine to delete these "temporary"
| datasets to avoid wasting disk space.

| Coding DUMMYLASTSTEP() will avoid having temporary datasets
| kept, but adds the extra IBM50941 step at the end of EVERY
| job submitted by TWSz.

| Ignoring the warning message and coding neither RCLPASS(YES)
| nor DUMMYLASTSTEP() exposes you to having EQQCLEAN delete the
| wrong dataset if a job has a DISP=PASS dataset in the last
| step of the job. The JCL for the SAS product is a
| common example of this.

| To avoid a CLNP Error on non-cataloged tape,
| apply HIPER z/OS APAR OA45876 for z/OS

| HIPER and recommended maintenance for all users of restart and cleanup/datastore function:
| PM8261 PI10986 PI13014



(8) High availability config- standby controller, DVIPA, DBRLM , REGION size, CP buffering, mirroring

(9) Configuration parameters for optional features
(a) restart and cleanup
(b) dynamic critical path
(c) zcentric and FTA
(d) shadow jobs
(e) conditional and step level dependencies

(10) Security issues
(a) CLASS (AUTHDEF) -

The RACF predefined class name for TWS/ZOS is "IBMOPC". The IBMOPC class should
be used in preference to any other user-defined class. For further
information, refer to DOC APAR PK04155.
(b) SSL
(c) datastore and output collector
(d) OMVS security for E2E

For all of these, see technote 1194317 (see link below)


Document Information

Modified date:
13 September 2019

UID

swg21673029