A DataStage job in Information Server spends a long time in startup. How can I shorten the startup time?
Diagnosing the problem
Look for the following entry, which shows the startup time, in the job log:
Message: main_program: Startup time, 2:51; production run time, 0:00.
Keep track of the startup time over several job runs. If some runs show significantly longer startup times, you may be running into this issue. The job we monitored in this case usually reports a startup time of only a few seconds. However, the startup time shown above is 2 minutes and 51 seconds, which is much longer than normal.
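Tracking the startup time across runs can be scripted once the log messages are exported as text. The following is a minimal sketch, not part of any DataStage tooling: the function names, the `factor` and `floor` thresholds, and the assumption that the time is always printed as minutes:seconds (as in the single sample above) are illustrative only.

```python
import re
from statistics import median

# Matches the entry quoted above, e.g. "Startup time, 2:51".
# Assumption: the value is always minutes:seconds, as in that one sample.
STARTUP_RE = re.compile(r"Startup time,\s*(\d+):(\d{2})")


def startup_seconds(message):
    """Return the startup time in seconds, or None if the entry is absent."""
    match = STARTUP_RE.search(message)
    if match is None:
        return None
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


def slow_runs(messages, factor=5.0, floor=10):
    """Return (index, seconds) for runs whose startup time is unusually long.

    A run is flagged when its startup time exceeds both `floor` seconds and
    `factor` times the median of all runs. Both thresholds are hypothetical
    defaults and should be tuned to the job being monitored.
    """
    times = [startup_seconds(m) for m in messages]
    known = [t for t in times if t is not None]
    if not known:
        return []
    threshold = max(floor, factor * median(known))
    return [(i, t) for i, t in enumerate(times) if t is not None and t > threshold]
```

Passing one "Startup time" message per run to `slow_runs` returns the positions of the runs worth investigating.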
Resolving the problem
To shorten the startup time, set the user-defined environment variable APT_DATASET_FLUSH_NOSYNC at the project level through DataStage Administrator.
When the environment variable APT_DATASET_FLUSH_NOSYNC is set, calls to sync() are turned off, but fsync() is still called to synchronize I/O. The sync() system call flushes all dirty buffers to disk, whereas fsync() flushes only the data blocks that belong to a specific open file. sync() calls a series of functions that involve a wait. For descriptor files, the process already knows which files it has opened, so fsync() is sufficient.
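The difference between the two system calls can be illustrated outside DataStage. The sketch below uses Python's standard `os.fsync()` and `os.sync()` wrappers (the latter is available on Unix only). The function names are hypothetical, and the code does not show how the parallel engine itself writes descriptor files.

```python
import os


def write_with_fsync(path, data):
    """Write data, then flush only this file's blocks to disk."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # move data from the user-space buffer to the kernel
        os.fsync(f.fileno())   # commit this one file's data to disk


def write_with_sync(path, data):
    """Write data, then ask the kernel to flush all dirty buffers."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
    os.sync()                  # system-wide flush; can wait on unrelated I/O
```

Both functions leave the same bytes in the file. The difference is how much other pending I/O the caller may end up waiting for, which is why the targeted fsync() call is the cheaper option when the set of open files is known.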
There is another user-defined environment variable called APT_DATASET_FLUSH_NOFSYNC. It was added to fix performance issues in cluster environments, where jobs using datasets became very slow even though they worked fine in a non-cluster setup. This variable can also be set in DataStage Administrator.
If both of these environment variables are set, the buffers still get flushed to disk, but not immediately. In general, the kernel flushes dirty buffers to disk in due time. In the meantime, a write returns to the process immediately while the kernel still holds the dirty buffers. However, in the event of a power failure or other sudden disaster, data that has not yet been flushed can be lost. That is the risk the sync() and fsync() system calls normally guard against.
Turn these environment variables on only as needed, when a performance loss is seen.