IBM Support

MKS Toolkit Security ID Setting Causes Windows System Lockup

Troubleshooting


Problem

The InfoSphere DataStage parallel engine on Microsoft Windows may appear to lock up. Other Windows functionality (Explorer, Task Manager, etc.) still works. Parallel jobs (.OSH files) that are already running keep running, but you cannot start or compile other parallel jobs. You can still start InfoSphere DataStage server jobs. This note explains how to diagnose and fix the issue.

Symptom

Windows cannot create new processes that are linked to the MKS Toolkit. These processes include all .OSH file processes, connectivity tools such as SSH, and many other system utilities provided by the MKS Toolkit.

Cause

Each process created by the InfoSphere DataStage client on a Windows Server requires a Security ID token in the MKS Toolkit. On very busy systems, the number of Security IDs must be increased above the default.

Environment

Windows

Diagnosing The Problem

If you suspect that InfoSphere DataStage is hung and no additional parallel jobs will start, use the following test to confirm it.

  1. Obtain an MKS NuTCRACKER process report dump.
    1. Open a Korn shell by clicking Start->Run->ksh
    2. From the Korn shell, enter the following command: process -a -w 5000 > process-log.txt

      The process command reports how many NuTCRACKER platform processes are running and displays information on active NuTCRACKER platform processes.

      Example

      Note: no redirection to a log file is used in this example.

      $ Process -a -w 5000



      There are 5 currently running NuTCRACKER processes
      There are 32 currently allocated process slots
      A total of 2000 processes can run simultaneously
      7 processes have run simultaneously

      Process 4216, program C:\Program Files\MKS Toolkit\bin\rlogind.exe
      ppid is not a NuTCRACKER application
      pgroup 4216
      sess_id 4216
      nice 0

      Process 4552, program C:\Program Files\MKS Toolkit\bin\secshd.exe
      ppid is not a NuTCRACKER application
      pgroup 4552
      sess_id 4552
      nice 0

      Process 29672, program C:\PROGRA~1\MKSTOO~1\mksnt\perl.exe
      ppid is not a NuTCRACKER application
      pgroup 29672
      sess_id 29672
      nice 0

      Process 30936, program C:\PROGRA~1\MKSTOO~1\bin\Process.exe
      ppid is not a NuTCRACKER application
      pgroup 30936
      sess_id 30936
      nice 0
      $
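When the report is redirected to process-log.txt, the summary counts at the top can be extracted with a short script. The following is a minimal sketch and not part of the MKS Toolkit: the function name summarize_process_report is hypothetical, and the patterns assume the report uses the same wording as the sample output in this note.

```shell
# Hypothetical helper: print the summary counts from a saved report file.
# Assumes the summary lines are worded as in the sample report in this note.
summarize_process_report() {
    awk '
        /currently running NuTCRACKER processes/ { running = $3 }
        /currently allocated process slots/      { slots = $3 }
        /processes can run simultaneously/       { max = $4 }
        END { printf "running=%s allocated=%s max=%s\n", running, slots, max }
    ' "$1"
}
```

For example, summarize_process_report process-log.txt prints the three counts on one line, which makes it easier to compare reports taken at different times. This note does not state a threshold for these counts, so the sketch only reports them.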

    • Check that you can successfully run a parallel job as follows:


    To help with testing, first create an environment file in the PXEngine directory. You can create the file with an editor such as Windows Notepad. The format of the file is as follows:

      #!/bin/sh

      export APT_ORCHHOME="C:/IBM/InformationServer/Server/PXEngine"
      export APT_CONFIG_FILE="C:/IBM/InformationServer/Server/Configurations/default.apt"
      export PATH="$APT_ORCHHOME/bin;$PATH"
      export APT_PM_SHOWRSH=1

    The above example assumes that Information Server was installed to the default C: drive. If you installed to a different drive, substitute the appropriate drive letter for C:.

    Save the file as apt.env in the C:/IBM/InformationServer/Server/PXEngine directory. Note: if you are using Windows Notepad, set “Save as type” to “All Files” and enter apt.env as the File name. Do not use a .TXT file name extension.
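As an alternative to Notepad, the file can be written from the Korn shell, which sidesteps the .TXT extension problem. The following is a minimal sketch that assumes the directory layout shown in this note; the function name write_apt_env and its install-root argument are illustrative and are not part of the product.

```shell
# Hypothetical helper: write apt.env under <install_root>/Server/PXEngine.
# install_root is the Information Server root, for example C:/IBM/InformationServer.
write_apt_env() {
    install_root=$1
    engine_dir="$install_root/Server/PXEngine"
    # Single-quoted lines are written literally so that $APT_ORCHHOME and
    # $PATH are expanded when apt.env is sourced, not when it is created.
    printf '%s\n' \
        '#!/bin/sh' \
        '' \
        "export APT_ORCHHOME=\"$engine_dir\"" \
        "export APT_CONFIG_FILE=\"$install_root/Server/Configurations/default.apt\"" \
        'export PATH="$APT_ORCHHOME/bin;$PATH"' \
        'export APT_PM_SHOWRSH=1' \
        > "$engine_dir/apt.env"
}
```

Calling write_apt_env "C:/IBM/InformationServer" from the Korn shell would produce the same file content as the example format. The PXEngine directory must already exist.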

    After creating the environment file, you can test the parallel engine by executing a simple parallel job from the shell or command level as follows:

  2. Open a Korn shell by clicking Start->Run->ksh
  3. Change to the C:/IBM/InformationServer/Server/PXEngine directory by entering

    cd C:/IBM/InformationServer/Server/PXEngine

  4. Source the apt.env environment file by entering

    . ./apt.env

  5. Run a test by entering

    osh "generator -schema record(a:int32) | peek"


    The job should execute normally, without hanging and without any error message dialogs. If an error dialog appears when you run the osh test, increase the number of MKS NuTCRACKER security IDs as described in the following section.
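The change-directory, source, and run steps can be wrapped in one function so that the test is easy to repeat. This is a minimal sketch under stated assumptions: the function name px_smoke_test is hypothetical, it expects apt.env to exist in the directory passed to it, and it does not detect a hang, only a non-zero exit status from osh.

```shell
# Hypothetical helper: run the osh test from the given PXEngine directory.
# Returns 1 if apt.env is missing; otherwise returns the exit status of osh.
px_smoke_test() {
    engine_dir=$1
    if [ ! -f "$engine_dir/apt.env" ]; then
        echo "apt.env not found in $engine_dir" >&2
        return 1
    fi
    (
        # Run in a subshell so the caller's directory and environment are unchanged.
        cd "$engine_dir" || exit 1
        . ./apt.env
        osh "generator -schema record(a:int32) | peek"
    )
}
```

A typical call would be px_smoke_test "C:/IBM/InformationServer/Server/PXEngine". If the command never returns, treat that the same as the hang described in this note.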


Resolving The Problem

The problem can be avoided by changing a parameter in the MKS Control Panel:
1. Make sure that no DataStage jobs and no uvsh.exe or osh.exe processes are running on the server. Any job that uses the NuTCRACKER service prevents changed settings from taking effect.
2. Using the DataStage control panel applet, stop all the DataStage services.
3. Open the 'Configure MKS Toolkit' control panel and select the 'Manage Services' tab. Starting with the bottom service shown in the dropdown, stop each MKS service.
4. After you stop the last service, press the 'Refresh' button and make sure that the 'Active Processes' box displays zero. If necessary, stop any remaining services. Unless the 'Active Processes' count is zero, the setting changes will not take effect.
5. a. Select the 'Runtime Settings' tab.
   b. Select 'Miscellaneous Settings' from the Category dropdown.
6. Change the value in the 'Max Number of Security ID.' box to 5000.
8. Select the 'Manage Services' tab again and restart the services in the order shown in the dropdown, starting at the top.
NOTE: When the system reboots, the MKS services and the DataStage services all restart. If you plan to reboot the Windows system, you can skip restarting the services and click OK to exit the MKS control panel.
9. Click OK to exit the MKS control panel.
10. Restart the DataStage services using the DataStage control panel.
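The check in step 1 can be done from a shell before opening the control panel. The following is a minimal sketch, assuming the standard Windows tasklist command is available and that its CSV output (options /FO CSV /NH) gives the image name in the first column and the PID in the second. The function name list_blocking_processes is hypothetical.

```shell
# Hypothetical helper: read `tasklist /FO CSV /NH` output on stdin and print
# "name pid" for every osh.exe or uvsh.exe process (case-insensitive match).
list_blocking_processes() {
    awk -F'","' '
        { name = $1; sub(/^"/, "", name) }
        tolower(name) == "osh.exe" || tolower(name) == "uvsh.exe" { print name, $2 }
    '
}
```

Running tasklist /FO CSV /NH | list_blocking_processes should print nothing when it is safe to proceed. Any line of output names a process that still has to finish or be stopped.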

Product: IBM InfoSphere DataStage. Platform: Windows. Versions: 8.7, 8.5, 8.1.

Document Information

Modified date:
16 June 2018

UID

swg21615249