Troubleshooting GenWQE

If you are experiencing problems with the card, follow the instructions here to troubleshoot the issue.

Procedure

  1. Check if the cards are installed and recognized by the system. To see how many GenWQE cards are available in the system, use the lspci command, as shown in this example:
    $ lspci -n | grep -i 044b
    0000:1b:00.0 1200: 1014:044b
    0000:20:00.0 1200: 1014:044b
    If nothing is returned by this command, try reinstalling or reseating the card.
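The check in this step can be scripted. The following is a minimal sketch, not part of the GenWQE tooling: the function name count_genwqe_cards is hypothetical, and it assumes the vendor:device ID 1014:044b shown in the example output above.

```shell
# Count GenWQE cards in `lspci -n` output read from stdin.
# 1014:044b is the vendor:device ID from the example output in this step.
# count_genwqe_cards is a hypothetical helper name.
count_genwqe_cards() {
    # -c prints the number of matching lines, -i ignores case (044b vs 044B).
    grep -ci '1014:044b'
}

# Usage: lspci -n | count_genwqe_cards
```

A result of 0 corresponds to the "nothing is returned" case described above.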
  2. Check if the MMIO BAR has a usable address.
    $ sudo lspci -vs 0000:1b:00.0
    0000:1b:00.0 Class 1200: IBM Device 044b
        Subsystem: IBM Device 044b
        Flags: bus master, fast devsel, latency 0, IRQ 129
        Memory at 38088000000 (64-bit, prefetchable) [size=128M]
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Power Management version 3
        Capabilities: [80] Express Endpoint, MSI 00
        Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [200] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [300] #19
        Capabilities: [800] Advanced Error Reporting
        Kernel driver in use: genwqe
    If the memory address is 0x0 or if the command doesn't return anything, there might be a hardware configuration issue. Try reconfiguring the PCI slot assignment of your PowerVM® LPAR or PowerKVM guest.
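The address check can be automated along these lines. This is a sketch under the assumption that the BAR appears on a "Memory at <hex address>" line, as in the example output above; bar_address_ok is a hypothetical helper name.

```shell
# Read `lspci -vs <slot>` output on stdin and print the first memory BAR
# address. Returns non-zero if no BAR is listed or the address is all zeros.
# bar_address_ok is a hypothetical helper name.
bar_address_ok() {
    # Extract the hex digits that follow "Memory at" on the first such line.
    addr=$(sed -n 's/^[[:space:]]*Memory at \([0-9a-fA-F]*\).*/\1/p' | head -n 1)
    [ -n "$addr" ] || { echo "no memory BAR found"; return 1; }
    case "$addr" in
        *[!0]*) echo "$addr"; return 0 ;;          # at least one non-zero digit
        *)      echo "memory BAR is zero"; return 1 ;;
    esac
}

# Usage: sudo lspci -vs 0000:1b:00.0 | bar_address_ok
```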
  3. Check if the device driver is loaded.
    1. Run the following command and check its output:
      $ lsmod | grep genwqe
      genwqe_card            88997  0
      crc_itu_t               1910  1 genwqe_card
    2. If the driver is not loaded, try to load it by running the following command:
      # modprobe genwqe_card
    3. If you cannot load the driver, you might need to install the RPM that contains the driver. See Installing the Generic Work Queue Engine.
    4. If the driver is loaded but you are still having problems, try reloading the driver:
      # rmmod genwqe_card
      # modprobe genwqe_card
    5. Check the driver version information as in the following example:
      $ modinfo genwqe_card
      filename:       /lib/modules/3.10.0-123.el7.ppc64/extra/genwqe_card-rhel70/genwqe_card.ko
      version:        2.0.21
      description:    GenWQE Card
      srcversion:     D9624C17E73B3E0AC07EC91
      
      alias:          pci:v00001014d0000044Bsv00001014sd0000044Bbc12sc00i00*
      alias:          pci:v00001014d00000000sv00000000sd0000035Fbc12sc00i00*
      alias:          pci:v00001014d0000044Bsv00000000sd0000035Fbc12sc00i00*
      alias:          pci:v00001014d00000000sv00000000sd00000000bc12sc00i00*
      alias:          pci:v00001014d0000044Bsv00000000sd00000000bc12sc00i00*
      alias:          pci:v00001014d0000044Bsv00001014sd0000035Fbc12sc00i00*
      depends:        crc-itu-t
      vermagic:       3.14.3-200.fc20.x86_64 SMP mod_unload
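For use in a script, the "is the driver loaded" test from this step might look like the following sketch. The helper name genwqe_module_loaded is hypothetical. It matches the module name exactly in the first column, so the crc_itu_t line, which only mentions genwqe_card in its "used by" column, does not count as a match.

```shell
# Read `lsmod` output on stdin; succeed only if a line's first column is
# exactly "genwqe_card". genwqe_module_loaded is a hypothetical helper name.
genwqe_module_loaded() {
    awk '$1 == "genwqe_card" { found = 1 } END { exit !found }'
}

# Usage: lsmod | genwqe_module_loaded || echo "genwqe_card is not loaded"
```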
  4. Check the device access and udev rules.
    To use the card as a normal user, the device nodes must be readable and writable by all users. Check the device access with the following command:
    $ ls -l /dev/genwqe*
    crw-rw-rw- 1 root root 249, 0 Jun 30 10:01 /dev/genwqe0_card
    crw-rw-rw- 1 root root 248, 0 Jun 30 10:01 /dev/genwqe1_card
    If the permissions are not correct (anything other than crw-rw-rw-), you can run:
    # chmod a+rw /dev/genwqe*
    You can also fix the permissions permanently by creating a file named /etc/udev/rules.d/52-genwqedevices.rules with the following content:
    KERNEL=="genwqe*",                      MODE="0666"
    This configuration should be done automatically by the Linux distribution installation or the driver RPM. If you still have problems, try checking the RPM installation.
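The permission check can also be done without reading the ls output by eye. The sketch below assumes GNU coreutils stat (the -c option); report_restricted_nodes is a hypothetical helper name.

```shell
# Print each path whose mode does not grant read and write to "other" users.
# Assumes GNU coreutils stat; report_restricted_nodes is a hypothetical name.
report_restricted_nodes() {
    for node in "$@"; do
        # %a prints the octal mode, for example 666 or 600.
        mode=$(stat -c '%a' "$node") || continue
        other=$((mode % 10))            # last octal digit: "other" permissions
        # Bits 4 (read) and 2 (write) must both be set for other users.
        [ $((other & 6)) -eq 6 ] || echo "$node"
    done
}

# Usage: report_restricted_nodes /dev/genwqe*
```

No output means every listed node is accessible to normal users.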
  5. Use genwqe-tools to check the card health.
    Install the genwqe-tools package, as described in Installing the Generic Work Queue Engine, and run the following command:
    # genwqe_echo [-C <card number>] -c 4
    This command sends echo packets to the card and waits for a response. The output should be similar to the following:
    # genwqe_echo -c 4
    33 bytes from UNIT #1: echo_req time=37.0 usec
    33 bytes from UNIT #1: echo_req time=19.0 usec
    33 bytes from UNIT #1: echo_req time=23.0 usec
    33 bytes from UNIT #1: echo_req time=18.0 usec
    
    --- UNIT #1 echo statistics ---
    4 packets transmitted, 4 received, 0 lost, 0% packet loss
    If the card does not respond to the echo packets, there might be a problem with the installed bitstream. See Bitstream.
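To use the echo test in a health-check script, the summary line can be parsed. This sketch assumes the summary format shown in the example above ("N packets transmitted, M received, ..."); echo_all_received is a hypothetical helper name and is not part of genwqe-tools.

```shell
# Read genwqe_echo output on stdin; succeed only if a statistics line is
# present and the transmitted and received counts match.
# echo_all_received is a hypothetical helper name.
echo_all_received() {
    awk '/packets transmitted/ {
             seen = 1
             sent = $1; received = $4    # "4 packets transmitted, 4 received, ..."
             if (sent + 0 != received + 0) bad = 1
         }
         END { exit (seen && !bad) ? 0 : 1 }'
}

# Usage: genwqe_echo -c 4 | echo_all_received || echo "card did not answer all echo packets"
```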
  6. Verify that the environment variables are set correctly to use the hardware zlib. If the hardware is not found or is not set up properly, the system switches to software zlib.
    To check your environment variables, run the following command:
    $ env | grep ZLIB
    If the environment variables are set, output similar to the following example is returned:
    ZLIB_DEFLATE_IMPL=0x1
    ZLIB_INFLATE_IMPL=0x1
    If the command does not return any output, set the environment variables by running the following export commands:
    # export ZLIB_INFLATE_IMPL=0x1
    # export ZLIB_DEFLATE_IMPL=0x1
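The same check can be scripted so that it names the missing variable. This is a minimal sketch; missing_zlib_vars is a hypothetical helper name, and it only tests whether the two variables from this step are set, not whether their values are valid.

```shell
# Print the name of each hardware zlib variable that is unset or empty.
# missing_zlib_vars is a hypothetical helper name.
missing_zlib_vars() {
    for name in ZLIB_DEFLATE_IMPL ZLIB_INFLATE_IMPL; do
        # Portable indirect lookup: copy the value of the variable named
        # in $name into $value (empty if the variable is unset).
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}

# Usage: missing_zlib_vars
```

Note that export affects only the current shell and the processes started from it, so the variables must be set in the environment that launches the application.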
  7. Verify that you are using the correct version of Java.
    Run the following command:
    # which java
    GenWQE uses the IBM® SDK for Java™ 7.1, which installs into the /opt/ibm_java path. If the command returns a different path, reinstall Java.
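Because which java often returns a symbolic link such as /usr/bin/java, a script may need to resolve the link before comparing paths. The following sketch assumes the /opt/ibm_java prefix named above and GNU readlink for the usage line; is_ibm_java_path is a hypothetical helper name.

```shell
# Succeed if the given path lies under the /opt/ibm_java prefix named in
# this step. is_ibm_java_path is a hypothetical helper name.
is_ibm_java_path() {
    case "$1" in
        /opt/ibm_java*) return 0 ;;
        *)              return 1 ;;
    esac
}

# Usage (assumes GNU readlink -f to resolve symbolic links):
#   is_ibm_java_path "$(readlink -f "$(command -v java)")" || echo "unexpected Java location"
```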

  8. Monitor the DDCB queue of the card.