Overview of the PCIe3 CAPI and PCIe3 FPGA compression accelerator adapters

The PCIe3 Field Programmable Gate Array (FPGA) Compression Accelerator Adapter (EJ12/EJ13; CCIN 59AB) is the first FPGA accelerator adapter that IBM released. It is the predecessor of the PCIe3 Coherent Accelerator Processor Interface (CAPI) Compression Accelerator Adapter (EJ1A/EJ1B; CCIN 2CF0).

The characteristics of the two cards are similar. Compared with its predecessor, the PCIe3 CAPI Compression Accelerator Adapter offers lower latency, higher throughput, and a significant reduction in CPU load.

Figure: Compression Accelerator Adapter

Both adapter types can be installed in one system, but a single application cannot use both types at the same time. Each application must use either the EJ12/EJ13 adapters or the EJ1A/EJ1B adapters; a mixed configuration within one application is not possible. For example, application A can use two EJ12 adapters while application B uses two EJ1A adapters, but application C cannot use one EJ12 adapter and one EJ1A adapter. Using a single card type is preferred because the load can then be distributed between the adapters automatically.

The accelerated GZIP cards implement the well-defined, open standard DEFLATE compressed data format, which is used in zlib, gzip, Java, and many other applications. Within the gzip and zip file formats, it has become the standard for compressed data exchange.
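Because DEFLATE is the common payload format, data written in one container can be read by any compliant tool. The following is a minimal, software-only sketch of that interchange property. It uses the Python standard library bindings to zlib and does not touch the adapter; the `payload` contents and variable names are illustrative assumptions.

```python
import gzip
import zlib

payload = b"The same DEFLATE data can sit inside different containers.\n" * 1000

# gzip container written by the gzip module, read back through zlib.
# wbits=47 (32 + 15) asks zlib to auto-detect a zlib or gzip header.
gz_bytes = gzip.compress(payload)
assert zlib.decompress(gz_bytes, 47) == payload

# gzip container written through zlib (wbits=31), read back by the gzip module.
comp = zlib.compressobj(wbits=31)
gz_from_zlib = comp.compress(payload) + comp.flush()
assert gzip.decompress(gz_from_zlib) == payload

# zlib container (default) and raw DEFLATE without any header (wbits=-15).
zlib_bytes = zlib.compress(payload)
raw = zlib.compressobj(wbits=-15)
raw_bytes = raw.compress(payload) + raw.flush()
assert zlib.decompress(zlib_bytes) == payload
assert zlib.decompress(raw_bytes, -15) == payload
```

All three outputs carry a DEFLATE stream; only the surrounding header and checksum differ, which is why a reader does not need to know which compressor produced the data.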

The high compression bandwidth of the card significantly reduces the latency of a single compression job. Its aggregate throughput allows it to keep pace with common I/O traffic.

The card therefore reduces the volume of data for storage and network traffic while having no negative impact, or even a positive impact, on most I/O traffic. It enables standard compression in cases where the software overhead previously ruled it out.

The following areas are examples of typical applications that can benefit from compression acceleration:
  • Storage or transmission of large amounts of data: on average, more than 100 MB/s
  • Expensive storage with high storage bandwidth, where the compression ratio of the accelerator, compared with fast software compression, yields significant savings
  • Applications with a high average throughput of data to be compressed
  • High peak throughput of data that software compression cannot keep up with
  • Workloads that require low latency for individual compression streams and are difficult to run in parallel on many CPUs
  • Cases where the standard DEFLATE compression format is required for interchange, as used in GZIP, zlib, zip, or jar. Software compression methods such as LZ4 or LZS, which offer high bandwidth on CPUs at a lower compression ratio, are not an option in that case.
Note: The card also supports full-speed decompression for all compliant compressed input, whether it was compressed by hardware or by software.

To achieve the best performance gain, strive for data block sizes that are larger than 64 kB, or buffer up smaller blocks before sending them to the hardware. The accelerated zlib library also has a built-in, selectable buffering feature. For details about slot priorities and placement rules, see PCIe adapter placement and select the system you are working on.
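The advice to buffer small blocks can be sketched in application code as follows. This is an illustration only: it uses the software zlib binding from the Python standard library as a stand-in for the compressor, it is independent of the accelerated library's own buffering feature (which is not shown here), and the names `MIN_BLOCK`, `coalesce`, and `compress_stream` are hypothetical.

```python
import zlib

MIN_BLOCK = 64 * 1024  # threshold taken from the text; the constant name is hypothetical


def coalesce(chunks, min_block=MIN_BLOCK):
    """Yield byte blocks of at least min_block bytes (the last one may be smaller)."""
    pending = bytearray()
    for chunk in chunks:
        pending += chunk
        if len(pending) >= min_block:
            yield bytes(pending)
            pending.clear()
    if pending:
        yield bytes(pending)


def compress_stream(chunks, min_block=MIN_BLOCK):
    """Compress many small chunks into one gzip-format byte string."""
    comp = zlib.compressobj(wbits=31)  # 31 selects the gzip container
    out = bytearray()
    for block in coalesce(chunks, min_block):
        # Each call now hands the compressor a large block instead of a tiny one.
        out += comp.compress(block)
    out += comp.flush()
    return bytes(out)


# Many small writes, for example log lines of 11 to 15 bytes each.
small_writes = [b"log line %d\n" % i for i in range(20000)]
blocks = list(coalesce(small_writes))
packed = compress_stream(small_writes)
```

The design choice is to pay for one extra copy into `pending` so that the number of calls into the compressor drops from one per write to one per 64 kB block or more.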

The adapter has the following attributes:
  • High throughput compression, saves storage and I/O bandwidth with little or no overhead
  • CPU offload, CAPI interface with negligible software load, frees up CPU cores for higher value computation or licensed software
  • Lower power consumption by offloading the CPU intensive compression to an FPGA
  • Zlib/gzip standard format, widely used for data interchange
  • Up to 2 GB/sec compression/decompression throughput
  • 4-30x speedup achievable
  • Compression ratio near that of software zlib/gzip

Usage Example

The following example demonstrates the adapter capabilities by compressing a Linux kernel archive, which is mostly ASCII text, with the PCIe3 CAPI Compression Accelerator Adapter and with the predecessor card (GENWQE), and compares both with a software compressor:
$ time genwqe_gzip -ACAPI -B0 -c linux-3.17.tar > linux-3.17.capi.tar.gz 

real	0m1.409s
user	0m0.032s
sys	0m0.164s

$ time genwqe_gzip -AGENWQE -B0 -c linux-3.17.tar > linux-3.17.genwqe.tar.gz 

real	0m1.425s
user	0m0.024s
sys	0m0.188s

$ time gzip -c linux-3.17.tar > linux-3.17.sw.tar.gz

real	0m17.392s
user	0m16.600s
sys	0m0.112s
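The timings above can be turned into speedup figures with a few lines of arithmetic. The sketch below only re-uses the `real` and `user` values printed in the transcript; the helper name `to_seconds` and the dictionary keys are hypothetical.

```python
def to_seconds(stamp):
    """Convert a value printed by `time`, such as '1m43.848s', to seconds."""
    minutes, rest = stamp.split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))


# Values copied from the three runs in the transcript.
real = {"capi": "0m1.409s", "genwqe": "0m1.425s", "software": "0m17.392s"}
user = {"capi": "0m0.032s", "genwqe": "0m0.024s", "software": "0m16.600s"}

baseline = to_seconds(real["software"])
for name in ("capi", "genwqe"):
    speedup = baseline / to_seconds(real[name])
    print(f"{name}: {speedup:.1f}x faster elapsed time, "
          f"{to_seconds(user[name]):.3f} s user CPU "
          f"vs {to_seconds(user['software']):.3f} s in software")
```

Both hardware runs finish roughly 12 times faster than software gzip in elapsed time, which lies inside the 4-30x range quoted in the attribute list, and the user CPU time drops from more than 16 seconds to a few hundredths of a second.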
The following example accelerates a version of Secure Copy Protocol (SCP) that happens to link dynamically against libz.so.1. To replace the software libz.so.1 at load time, the example sets the LD_PRELOAD environment variable. Because the PCIe3 CAPI Compression Accelerator Adapter is used here, ZLIB_ACCELERATOR is set to CAPI.
$ du -ch linux.tar 
2.3G	linux.tar
2.3G	total

$ time ZLIB_ACCELERATOR=CAPI LD_PRELOAD=/usr/lib/genwqe/libz.so.1 scp -C linux.tar tul3:
linux.tar
100% 2339MB  83.5MB/s   00:28    

real	0m28.097s
user	0m15.832s
sys	0m4.108s

$ time scp -C linux.tar tul3:
linux.tar
100% 2339MB  22.5MB/s   01:44    

real	1m43.848s
user	1m42.100s
sys	0m2.284s
Where: The test setup consists of an IBM Tuleta 8247-22L with 20 CPU cores of 8 threads each, running at an average of 3.694 GHz, with the PCIe3 CAPI Compression Accelerator Adapter and the predecessor hardware accelerator installed.
Note: Since this is a lab measurement, actual results might differ.
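When a program is launched from a script rather than from an interactive shell, the same LD_PRELOAD approach used in the scp example can be applied by preparing the child environment first. The following is a minimal sketch under stated assumptions: the library path and the ZLIB_ACCELERATOR value are taken from the transcript above, the helper name `accelerated_env` is hypothetical, and the actual launch is left as a comment because it requires the adapter and the accelerated library to be installed.

```python
import os


def accelerated_env(base_env, lib_path="/usr/lib/genwqe/libz.so.1", accelerator="CAPI"):
    """Return a copy of base_env that preloads the given libz and selects the accelerator."""
    env = dict(base_env)
    existing = env.get("LD_PRELOAD", "")
    # Put the replacement library first so that it takes precedence over
    # any library that is already listed in LD_PRELOAD.
    env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    env["ZLIB_ACCELERATOR"] = accelerator
    return env


env = accelerated_env(os.environ)

# On a system with the adapter installed, the program from the example
# could then be started like this (not executed here):
#   import subprocess
#   subprocess.run(["scp", "-C", "linux.tar", "tul3:"], env=env, check=True)
```

Copying the environment instead of modifying `os.environ` keeps the preload limited to the one child process, so other programs started by the same script continue to use the software libz.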