CGI data conversions

The server can perform ASCII to EBCDIC conversions before sending data to CGI programs. This is needed because the Internet is primarily ASCII-based and the IBM® i server is an extended binary-coded decimal interchange code (EBCDIC) server. The server can also perform EBCDIC to ASCII conversions before sending data back to the browser. The HTTP and HTML specifications allow you to tag text data with a character set (the charset parameter on the Content-Type header). However, this practice is not widely used today, although it is technically required for HTTP/1.0 and HTTP/1.1 compliance. According to the specification, text data that is not tagged can be assumed to be in the default character set, ISO-8859-1 (a superset of US-ASCII). The server correlates this character set with ASCII coded character set identifier (CCSID) 819.
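To make the ASCII and EBCDIC difference concrete, the following minimal sketch maps a handful of 7-bit ASCII characters to their EBCDIC (CCSID 37) code points. The function name ascii_to_ebcdic37 is hypothetical and the mapping is deliberately partial (letters, digits, and space only); the server's actual conversion uses full CCSID tables.

```c
/* Hypothetical helper, for illustration only: maps an ASCII letter, digit,
 * or space to its EBCDIC (CCSID 37) code point. Returns -1 for any other
 * input, because this sketch does not carry a full conversion table. */
int ascii_to_ebcdic37(unsigned char c)
{
    if (c == 0x20) return 0x40;                            /* space */
    if (c >= 0x30 && c <= 0x39) return 0xF0 + (c - 0x30);  /* 0-9 */
    if (c >= 0x41 && c <= 0x49) return 0xC1 + (c - 0x41);  /* A-I */
    if (c >= 0x4A && c <= 0x52) return 0xD1 + (c - 0x4A);  /* J-R */
    if (c >= 0x53 && c <= 0x5A) return 0xE2 + (c - 0x53);  /* S-Z */
    if (c >= 0x61 && c <= 0x69) return 0x81 + (c - 0x61);  /* a-i */
    if (c >= 0x6A && c <= 0x72) return 0x91 + (c - 0x6A);  /* j-r */
    if (c >= 0x73 && c <= 0x7A) return 0xA2 + (c - 0x73);  /* s-z */
    return -1;
}
```

The letter ranges are split because EBCDIC does not place the alphabet in one contiguous run, which is one reason byte-for-byte assumptions carried over from ASCII programs tend to fail.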

National language support HTTP Server CGI directives

You can configure HTTP Server to control which mode is used by specifying the CGIConvMode directive in different contexts, such as server config or directory:

CGIConvMode Mode

Where Mode is one of the following:

  • BINARY
  • EBCDIC
  • EBCDIC_JCD
You can configure HTTP Server to set the ASCII and EBCDIC CCSIDs that are used for conversions by specifying the directives DefaultNetCCSID and CGIJobCCSID in different contexts, such as server config or directory. For example:
  • DefaultNetCCSID 819
  • CGIJobCCSID 37
You can configure HTTP Server to set the locale environment variable by specifying the CGIJobLocale directive in different contexts, such as server config or directory:

CGIJobLocale /QSYS.LIB/EN_US.LOCALE

CGI input conversion modes

The following table summarizes the type of conversion that is performed by the server for each CGI mode.

Table 1. Conversion action for text in CGI Stdin
CGI_MODE | Conversion | Stdin encoding | Environment variable | Query_String encoding | argv encoding
-------- | ---------- | -------------- | -------------------- | --------------------- | -------------
BINARY or %%BINARY%% | None | No conversion | CGI job CCSID | No conversion | No conversion
EBCDIC or %%EBCDIC%% | CGI NetCCSID to CGI job CCSID | CGI job CCSID | CGI job CCSID | CGI job CCSID | CGI job CCSID
%%EBCDIC%% or %%EBCDIC_JCD%% with charset tag received | Calculate target EBCDIC CCSID based on received ASCII charset tag | EBCDIC equivalent of received charset | CGI job CCSID | CGI job CCSID | CGI job CCSID
EBCDIC_JCD or %%EBCDIC_JCD%% | Detect input based on received data. Convert data to CGI job CCSID. | Detect ASCII input based on received data. Convert data to CGI job CCSID. | CGI job CCSID | Detect ASCII input based on received data. Convert data to CGI job CCSID. | Detect ASCII input based on received data. Convert data to CGI job CCSID.
%%MIXED%% (Compatibility mode) | CGI NetCCSID to CGI job CCSID (received charset tag is ignored) | CGI job CCSID with ASCII escape sequence | CCSID 37 | CCSID 37 with ASCII escape sequence | CCSID 37 with ASCII escape sequence
Note: If the directive CGIJobCCSID is present, the CGI job runs under its specified CCSID value. Otherwise, the DefaultFsCCSID value is used (the default job CCSID).
BINARY
The BINARY mode delivers QUERY_STRING and stdin to the CGI program in ASCII, exactly as received from the client. The environment variables are in the CGI job CCSID. If CGIJobCCSID is present, the job CCSID has its value; otherwise, the value associated with DefaultFsCCSID (the default job CCSID) is used.
EBCDIC
The EBCDIC mode delivers all of the information to the CGI program in the job CCSID. The ASCII CCSID of the QUERY_STRING or stdin data is determined from a charset tag on the Content-Type header, if present. If CGIJobCCSID is present, the job CCSID has its value; otherwise, the value associated with DefaultFsCCSID (the default job CCSID) is used.
EBCDIC_JCD
The EBCDIC_JCD mode is the same as the EBCDIC mode except that a well-known Japanese codepage detection algorithm is used to determine the ASCII CCSID when the charset tag is not present. Japanese browsers can potentially send data in one of three code pages, JIS (ISO-2022-JP), S-JIS (PC-Windows), or EUC (UNIX).
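A CGI program that must run under more than one mode can branch on the CGI_MODE value before touching its input. The sketch below is one possible way to do that; the function name input_is_ebcdic is hypothetical, and the %%MIXED%% compatibility mode is deliberately reported as unhandled rather than guessed at.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper based on Table 1.
 * Returns 1 when the server converts stdin and QUERY_STRING to EBCDIC,
 * 0 when they are delivered unconverted (BINARY),
 * and -1 for a missing value or a mode this sketch does not handle. */
int input_is_ebcdic(const char *cgi_mode)
{
    if (cgi_mode == NULL)
        return -1;
    if (strcmp(cgi_mode, "BINARY") == 0 ||
        strcmp(cgi_mode, "%%BINARY%%") == 0)
        return 0;
    if (strcmp(cgi_mode, "EBCDIC") == 0 ||
        strcmp(cgi_mode, "%%EBCDIC%%") == 0 ||
        strcmp(cgi_mode, "EBCDIC_JCD") == 0 ||
        strcmp(cgi_mode, "%%EBCDIC_JCD%%") == 0)
        return 1;
    return -1;
}
```

A typical call would pass the environment value directly, for example input_is_ebcdic(getenv("CGI_MODE")).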

CGI output conversion modes

The following table summarizes the type of conversion that is performed and the charset tag that is returned to the browser by the server.

Table 2. Conversion action and charset tag generation for text in CGI Stdout
CGI Stdout CCSID/Charset in HTTP header | Conversion action | Server reply charset tag
--------------------------------------- | ----------------- | ------------------------
EBCDIC CCSID/Charset | Calculate EBCDIC to ASCII conversion based on supplied EBCDIC CCSID/Charset | Calculated ASCII charset
ASCII CCSID/Charset | No conversion | Stdout CCSID/Charset as Charset
65535 | No conversion | None
None (CGIConvMode= %%BINARY%%, %%BINARY/MIXED%%, or %%BINARY/EBCDIC%%) | Default Conversion - job CCSID to NetCCSID | NetCCSID as charset
None (CGIConvMode= BINARY or %%BINARY/BINARY%%) | No conversion | None
None (CGIConvMode= EBCDIC, %%EBCDIC%%, %%EBCDIC/MIXED%%, or %%EBCDIC/EBCDIC%%) | Default Conversion - job CCSID to NetCCSID | NetCCSID as charset
None (CGIConvMode= EBCDIC, EBCDIC_JCD, %%EBCDIC%%, %%EBCDIC/MIXED%%, or %%EBCDIC/EBCDIC%% with charset tag received on HTTP request) | Use inverse of conversion calculated for stdin | Charset as received on HTTP request
None (CGIConvMode= %%EBCDIC_JCD%%, %%EBCDIC_JCD/MIXED%%, or %%EBCDIC_JCD/EBCDIC%%) | Use inverse of conversion calculated by the Japanese codepage detection | ASCII CCSID as charset
None (CGIConvMode= %%MIXED%% or %%MIXED/MIXED%%) | Default Conversion - job CCSID to NetCCSID | None (compatibility mode)
Invalid | CGI error 500 generated by server |
BINARY
In this mode, HTTP header output is in CCSID 819, and escape sequences are the ASCII representation of the ASCII code point. An example of an HTTP header that may contain escape sequences is the Location header. The body is always treated as binary data and the server performs no conversion.
EBCDIC
In this mode, HTTP header output is assumed to be in the CGI job CCSID, unless otherwise specified in a charset or CCSID tag by the CGI program. However, the escape sequence must be the EBCDIC representation of the EBCDIC code point for the two characters following the "%" in the escape sequence. An example of an HTTP header that may contain escape sequences is the Location header. The body (if the MIME type is text/*) is assumed to be in the job CCSID, unless otherwise specified in a charset or CCSID tag by the CGI program. If CGIJobCCSID is present, the CGI job CCSID has its value; otherwise, the value associated with DefaultFsCCSID (the default job CCSID) is used.
EBCDIC_JCD
In this mode, HTTP header output is assumed to be in the job CCSID, unless otherwise specified in a charset or CCSID tag by the CGI program. However, the escape sequence must be the EBCDIC representation of the EBCDIC code point for the two characters following the "%" in the escape sequence. An example of an HTTP header that may contain escape sequences is the Location header. The body (if the MIME type is text/*) is assumed to be in the job CCSID, unless otherwise specified in a charset or CCSID tag by the CGI program. If CGIJobCCSID is present, the job CCSID has its value; otherwise, the value associated with DefaultFsCCSID (the default job CCSID) is used.
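The escape-sequence rule is easiest to see with a single character. As a sketch of one reading of the rule (an assumption, not a statement of the server's internals): a space in a Location header would be escaped from its ASCII code point 0x20 in BINARY mode, and from its EBCDIC CCSID 37 code point 0x40 in the EBCDIC modes. The helper name format_escape is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: writes the three-character escape sequence for one
 * code point (for example "%20") into out, which must hold 4 bytes.
 * The caller chooses the code point: the ASCII value in BINARY mode, the
 * EBCDIC value in the EBCDIC modes (an assumption about how the rule
 * described above is applied). */
void format_escape(unsigned char code_point, char out[4])
{
    snprintf(out, 4, "%%%02X", (unsigned)code_point);
}
```

When this is compiled as an ILE C program, the characters written to out are themselves in the compile-time (EBCDIC) character set, which matches the requirement that the two characters after the "%" be in EBCDIC form for the EBCDIC modes.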

CGI environment variables

The following CGI environment variables that are related to national language support are set by the HTTP server prior to calling a CGI program:

  • CGI_MODE - which input conversion mode the server is using (%%MIXED%%, %%EBCDIC%%, %%BINARY%%, %%EBCDIC_JCD%%, EBCDIC, BINARY, or EBCDIC_JCD)
  • CGI_ASCII_CCSID - which ASCII CCSID the data was converted from
  • CGI_EBCDIC_CCSID - which EBCDIC CCSID the data was converted into
  • CGI_OUTPUT_MODE - which output conversion mode the server is using (%%MIXED%%, %%EBCDIC%%, %%BINARY%%, EBCDIC, BINARY, or EBCDIC_JCD)
  • CGI_JOB_LOCALE - which locale to use in the CGI program. This environment variable is set only if the CGIJobLocale directive is set.

For a complete list of environment variables set by the HTTP Server, see Environment variables set by HTTP Server.
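A CGI program can read these variables with the standard C getenv function. The following sketch shows one defensive way to turn a CCSID-valued variable into an integer; the helper names parse_ccsid and ccsid_from_env are hypothetical, and the fallback argument is simply whatever default the caller considers safe.

```c
#include <stdlib.h>

/* Hypothetical helper: parses a decimal CCSID (0 to 65535).
 * Returns fallback when text is NULL, empty, or not a clean number. */
int parse_ccsid(const char *text, int fallback)
{
    char *end;
    long value;

    if (text == NULL || *text == '\0')
        return fallback;
    value = strtol(text, &end, 10);
    if (*end != '\0' || value < 0 || value > 65535)
        return fallback;
    return (int)value;
}

/* Reads a CCSID-valued environment variable such as CGI_ASCII_CCSID or
 * CGI_EBCDIC_CCSID, returning fallback when it is unset or malformed. */
int ccsid_from_env(const char *name, int fallback)
{
    return parse_ccsid(getenv(name), fallback);
}
```

For example, ccsid_from_env("CGI_ASCII_CCSID", 819) yields the network CCSID the server reports, or 819 when the variable is not set.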

DBCS considerations

URL-encoded forms containing DBCS data could contain ASCII octets that represent parts of DBCS characters. The server can only convert non-encoded character data. This means that it must un-encode the double-byte character set (DBCS) stdin and QUERY_STRING data before performing the conversion. In addition, it has to reassemble and re-encode the resulting EBCDIC representation before passing it to the CGI program. Because of this extra processing, CGI programs that you write to handle DBCS data may choose to receive the data as BINARY and perform all conversions to streamline the entire process.
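For a program that takes the BINARY route, the first step is to un-encode the URL-encoded ASCII bytes itself. The following is a minimal sketch of that step, not the server's implementation; the function names are hypothetical. Numeric code points are used instead of character literals because literals in an ILE C program are normally EBCDIC, while BINARY-mode input is ASCII.

```c
#include <stddef.h>

/* Value of an ASCII hex digit, or -1 if the byte is not a hex digit. */
static int ascii_hex_value(unsigned char c)
{
    if (c >= 0x30 && c <= 0x39) return c - 0x30;       /* '0'-'9' */
    if (c >= 0x41 && c <= 0x46) return c - 0x41 + 10;  /* 'A'-'F' */
    if (c >= 0x61 && c <= 0x66) return c - 0x61 + 10;  /* 'a'-'f' */
    return -1;
}

/* Hypothetical helper: decodes URL-encoded ASCII bytes in place and
 * returns the decoded length. Malformed escapes are copied unchanged. */
size_t url_decode_ascii(unsigned char *buf, size_t len)
{
    size_t in = 0, out = 0;

    while (in < len) {
        unsigned char c = buf[in];
        if (c == 0x25 && in + 2 < len) {               /* '%' plus two bytes */
            int hi = ascii_hex_value(buf[in + 1]);
            int lo = ascii_hex_value(buf[in + 2]);
            if (hi >= 0 && lo >= 0) {
                buf[out++] = (unsigned char)(hi * 16 + lo);
                in += 3;
                continue;
            }
        }
        buf[out++] = (c == 0x2B) ? 0x20 : c;           /* '+' becomes space */
        in++;
    }
    return out;
}
```

After decoding, the buffer holds the raw double-byte data (for example the Shift JIS bytes 0x82 0xA0), which the program can then convert as a whole rather than octet by octet.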

Using the EBCDIC_JCD mode: The EBCDIC_JCD mode determines what character set is being used by the browser for a given request. This mode is also used to automatically adjust the ASCII/EBCDIC code conversions used by the web server as the request is processed.

After auto detection, the %%EBCDIC_JCD%% or EBCDIC_JCD mode converts the stdin and QUERY_STRING data from the detected network CCSID into the correct EBCDIC CCSID for Japanese. The default conversions configured for the CGI job are overridden. The DefaultFsCCSID directive or the -fsccsid startup parameter specifies the default conversions. The startup FsCCSID must be a Japanese CCSID. Alternately, the CGIJobCCSID can be set to a Japanese CCSID.

The possible detected network code pages are Shift JIS, eucJP, and ISO-2022-JP. The following are the associated CCSIDs for each code page:

Shift JIS
=========
CCSID 932: IBM PC (old JIS sequence, OS/2 J3.X/4.0, IBM Windows J3.1)
CCSID 942: IBM PC (old JIS sequence, OS/2 J3.X/4.0)
CCSID 943: MS Shift JIS (new JIS sequence, OS/2 J4.0,
MS Windows J3.1/95/NT)
eucJP
=====
CCSID 5050: Extended UNIX Code (Japanese)
ISO-2022-JP
===========
CCSID 5052: Subset of RFC 1468 ISO-2022-JP (JIS X 0201 Roman and
JIS X 0208-1983) plus JIS X 0201 Katakana.
CCSID 5054: Subset of RFC 1468 ISO-2022-JP (ASCII and JIS X 0208-1983)
plus JIS X 0201 Katakana.

The detected network CCSID is available to the CGI program. The CCSID is stored in the CGI_ASCII_CCSID environment variable. When JCD cannot detect the code page, the default code conversion is done as configured (between NetCCSID and FsCCSID or CGIJobCCSID).
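A CGI program that logs or branches on the detected code page can map the CGI_ASCII_CCSID value back to the three families listed above. A minimal sketch, with a hypothetical function name:

```c
#include <stddef.h>

/* Hypothetical helper: names the Japanese code page family for a network
 * CCSID, following the CCSID list above. Returns NULL for any CCSID that
 * is not in that list. */
const char *jcd_code_page_name(int ascii_ccsid)
{
    switch (ascii_ccsid) {
    case 932:
    case 942:
    case 943:
        return "Shift JIS";
    case 5050:
        return "eucJP";
    case 5052:
    case 5054:
        return "ISO-2022-JP";
    default:
        return NULL;
    }
}
```

A NULL result does not necessarily indicate an error: it can simply mean the request used the configured default conversion with a non-Japanese network CCSID.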

Because stdin and QUERY_STRING are encoded according to the web client's outbound code page, we recommend using the following configuration value combinations when you use the EBCDIC_JCD or %%EBCDIC_JCD%% mode.

Table 3. Recommended CCSID configuration combinations
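Startup (FsCCSID)/CGI job CCSID (CGIJobCCSID) | Startup (DefaultNetCCSID)/CGI Net CCSID (DefaultNetCCSID) | Description
--------------------------------------------- | ---------------------------------------------------------- | -----------
5026/5035 (See note 4) | 943 | Default: MS Shift JIS
5026/5035 (See note 4) | 942 | Default: IBM PC
5026/5035 (See note 4) | 5052/5054 | Default: ISO-2022-JP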

Using CCSID 5050 (eucJP) for the startup NetCCSID is not recommended. When 5050 is specified for the startup NetCCSID, the default code conversion is done between FsCCSID and 5050. This means that if JCD cannot detect a code page, JCD returns 5050 as the default network CCSID. Most browsers use a default outbound code page of Shift JIS or ISO-2022-JP, not eucJP.

If the web client sends a charset tag, JCD gives priority to the charset tag. Stdout works the same way: if a charset/CCSID tag is specified in the Content-Type field, stdout gives priority to that tag and ignores the JCD-detected network CCSID.
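The order of precedence described here (charset tag first, then the JCD result, then the configured default) can be sketched as a small selection function. The function name and the convention that 0 means "not available" are assumptions made for this illustration.

```c
/* Hypothetical helper: picks the network CCSID for a request.
 * A value of 0 means "not available" (no charset tag was sent, or JCD
 * could not detect a code page). Precedence: charset tag, then the
 * JCD-detected CCSID, then the configured default NetCCSID. */
int effective_net_ccsid(int charset_tag_ccsid, int jcd_ccsid,
                        int default_net_ccsid)
{
    if (charset_tag_ccsid != 0)
        return charset_tag_ccsid;
    if (jcd_ccsid != 0)
        return jcd_ccsid;
    return default_net_ccsid;
}
```

With a startup NetCCSID of 5050, the last branch is the one that returns eucJP when detection fails, which is why that setting is discouraged.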

Notes:
  1. If the startup NetCCSID is 932 or 942, the detected network CCSID for Shift JIS is the same as the startup NetCCSID. Otherwise, the detected Shift JIS CCSID is 943.
    Startup NetCCSID Shift JIS (JCD detected CCSID)
    ---------------- ------------------------------
    932 932
    942 942
    943 943
    5052 943
    5054 943
    5050 943
  2. Netscape Navigator 3.x sends the alphanumeric characters by using JIS X 0201 Roman escape sequence (CCSID 5052) for ISO-2022-JP. Netscape Communicator 4.x sends the alphanumeric characters by using ASCII escape sequence (CCSID 5054) for ISO-2022-JP.
  3. The JCD function can detect EUC and SBCS Katakana, but they are difficult to detect. IBM recommends that you do not use SBCS Katakana and EUC in CGI.
  4. CCSID 5026 assigns lowercase alphabet characters to special code points. This often causes problems with lowercase alphabet characters. To avoid this problem, do one of the following:
    • Do not use lowercase alphabet literals in CGI programs if the FsCCSID is 5026.
    • Use CCSID 5035 for FsCCSID.
    • Use the Charset/CCSID tag as illustrated in the following excerpt of a CGI program:
      #include <stdio.h>
      int main(void){
      /* The Charset tag tells the server how to treat stdout. */
      printf("Content-Type: text/html; Charset=ISO-2022-JP\n\n");
      ...
      }
    • Do the code conversions in the CGI program. The following sample ILE C program converts the literals into CCSID 930 (the equivalent to CCSID 5026):
      #include <stdio.h>
      int main(void){
      printf("Content-Type: text/html\n\n");
      /* Literals between the two pragmas are compiled in CCSID 930. */
      #pragma convert(930)
      printf("<html>");
      printf("This is katakana code page\n");
      #pragma convert(0)
      ...
      }
    • When the web client sends a charset tag, the network CCSID becomes the ASCII CCSID associated with the Multipurpose Internet Mail Extensions (MIME) charset header, and the charset tag overrides the JCD-detected CCSID. Likewise, when a Charset/CCSID tag is in the Content-Type header generated by the CGI program, that tag overrides the JCD-detected CCSID. Stdout does not perform a conversion if the charset is the same as the MIME charset or if the CCSID is ASCII. Stdout performs code conversion if the CCSID is EBCDIC. Because the environment variables and stdin are already stored in the job CCSID, ensure that the job CCSID and the CCSID in the Content-Type header are consistent.
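Notes 1 and 2 can be summarized in code. The sketch below is illustrative only and is not the JCD algorithm: the first function restates the note 1 table, and the second looks for the RFC 1468 single-byte designation (ESC ( J for JIS X 0201 Roman, ESC ( B for ASCII) that distinguishes CCSID 5052 from CCSID 5054 in note 2. Both function names are hypothetical.

```c
#include <stddef.h>

/* Note 1: the Shift JIS CCSID that JCD reports for a startup NetCCSID. */
int jcd_shift_jis_ccsid(int startup_net_ccsid)
{
    if (startup_net_ccsid == 932 || startup_net_ccsid == 942)
        return startup_net_ccsid;
    return 943;
}

/* Note 2: scans ISO-2022-JP data for the first single-byte designation.
 * Returns 5052 for ESC ( J (JIS X 0201 Roman), 5054 for ESC ( B (ASCII),
 * or 0 when neither escape sequence is present. */
int iso2022jp_ccsid_hint(const unsigned char *data, size_t len)
{
    size_t i;

    for (i = 0; i + 2 < len; i++) {
        if (data[i] == 0x1B && data[i + 1] == 0x28) {  /* ESC ( */
            if (data[i + 2] == 0x4A) return 5052;      /* J */
            if (data[i + 2] == 0x42) return 5054;      /* B */
        }
    }
    return 0;
}
```

A result of 0 from the second function says nothing about whether the data is ISO-2022-JP at all; a real detector would also have to weigh Shift JIS and EUC byte patterns.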