CGI data conversions
The server can perform ASCII to EBCDIC conversions before sending data to CGI programs. This is needed because the Internet is primarily ASCII-based, whereas the IBM® i server is an extended binary-coded decimal interchange code (EBCDIC) server. The server can also perform EBCDIC to ASCII conversions before sending data back to the browser. The HTTP and HTML specifications allow you to tag text data with a character set (the charset parameter on the Content-Type header). However, this practice is not widely used today (although it is technically required for HTTP/1.0 and HTTP/1.1 compliance). According to these specifications, text data that is not tagged can be assumed to be in the default character set, ISO-8859-1 (Latin-1, a superset of US-ASCII). The server correlates this character set with ASCII coded character set identifier (CCSID) 819.
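The charset tagging described above can be illustrated with a small C sketch. It assumes the CGI program has the Content-Type value available as a string (for example from the CONTENT_TYPE environment variable); the helper name content_type_charset is hypothetical and is not part of HTTP Server. When no charset parameter is present, the sketch falls back to ISO-8859-1, the default named in the text.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive check that s starts with the first n bytes of prefix. */
static int prefix_equals_nocase(const char *s, const char *prefix, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] == '\0' ||
            tolower((unsigned char)s[i]) != tolower((unsigned char)prefix[i]))
            return 0;
    }
    return 1;
}

/*
 * Hypothetical helper: copy the charset parameter of a Content-Type value
 * into out (at most out_size - 1 characters). When no charset parameter is
 * present, out receives "ISO-8859-1".
 * Returns 1 if a charset tag was found, 0 if the default was used.
 */
int content_type_charset(const char *content_type, char *out, size_t out_size)
{
    static const char key[] = "charset=";
    const size_t key_len = sizeof key - 1;
    const char *p = content_type ? content_type : "";

    if (out_size == 0)
        return 0;

    /* Parameters follow the media type and are separated by ';'. */
    while ((p = strchr(p, ';')) != NULL) {
        p++;
        while (*p == ' ' || *p == '\t')
            p++;
        if (prefix_equals_nocase(p, key, key_len)) {
            size_t n = 0;
            p += key_len;
            if (*p == '"')          /* the value may be quoted */
                p++;
            while (*p != '\0' && *p != ';' && *p != '"' && *p != ' ' &&
                   n + 1 < out_size)
                out[n++] = *p++;
            out[n] = '\0';
            return 1;
        }
    }

    strncpy(out, "ISO-8859-1", out_size - 1);
    out[out_size - 1] = '\0';
    return 0;
}
```

A caller might pass getenv("CONTENT_TYPE") and a 32-byte buffer, then use the return value to tell a tagged request from an untagged one.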
National language support HTTP Server CGI directives
You can configure HTTP Server to control which mode is used by specifying the CGIConvMode directive in different contexts, such as server config or directory:
CGIConvMode Mode
Where Mode is one of the following:
- BINARY
- EBCDIC
- EBCDIC_JCD
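The related DefaultNetCCSID and CGIJobCCSID directives set the network CCSID and the CGI job CCSID that these conversions use, for example:
DefaultNetCCSID 819
CGIJobCCSID 37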
CGI input conversion modes
The following table summarizes the type of conversion that is performed by the server for each CGI mode.
CGI_MODE | Conversion | Stdin encoding | Environment variable | Query_String encoding | argv encoding |
---|---|---|---|---|---|
BINARY or %%BINARY%% | None | No conversion | CGI job CCSID | No conversion | No conversion |
EBCDIC or %%EBCDIC%% | CGI NetCCSID to CGI job CCSID | CGI job CCSID | CGI job CCSID | CGI job CCSID | CGI job CCSID |
%%EBCDIC%% or %%EBCDIC_JCD%% with charset tag received | Calculate target EBCDIC CCSID based on received ASCII charset tag | EBCDIC equivalent of received charset | CGI job CCSID | CGI job CCSID | CGI job CCSID |
EBCDIC_JCD or %%EBCDIC_JCD%% | Detect input based on received data. Convert data to CGI job CCSID | Detect ASCII input based on received data. Convert data to CGI job CCSID. | CGI job CCSID | Detect ASCII input based on received data. Convert data to CGI job CCSID. | Detect ASCII input based on received data. Convert data to CGI job CCSID |
%%MIXED%% (Compatibility mode) | CGI NetCCSID to CGI job CCSID (receive charset tag is ignored) | CGI job CCSID with ASCII escape sequence | CCSID 37 | CCSID 37 with ASCII escape sequence | CCSID 37 with ASCII escape sequence |
- BINARY
- The BINARY mode delivers QUERY_STRING and stdin to the CGI program in ASCII, exactly as they were received from the client. The environment variables are in the CGI job CCSID. If CGIJobCCSID is present, the job CCSID takes its value; otherwise, the value associated with DefaultFsCCSID (the default job CCSID) is used.
- EBCDIC
- The EBCDIC mode delivers all of the information to the CGI program in the job CCSID. The ASCII CCSID of the QUERY_STRING or stdin data is determined from a charset tag on the Content-Type header, if present. If CGIJobCCSID is present, the job CCSID takes its value; otherwise, the value associated with DefaultFsCCSID (the default job CCSID) is used.
- EBCDIC_JCD
- The EBCDIC_JCD mode is the same as the EBCDIC mode except that a well-known Japanese codepage detection algorithm is used to determine the ASCII CCSID when the charset tag is not present. Japanese browsers can potentially send data in one of three code pages, JIS (ISO-2022-JP), S-JIS (PC-Windows), or EUC (UNIX).
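A CGI program that must work under more than one input mode can branch on the mode name. The following is a minimal sketch, assuming the mode arrives as a string such as the CGI_MODE environment variable value, with or without the surrounding %% markers. The enum and function names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    CGI_MODE_UNKNOWN = 0,
    CGI_MODE_BINARY,
    CGI_MODE_EBCDIC,
    CGI_MODE_EBCDIC_JCD,
    CGI_MODE_MIXED
} cgi_mode;

/* Map a mode name ("EBCDIC" or "%%EBCDIC%%" style) to an enum value. */
cgi_mode parse_cgi_mode(const char *value)
{
    size_t len;

    if (value == NULL)
        return CGI_MODE_UNKNOWN;
    len = strlen(value);

    /* Strip a leading and trailing pair of percent signs when both exist. */
    if (len >= 4 && strncmp(value, "%%", 2) == 0 &&
        strncmp(value + len - 2, "%%", 2) == 0) {
        value += 2;
        len -= 4;
    }

    if (len == 6 && strncmp(value, "BINARY", 6) == 0)
        return CGI_MODE_BINARY;
    if (len == 6 && strncmp(value, "EBCDIC", 6) == 0)
        return CGI_MODE_EBCDIC;
    if (len == 10 && strncmp(value, "EBCDIC_JCD", 10) == 0)
        return CGI_MODE_EBCDIC_JCD;
    if (len == 5 && strncmp(value, "MIXED", 5) == 0)
        return CGI_MODE_MIXED;
    return CGI_MODE_UNKNOWN;
}

/* Returns 1 when stdin and QUERY_STRING arrive unconverted (BINARY mode). */
int input_is_unconverted(cgi_mode mode)
{
    return mode == CGI_MODE_BINARY;
}
```

A typical call would be parse_cgi_mode(getenv("CGI_MODE")); a program that sees CGI_MODE_BINARY then knows it has to perform any ASCII to EBCDIC conversion itself.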
CGI output conversion modes
The following table summarizes the type of conversion that is performed and the charset tag that is returned to the browser by the server.
CGI Stdout CCSID/Charset in HTTP header | Conversion action | Server reply charset tag |
---|---|---|
EBCDIC CCSID/Charset | Calculate EBCDIC to ASCII conversion based on supplied EBCDIC CCSID/Charset | Calculated ASCII charset |
ASCII CCSID/Charset | No conversion | Stdout CCSID/Charset as Charset |
65535 | No conversion | None |
None (CGIConvMode= %%BINARY%%, %%BINARY/MIXED%%, or %%BINARY/EBCDIC%%) | Default Conversion - job CCSID to NetCCSID | NetCCSID as charset |
None (CGIConvMode= BINARY or %%BINARY/BINARY%%) | No conversion | None |
None (CGIConvMode= EBCDIC, %%EBCDIC%%, %%EBCDIC/MIXED%%, or %%EBCDIC/EBCDIC%%) | Default Conversion - job CCSID to NetCCSID | NetCCSID as charset |
None (CGIConvMode= EBCDIC, EBCDIC_JCD, %%EBCDIC%%, %%EBCDIC/MIXED%%, or %%EBCDIC/EBCDIC%% with charset tag received on HTTP request) | Use inverse of conversion calculated for stdin | Charset as received on HTTP request |
None (CGIConvMode= %%EBCDIC_JCD%%, %%EBCDIC_JCD/MIXED%%, or %%EBCDIC_JCD/EBCDIC%%) | Use inverse of conversion calculated by the Japanese codepage detection | ASCII CCSID as charset |
None (CGIConvMode= %%MIXED%% or %%MIXED/MIXED%%) | Default Conversion - job CCSID to NetCCSID | None (compatibility mode) |
Invalid | CGI error 500 generated by server | |
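The first three rows of the table, which apply when the CGI program tags its output with a CCSID, can be sketched in C as follows. The classification lists are an assumption of this sketch: they contain only the CCSIDs that this topic mentions and are not a complete list of ASCII or EBCDIC CCSIDs. The names are hypothetical, and a CCSID outside the lists is simply reported as unknown rather than treated the way the server would treat it.

```c
#include <stddef.h>

typedef enum {
    OUT_NO_CONVERSION,      /* ASCII CCSID or 65535: data is passed through */
    OUT_EBCDIC_TO_ASCII,    /* EBCDIC CCSID: server converts to ASCII */
    OUT_NOT_IN_TABLE        /* CCSID is outside this sketch's small lists */
} out_action;

static int contains(const int *list, size_t count, int value)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (list[i] == value)
            return 1;
    }
    return 0;
}

/* Decide the stdout handling for a CCSID tag supplied by the CGI program. */
out_action stdout_action_for_tag(int ccsid)
{
    /* Illustrative lists only: CCSIDs named in this topic. */
    static const int ebcdic[] = {37, 930, 5026, 5035};
    static const int ascii[]  = {819, 932, 942, 943, 5050, 5052, 5054};

    if (ccsid == 65535)
        return OUT_NO_CONVERSION;
    if (contains(ebcdic, sizeof ebcdic / sizeof ebcdic[0], ccsid))
        return OUT_EBCDIC_TO_ASCII;
    if (contains(ascii, sizeof ascii / sizeof ascii[0], ccsid))
        return OUT_NO_CONVERSION;
    return OUT_NOT_IN_TABLE;
}
```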
- BINARY
- In this mode, HTTP header output is in CCSID 819, and escape sequences are the ASCII representation of the ASCII code point. An example of an HTTP header that may contain escape sequences is the Location header. The body is always treated as binary data, and the server performs no conversion.
- EBCDIC
- In this mode, HTTP header output is assumed to be in the CGI job CCSID, unless otherwise specified in a charset or CCSID tag by the CGI program. However, the two characters that follow the "%" in an escape sequence must be the EBCDIC representation of the EBCDIC code point. An example of an HTTP header that may contain escape sequences is the Location header. The body (if the MIME type is text/*) is assumed to be in the job CCSID, unless otherwise specified in a charset or CCSID tag by the CGI program. If CGIJobCCSID is present, the CGI job CCSID takes its value; otherwise, the value associated with DefaultFsCCSID (the default job CCSID) is used.
- EBCDIC_JCD
- In this mode, HTTP header output is assumed to be in the job CCSID, unless otherwise specified in a charset or CCSID tag by the CGI program. However, the two characters that follow the "%" in an escape sequence must be the EBCDIC representation of the EBCDIC code point. An example of an HTTP header that may contain escape sequences is the Location header. The body (if the MIME type is text/*) is assumed to be in the job CCSID, unless otherwise specified in a charset or CCSID tag by the CGI program. If CGIJobCCSID is present, the job CCSID takes its value; otherwise, the value associated with DefaultFsCCSID (the default job CCSID) is used.
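As a small illustration of the escape-sequence rule, the sketch below formats one code point as a "%XX" escape. It rests on an assumed reading of the text: the two hex digits name the code point in the header's own encoding, so a space in a Location header would be written as %20 in BINARY mode (CCSID 819, where space is X'20') and as %40 in the EBCDIC modes under CCSID 37 (where space is X'40'). The function name and the two constants are hypothetical.

```c
#include <stdio.h>

/* Code point of the space character in the two encodings discussed. */
enum {
    SPACE_IN_CCSID_819 = 0x20,  /* ASCII (ISO-8859-1) */
    SPACE_IN_CCSID_37  = 0x40   /* EBCDIC */
};

/* Write the three-character escape "%XX" plus a terminating NUL into out. */
void format_escape(unsigned char code_point, char out[4])
{
    snprintf(out, 4, "%%%02X", (unsigned int)code_point);
}
```

When the program is compiled on the IBM i server, the characters that snprintf produces are themselves in the compile-time EBCDIC code page, which matches the requirement that the escape sequence be in its EBCDIC representation.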
CGI environment variables
The following CGI environment variables that are related to national language support are set by the HTTP server prior to calling a CGI program:
- CGI_MODE - which input conversion mode the server is using (%%MIXED%%, %%EBCDIC%%, %%BINARY%%, %%EBCDIC_JCD%%, EBCDIC, BINARY, or EBCDIC_JCD)
- CGI_ASCII_CCSID - which ASCII CCSID the data was converted from
- CGI_EBCDIC_CCSID - which EBCDIC CCSID the data was converted into
- CGI_OUTPUT_MODE - which output conversion mode the server is using (%%MIXED%%, %%EBCDIC%%, %%BINARY%%, EBCDIC, BINARY, or EBCDIC_JCD)
- CGI_JOB_LOCALE - which locale to use in the CGI program. This environment variable is set only if the CGIJobLocale directive is set.
For a complete list of the environment variables set by the HTTP Server, see Environment variables set by HTTP Server.
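The CCSID variables above hold numbers in text form, so a CGI program usually parses them before use. The following is a minimal sketch with hypothetical helper names; it assumes that a CCSID is a decimal value in the range 0 to 65535 and returns a caller-supplied fallback when the variable is missing or malformed.

```c
#include <stdlib.h>

/* Parse a decimal CCSID; return fallback for NULL, empty, or invalid text. */
int parse_ccsid(const char *text, int fallback)
{
    char *end;
    long value;

    if (text == NULL || *text == '\0')
        return fallback;
    value = strtol(text, &end, 10);
    if (*end != '\0' || value < 0 || value > 65535)
        return fallback;
    return (int)value;
}

/* Read a CCSID from an environment variable such as CGI_EBCDIC_CCSID. */
int env_ccsid(const char *name, int fallback)
{
    return parse_ccsid(getenv(name), fallback);
}
```

For example, env_ccsid("CGI_ASCII_CCSID", 819) yields the ASCII CCSID the server converted from, or 819 when the variable is not set.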
DBCS considerations
URL-encoded forms that contain double-byte character set (DBCS) data can contain ASCII octets that represent parts of DBCS characters. The server can convert only non-encoded character data. This means that it must un-encode the DBCS stdin and QUERY_STRING data before performing the conversion. In addition, it has to reassemble and re-encode the resulting EBCDIC representation before passing it to the CGI program. Because of this extra processing, CGI programs that you write to handle DBCS data may choose to receive the data as BINARY and perform all conversions themselves to streamline the entire process.
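For a program that takes the BINARY route, the first step is to un-encode the URL-encoded ASCII bytes itself. The sketch below shows one way to do that; the function names are hypothetical. It compares against numeric ASCII code points rather than character literals, on the assumption that literals in a program compiled on the IBM i server are EBCDIC and would not match ASCII input. The output is raw bytes (for example Shift JIS) that still need a code page conversion afterward.

```c
#include <stddef.h>

/* Value of an ASCII hex digit byte, or -1 if the byte is not a hex digit. */
static int ascii_hex_value(unsigned char c)
{
    if (c >= 0x30 && c <= 0x39) return c - 0x30;        /* 0-9 */
    if (c >= 0x41 && c <= 0x46) return c - 0x41 + 10;   /* A-F */
    if (c >= 0x61 && c <= 0x66) return c - 0x61 + 10;   /* a-f */
    return -1;
}

/*
 * Decode len bytes of ASCII URL-encoded form data from in into out.
 * out must have room for at least len bytes. The result is not
 * NUL-terminated because decoded DBCS data is binary.
 * Returns the number of bytes written.
 */
size_t url_decode_ascii(const unsigned char *in, size_t len,
                        unsigned char *out)
{
    size_t i = 0, n = 0;

    while (i < len) {
        int hi = -1, lo = -1;

        if (in[i] == 0x25 && i + 2 < len &&              /* 0x25 is % */
            (hi = ascii_hex_value(in[i + 1])) >= 0 &&
            (lo = ascii_hex_value(in[i + 2])) >= 0) {
            out[n++] = (unsigned char)(hi * 16 + lo);
            i += 3;
        } else if (in[i] == 0x2B) {                      /* 0x2B is + */
            out[n++] = 0x20;                             /* ASCII space */
            i++;
        } else {
            out[n++] = in[i++];                          /* copy as is */
        }
    }
    return n;
}
```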
Using the EBCDIC_JCD mode: The EBCDIC_JCD mode determines what character set is being used by the browser for a given request. This mode is also used to automatically adjust the ASCII/EBCDIC code conversions used by the web server as the request is processed.
After auto detection, the %%EBCDIC_JCD%% or EBCDIC_JCD mode converts the stdin and QUERY_STRING data from the detected network CCSID into the correct EBCDIC CCSID for Japanese. The default conversions configured for the CGI job are overridden. The DefaultFsCCSID directive or the -fsccsid startup parameter specifies the default conversions. The startup FsCCSID must be a Japanese CCSID. Alternatively, the CGIJobCCSID can be set to a Japanese CCSID.
The network code pages that can be detected are Shift JIS, eucJP, and ISO-2022-JP. The following are the associated CCSIDs for each code page:
Shift JIS
=========
CCSID 932: IBM PC (old JIS sequence, OS/2 J3.X/4.0, IBM Windows J3.1)
CCSID 942: IBM PC (old JIS sequence, OS/2 J3.X/4.0)
CCSID 943: MS Shift JIS (new JIS sequence, OS/2 J4.0,
MS Windows J3.1/95/NT)
eucJP
=====
CCSID 5050: Extended UNIX Code (Japanese)
ISO-2022-JP
===========
CCSID 5052: Subset of RFC 1468 ISO-2022-JP (JIS X 0201 Roman and
JIS X 0208-1983) plus JIS X 0201 Katakana.
CCSID 5054: Subset of RFC 1468 ISO-2022-JP (ASCII and JIS X 0208-1983)
plus JIS X 0201 Katakana.
The detected network CCSID is available to the CGI program. The CCSID is stored in the CGI_ASCII_CCSID environment variable. When JCD cannot detect the code page, the default code conversion is done as configured (between NetCCSID and FsCCSID or CGIJobCCSID).
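A CGI program can use the detected CCSID to find out which code page family the browser sent. The following sketch maps the CCSIDs listed above to their family; the enum and function names are hypothetical, and any CCSID that is not in the list is reported as unknown.

```c
typedef enum {
    JCD_FAMILY_UNKNOWN = 0,
    JCD_FAMILY_SHIFT_JIS,
    JCD_FAMILY_EUC_JP,
    JCD_FAMILY_ISO_2022_JP
} jcd_family;

/* Map a network CCSID (for example from CGI_ASCII_CCSID) to its family. */
jcd_family jcd_family_of(int ascii_ccsid)
{
    switch (ascii_ccsid) {
    case 932:
    case 942:
    case 943:
        return JCD_FAMILY_SHIFT_JIS;
    case 5050:
        return JCD_FAMILY_EUC_JP;
    case 5052:
    case 5054:
        return JCD_FAMILY_ISO_2022_JP;
    default:
        return JCD_FAMILY_UNKNOWN;
    }
}
```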
Because stdin and QUERY_STRING are encoded according to the web client's outbound code page, we recommend using one of the following configuration value combinations when you use the EBCDIC_JCD or %%EBCDIC_JCD%% mode.
Startup (FsCCSID)/CGI job CCSID (CGIJobCCSID) | Startup (DefaultNetCCSID)/CGI Net CCSID (DefaultNetCCSID) | Description |
---|---|---|
5026/5035 (See note 4) | 943 | Default: MS Shift JIS |
5026/5035 (See note 4) | 942 | Default: IBM PC |
5026/5035 (See note 4) | 5052/5054 | Default: ISO-2022-JP |
Using CCSID 5050 (eucJP) for the startup NetCCSID is not recommended. When 5050 is specified for the startup NetCCSID, the default code conversion is done between FsCCSID and 5050. This means that if JCD cannot detect a code page, JCD returns 5050 as the default network CCSID. Most browsers use a default outbound code page of Shift JIS or ISO-2022-JP, not eucJP.
If the web client sends a charset tag, JCD gives priority to the charset tag. Stdout processing works the same way: if a charset/CCSID tag is specified in the Content-Type field, stdout gives priority to that tag and ignores the JCD-detected network CCSID.
Notes:
1. If the startup NetCCSID is 932 or 942, the detected Shift JIS CCSID is the same as the startup NetCCSID. Otherwise, the detected Shift JIS CCSID is 943.
   Startup NetCCSID   Shift JIS (JCD detected CCSID)
   ----------------   ------------------------------
   932                932
   942                942
   943                943
   5052               943
   5054               943
   5050               943
2. Netscape Navigator 3.x sends alphanumeric characters by using the JIS X 0201 Roman escape sequence (CCSID 5052) for ISO-2022-JP. Netscape Communicator 4.x sends alphanumeric characters by using the ASCII escape sequence (CCSID 5054) for ISO-2022-JP.
3. The JCD function is capable of detecting EUC and SBCS Katakana, but they are difficult to detect. IBM recommends that you do not use SBCS Katakana and EUC in CGI.
4. CCSID 5026 assigns lowercase alphabet characters to special code points. This often causes problems with lowercase alphabet characters. To avoid these problems, do one of the following:
   - Do not use lowercase alphabet literals in CGI programs if the FsCCSID is 5026.
   - Use CCSID 5035 for FsCCSID.
   - Use the Charset/CCSID tag as illustrated in the following excerpt of a CGI program:
     #include <stdio.h>
     int main(void) {
       /* Tag the output so that the server knows its character set. */
       printf("Content-Type: text/html; Charset=ISO-2022-JP\n\n");
       ...
     }
   - Do the code conversions in the CGI program. The following sample ILE C program converts the literals into CCSID 930 (the equivalent to CCSID 5026):
     #include <stdio.h>
     int main(void) {
       printf("Content-Type: text/html\n\n");
       /* Compile the literals that follow in CCSID 930. */
       #pragma convert(930)
       printf("<html>");
       printf("This is katakana code page\n");
       /* Return to the default CCSID for later literals. */
       #pragma convert(0)
       ...
     }
5. When the web client sends a charset tag, the network CCSID becomes the ASCII CCSID associated with the Multipurpose Internet Mail Extensions (MIME) charset header, and the JCD-detected CCSID is ignored. When a Charset/CCSID tag is in the Content-Type header generated by the CGI program, that tag likewise overrides the JCD-detected CCSID. Stdout does not perform a conversion if the charset is the same as the MIME charset or if the CCSID is ASCII. Stdout performs code conversion if the CCSID is EBCDIC. Because the environment variables and stdin are already stored in the job CCSID, ensure that the job CCSID and the CCSID in the Content-Type header are consistent.
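The Shift JIS rule in the first note of this list (the detected CCSID follows the startup NetCCSID only when that value is 932 or 942) can be written as a one-line C function. The function name is hypothetical; the sketch only restates the mapping from the note and does not perform any detection itself.

```c
/*
 * Return the CCSID that JCD reports for detected Shift JIS data,
 * given the startup NetCCSID, following the mapping in the note.
 */
int jcd_shift_jis_ccsid(int startup_net_ccsid)
{
    if (startup_net_ccsid == 932 || startup_net_ccsid == 942)
        return startup_net_ccsid;
    return 943;
}
```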