JSON concepts

JSON (JavaScript Object Notation) is a popular format for interchanging information. It has a simple structure and is easily read by humans and machines. Due to its simplicity, it is used as an alternative to XML and does not require predetermined schema designs. While initially created for use with JavaScript, it is language independent and portable. Db2 for i conforms to the industry standard SQL support for JSON.

BSON is a standardized binary representation format for serializing JSON text. It allows for fast traversal of JSON. BSON does not provide any storage savings over JSON.

JSON objects, arrays, and scalar values

A JSON document consists of an object, which is a collection of key:value pairs. Each value may be a string, a number, the null value, an array, or another object. The key:value pairs are separated by commas. The key and value are separated by a colon. An example of a simple JSON object with two key:value pairs is: {"name":"John","age":7}

In a key:value pair, the key is always a string. For Db2, the value may be one of the following types:

  • Number – an integer, decimal, or floating point number. The decimal point is always represented by a period.

    Examples: 1, 3.14, 3E20

  • String – a sequence of characters surrounded by quotes. Any special characters are escaped.

    Examples: "John", "computer", "line 1\nline 2"

  • Null – the omission of a value, indicated by null
  • Array – square brackets surrounding an ordered list of zero or more values. The list of values may contain different types. The values in the array are separated using commas.

    Examples: [1,2,3,"yes"]

    [{"name":"Matt"},{"name":"Tim"}]

  • Object - another object

    Examples: {"isbn":"123-456-222","author": ["Jones","Smith"]}

    {"isbn":"123-456-222","author": [{"name":"Jones"},{"name":"Smith"}]}

Whitespace is allowed between pieces of JSON. However, when Db2 generates JSON, no whitespace is added.

The following escape sequences are recognized.
Escape sequence Description
\b backspace
\f form feed
\n newline
\r carriage return
\t tab
\" quotation mark
\\ backslash
\/ forward slash
\unnnn Unicode hex value

JSON path

In order to retrieve elements from a JSON object, a path expression is used. An SQL/JSON path expression begins with a JSON path mode, which is either lax or strict. This mode will be discussed below. The JSON path mode is followed by an sql-json-path-expression. This path expression begins with the context identifier, $, which is followed by accessors to navigate to the next key or array element. A key accessor begins with a period and is followed by a keyname. An array accessor begins with a left square bracket, followed by a zero-based index value, and ends with a right square bracket. The following table illustrates some examples of using the SQL/JSON path language to navigate to the various elements of a JSON object. The initial JSON is:

{"isbn":"123-456-222","author": [{"name":"Jones"},{"name":"Smith"}]} 
Path Value
$ {"isbn":"123-456-222","author": [{"name":"Jones"},{"name":"Smith"}]}
$.isbn "123-456-222"
$.author [{"name":"Jones"},{"name":"Smith")]
$.author[0] {"name":"Jones"}
$.author[1] {"name":"Smith"}
$.author[0].name "Jones"

For more information on SQL/JSON path expressions, see sql-json-path-expression.

Strict and lax modes

Two modes can be used for JSON path expressions: strict and lax.
  • In strict mode, if a structural error is encountered while using the JSON path (such as specifying a key that does not exist or referencing an object with an array accessor), an error is reported.
  • In lax mode, automatic handling of some structural problems is performed. Non-structural errors are reported in lax mode.
    • Automatic unnesting of arrays: If a key accessor in an SQL/JSON path expression encounters an array instead of an object, the array is converted to a sequence containing the array elements. The key accessor (and remaining path in the path expression) is then applied to each element in the sequence. This has the same effect as if the array accessor [*] was present before the key accessor.
    • Automatic wrapping of values into single element arrays: If an array accessor encounters a JSON value other than an array, the JSON value is converted to a single element array before the array accessor handles the value.
    • Error handling: If a structural error cannot be handled by automatic nesting or wrapping, the result of the path navigation is an empty SQL/JSON sequence.

Using strict mode can be useful to help determine why unexpected results (including no results) are returned when using a path. Since lax mode is forgiving, it can modify an incorrect path specification and return something other than what you intended. By changing to strict mode, any structural errors are diagnosed and returned to you as an error.

JSON formats

When working with formatted JSON data, there are two formats that can be used to represent the data: FORMAT JSON and FORMAT BSON. All the SQL functions and predicates provide an option for you to specify the format of incoming JSON data and the format of any generated JSON data.

If a format is not specified for an input value, the data type will be used to determine the format. If the data type is character or graphic, the JSON will be interpreted as FORMAT JSON. If the data type is a binary type, the JSON will be interpreted as FORMAT BSON.

JSON data generated using the publishing functions defaults to FORMAT JSON.

FORMAT JSON
The data is constructed as a string of characters. This data can be stored as:
  • a character string data type with any EBCDIC or Unicode CCSID except 65535
  • a Unicode graphic data type
  • a binary data type where it can be in encoded in either a UTF8 or UTF16 representation
Data formatted as JSON has strings enclosed in quotes and special characters are escaped.
FORMAT BSON
The data is encoded in the binary form of JSON. This form of data must be stored as a VARBINARY or BLOB data type. FORMAT BSON data should never be copied into a fixed length binary data type since every byte of a BSON value is important; appending additional trailing bytes, even hex zeros, changes the value and will invalidate it.1
When a decimal number is converted into FORMAT BSON, either explicitly or implicitly, it becomes a double precision floating point number. This means that its precision will be no more than 15 digits.

Processing for JSON_TABLE, JSON_QUERY, JSON_VALUE and JSON_EXISTS is performed using the FORMAT BSON representation. If the input to these functions is already FORMAT BSON, an additional conversion will be avoided. If you store a JSON document in a table and intend to use one of these functions to interpret it, it is recommended that you store it as FORMAT BSON.

When using the JSON_OBJECT, JSON_ARRAY, JSON_OBJECTAGG, and JSON_ARRAYAGG functions to build a JSON document, Db2 uses FORMAT JSON for processing. It is recommended that the result of each function that is input to another function returns FORMAT JSON (the default). If you need the final result to be FORMAT BSON, add the result data type and the FORMAT BSON clause to the outermost scalar function.

JSON_TO_BSON and BSON_TO_JSON

These two built-in scalar functions can be used to convert between JSON and BSON formatted data. If you have stored JSON data in a table and need to send it to an external site, these functions make it easy to change it to the expected format. The input argument for JSON_TO_BSON must be formatted JSON or the function will fail. Similarly, the input argument for BSON_TO_JSON must be BSON or the function will fail.

IS JSON predicate

The IS JSON predicates can be used in a query to validate JSON data. Since there is not a JSON data type, using this predicate in a trigger can ensure JSON is well-formed before inserting it in a table.

Consuming a JSON document

There are four ways of using a path to extract parts of a JSON document.
JSON construct Description
JSON_EXISTS predicate Uses the path to determine whether a value can be located in the JSON document. Since this is a predicate, the result is true, false, or unknown.
JSON_QUERY scalar function Uses the path to locate and return a JSON scalar value, a JSON array, or a JSON object from a JSON document. The data type of the result is always a string and it contains a FORMAT JSON value.
JSON_VALUE scalar function Uses the path to locate and return an SQL scalar value from a JSON document. It cannot return a JSON array or a JSON object. The data type of the result is defined as part of the function invocation.
JSON_TABLE table function Converts all or part of a JSON document into a relational table that can be queried. A distinct path is used to locate values for each result column.
JSON_TABLE is the recommended way to return multiple elements from a JSON document since the JSON document only needs to be deconstructed one time.
1 As part of an earlier JSON technology preview, JSON was generated in a non-standard BSON format that was prepended with a single x'03' byte. With the addition of the standard JSON functionality, this format will continue to be recognized but will never be generated. If you have an application that depends on this format, there is an undocumented function, SYSTOOLS.JSON2BSON, that can be used to generate it.