Syntax
(1)
>>-------SELECT--SelectClause--FromClause--+-------------+-----><
'-WhereClause-'
WHERE
.-<<---- ,------ <<--------.
V |
|--SelectClause = --+---Expression--+----------+-+----+--------|
| +-AS--Path-+ |
| '-INSERT---' |
+-ITEM--Expression----------------+
| (2) |
'-+-COUNT-----+--(--Expression--)-'
+-MAX-------+
+-MIN-------+
'-SUM-------'
|--FromClause = --FROM----------------------------------------->
.-<<---- ,------ <<-----------------------.
V |
>----FieldReference--+---------------------+-+------------------|
'-AS--CorrelationName-'
|--WhereClause = --WHERE--Expression---------------------------|
Notes:
- You no longer require the enclosing parentheses in SELECT
expressions. This does not prevent you using parentheses but, if they
are present, they are merely normal, expression-scoping, parentheses.
- For the COUNT parameter only, you can specify the value of
the following Expression as a single star (*).
Usage
The
SELECT function is the usual and most efficient way of transforming
messages. You can use SELECT to:
- Comprehensively reformat messages
- Access database tables
- Make an output array that is a subset of an input array
- Make an output array that contains only the values of an input
array
- Count the number of entries in an array
- Select the minimum or maximum value from a number of entries in
an array
- Sum the values in an array
Introduction to SELECT
The
SELECT function considers a message tree (or sub-tree) to consist
of a number of rows and columns, rather like a database table. A
FieldReference in
a FROM clause identifies a field in a message tree. The identified
field is regarded in the following ways:
- The identified field is regarded as a row in a table.
- The field's siblings are regarded as other rows of the same
table.
- The field's children are regarded as the table's columns.
Note: The FieldReference in a FROM clause
can also be a table reference that refers directly to a real database
table.
The return value of the SELECT function is typically
another message tree that contains rows whose structure and content
is determined by the SelectClause. The number of
rows in the result is the sum of all the rows pointed to by all the
field references and table references in the FROM clause, filtered
by the WHERE clause; only those fields for which the WHERE clause
evaluates to TRUE are included.
The return value of the SELECT
function can also be scalar (see ITEM selections).
You
can specify the
SelectClause in several ways; see:
If
you have created a message flow that contains one of the following
nodes, and the ESQL that is associated with this node includes a SELECT
function and a database reference, you must specify a value for the
Data source property of the relevant
node:
Simple selections
To
understand the SELECT function in more detail, first consider the
following simple case:
- The SelectClause consists of a number of expressions,
each with an AS Path clause.
- The FROM clause contains a single FieldReference and
an AS CorrelationName clause.
The SELECT function creates a local, reference, correlation variable,
whose name is given by the AS
CorrelationName clause,
and then steps, in turn, through each row of the list of rows derived
from the FROM clause. For each row:
- The correlation variable is set to point to the current row.
- The WHERE clause (if present) is evaluated. If it evaluates to
FALSE or unknown (null), nothing is added to the result tree and processing
proceeds to the next row of the input. Otherwise processing proceeds
to the next step.
- A new member is added to the result list.
- The SELECT clause expressions are evaluated and assigned to fields
named as dictated by the AS Path clause. These
fields are child fields of the new member of the result list.
Typically, both the SelectClause and
the WHERE clause expressions use the correlation variable to access
column values (that is, fields in the input message tree) and thus
to build a new message tree containing data from the input message.
The correlation variable is referred to by the name specified in the
AS CorrelationName clause or, if an AS clause is
not specified, by the final name in the FROM FieldReference (that
is, the name after the last dot).
Note that:
- Despite the analogy with a table, you are not restricted to accessing
or creating messages with a flat, table-like, structure; you can access
and build trees with arbitrarily deep folder structures.
- You are not restricted to a column being a single value; a column
can be a repeating list value or a structure.
These concepts are best understood by reference to the examples.
If
the field reference is a TableReference, the operation
is very similar. In this case, the input is a real database table
and is thus restricted to the flat structures supported by databases.
The result tree is still not so restricted, however.
If the
FROM clause contains more than one field reference, the rightmost
reference steps through each of its rows for each row in the next-to-rightmost
reference, and so on. The total number of rows in the result is thus
the product of the number of rows in each table. Such selects are
known as joins and commonly use a WHERE clause that excludes
most of these rows from the result. Joins are commonly used to add
database data to messages.
The AS
Path clause
is optional. If it is unspecified, the broker generates a default
name according to the following rules:
- If the SelectClause expression is a reference
to a field or a cast of a reference to a field, the name of the field
is used.
- Otherwise the broker uses the default names Column1, Column2,
and so on.
Examples
The following
example performs a SELECT on the table Parts in the schema Shop in
the database DSN1. Because no WHERE clause exists, all rows are selected.
Because the select clause expressions (for example, P.PartNumber)
contain no AS clauses, the fields in the result adopt the same names:
SET PartsTable.Part[] = SELECT
P.PartNumber,
P.Description,
P.Price
FROM Database.DSN1.Shop.Parts AS P;
If the target of
the SET statement (PartsTable) is a variable of type ROW, after the
statement is executed PartsTable will have, as children of its root
element, a field called Part for each row in the table. Each of the
Part fields will have child fields called PartNumber, Description,
and Price. The child fields will have values dictated by the contents
of the table. (PartsTable could also be a reference into a message
tree).
The next example performs a similar SELECT. This case
differs from the last in that the SELECT is performed on the message
tree produced by the first example (rather than on a real database
table). The result is assigned into a subfolder of OutputRoot:
SET OutputRoot.XMLNS.Data.TableData.Part[] = SELECT
P.PartNumber,
P.Description,
P.Price
FROM PartsTable.Part[] AS P;
INSERT selections
The
INSERT clause is an alternative to the AS clause. It assigns the result
of the SelectClause expression (which must be a
row) to the current new row itself, rather than to a child of it.
The effect of this is to merge the row result of the expression into
the row being generated by the SELECT. This differs from the AS clause,
in that the AS clause always generates at least one child element
before adding a result, whereas INSERT generates none. INSERT is useful
when inserting data from other SELECT operations, because it allows
the data to be merged without extra folders.
ITEM selections
The
SelectClause can
consist of the keyword ITEM and a single expression. The effect of
this is to make the results nameless. That is, the result is a list
of values of the type returned by the expression, rather than a row.
This option has several uses:
- In conjunction with a scalar expression and the THE function,
it can be used to create a SELECT query that returns a single scalar
value (for example, the price of a particular item from a table).
- In conjunction with a CASE expression and ROW constructors, it
can be used to create a SELECT query that creates or handles messages
in which the structure of some rows (that is, repeats in the message)
is different from others. This is useful for handling messages that
have a repeating structure but in which the repeats do not all have
the same structure.
- In conjunction with a ROW constructor, it can be used to create
a SELECT query that collapses levels of repetition in the input message.
Column function selections
The SelectClause can
consist of one of the functions COUNT, MAX, MIN, and SUM operating
on an expression. These functions are known as column functions. They
return a single scalar value (not a list) giving the count, maximum,
minimum, or sum of the values that Expression evaluated
to in stepping through the rows of the FROM clause. If Expression evaluates
to NULL for a particular row, the value is ignored, so that the function
returns the count, maximum, minimum, or sum of the remaining rows.
For
the COUNT function only, Expression can consist
of a single star (*). This form counts the rows regardless of null
values.
To make the result a useful reflection of the input
message, Expression typically includes the correlation
variable.
Typically, Expression evaluates
to the same data type for each row. In these cases, the result of
the MAX, MIN, and SUM functions are of the same data type as the operands.
The returned values are not required to be all of the same type however,
and if they are not, the normal rules of arithmetic apply. For example,
if a field in a repeated message structure contains integer values
for some rows and float values for others, the sum follows the normal
rules for addition. The sum is of type float because the operation
is equivalent to adding a number of integer and float values.
The
result of the COUNT function is always an integer.
Differences between message and
database selections
FROM expressions in which a correlation
variable represents a row in a message behave slightly differently
from those in which the correlation variable represents a row in a
real database table.
In the message case, a path involving a
star (*) has the normal meaning; it ignores the field's name and finds
the first field that matches the other criteria (if any).
In
the database case a star (*) has, for historical reasons, the special
meaning of all fields. This special meaning requires advance knowledge
of the definition of the database table and is supported only when
querying the default database (that is, the database pointed to by
the node's
data source attribute). For example, the
following queries return column name and value pairs only when querying
the default database:
SELECT * FROM Database.Datasource.SchemaName.Table As A
SELECT A.* FROM Database.Datasource.SchemaName.Table As A
SELECT A FROM Database.Datasource.SchemaName.Table AS A
Specifying the SELECT expressions
- SelectClause
- SelectClause expressions can use any of the
broker's operators and functions in any combination. They can
refer to the tables' columns, message fields, correlation names
declared by containing SELECTs, and to any other declared variables
or constants that are in scope.
- AS Path
- An AS Path expression is a relative path (that
is, there is no correlation name) but is otherwise unrestricted in
any way. For example, it can contain:
- Indexes (for example, A.B.C[i])
- Field-type specifiers (for example, A.B.(XML.Attribute)C )
- Multipart paths (for example, A.B.C )
- Name expressions (for example, A.B.{var})
Any expressions in these paths can also use any of the broker's
operators and functions in any combination. The expressions can refer
to the tables' columns, message fields, correlation names declared
by containing SELECTs, and any declared variables or constants.
- FROM clause
- FROM clause expressions can contain multiple database references,
multiple message references, or a mixture of the two. You can join
tables with tables, messages with messages, or tables with messages.
FROM
clause FieldReferences can contain expressions
of any kind (for example, Database.{DataSource}.{Schema}.Table1).
You
can calculate a field, data source, schema, or table name at run time.
- WHERE clause
The
WHERE clause expression can use any of the broker's operators
and functions in any combination. It can refer to table columns, message
fields, and any declared variables or constants.
However,
be aware that the broker treats the WHERE clause expression by examining
the expression and deciding whether the whole expression can be evaluated
by the database. If it can, it is given to the database. In order
to be evaluated by the database, it must use only those functions
and operators supported by the database.
The WHERE
clause can, however, refer to message fields, correlation names declared
by containing SELECT functions, and to any other declared variables
or constants within scope.
If the whole expression
cannot be evaluated by the database, the broker looks for top-level
AND operators and examines each sub-expression separately. It then
attempts to give the database those sub-expressions that it can evaluate,
leaving the broker to evaluate the rest. You need to be aware of this
situation for two reasons:
- Apparently trivial changes to WHERE clause expressions can have
large effects on performance. You can determine how much of the expression
was given to the database by examining a user trace.
- Some databases' functions exhibit subtle differences of behavior
from those of the broker.
Relation to the THE function
You
can use the function THE (which returns the first element of a list)
in conjunction with SELECT to produce a non-list result. This is useful,
for example, when a SELECT query is required to return no more than
one item. It is particularly useful in conjunction with ITEM (see ITEM selections).
Differences from the SQL standard
ESQL
SELECT differs from database SQL SELECT in the following ways:
- ESQL can produce tree-structured result data
- ESQL can accept arrays in SELECT clauses
- ESQL has the THE function and the ITEM and INSERT parameters
- ESQL has no SELECT ALL function in this release
- ESQL has no ORDER BY function in this release
- ESQL has no SELECT DISTINCT function in this release
- ESQL has no GROUP BY or HAVING parameters in this release
- ESQL has no AVG column function in this release
Restrictions
The following restrictions
apply to the current release:
- When a SELECT command operates on more than one database table,
all the tables must be in the same database instance. (That is, the TableReferences
must not specify different data source names.)
- If the FROM clause refers to both messages and tables, the tables
must precede the messages in the list.
- Using dynamic DSN, SCHEMA and TABLE names with 'SELECT
*' statements is not supported. If you use a schema, table or datasource
name as a variable (dynamic variables) in 'SELECT *' queries, the
variables are not resolved to the correct set of schema or table names.
- The WHERE clause of a SELECT statement cannot itself
contain a SELECT statement that relies on results returned from the
original SELECT if either SELECT statement is from database tables.