AggregateBy function

The AggregateBy function aggregates the values in a time series using a new time interval that you specify by providing a calendar.

This function can be used to convert a time series with a small interval to a time series with a larger interval: for instance, to produce a weekly time series from a daily time series.

If you supply the optional start and end DATETIME parameters, just that part of the time series is aggregated to the new time interval.

Syntax

AggregateBy(agg_express   lvarchar, 
           cal_name       lvarchar, 
           ts            TimeSeries
           flags         integer default 0
           start datetime year to fraction(5) default NULL,
           end datetime year to fraction(5) default NULL
)
returns TimeSeries;

agg_express: A comma-separated list of these SQL aggregate operators: MIN, MAX, MEDIAN, SUM, AVG, FIRST, LAST, or Nth. The MIN, MAX, MEDIAN, SUM, and AVG expressions can operate only on numeric columns.
cal_name: The name of a calendar that defines the aggregation period.
ts: The time series to be aggregated.
flags (optional): Determines how data points in off periods of calendars are handled during aggregation. See The flags argument values.
start (optional): The date and time at which to start aggregation.
end (optional): The date and time at which to end aggregation.

Description

The AggregateBy function converts the input time series to a regular time series with a calendar given by the cal_name argument.

The agg_express expressions operate on a column of the input time series, which is specified by one of the following column identifiers:

$colname

The colname is the name of the column to aggregate in the TimeSeries data type. For example, if the column name is high, the column identifier is $high.

$colnumber

The colnumber is the position of the column to aggregate in the TimeSeries data type. For example if the column number is 1, the column identifier is $1.

$bson_field_name

The bson_field_name is the name of a field in at least one BSON document in the BSON column in the TimeSeries data type. For example, if the field name is v1, the column identifier is $v1. If the BSON field name is the same as another column in the TimeSeries data type, you must qualify the field name in one of the following ways:

$colname.bson_field_name
For example, if the BSON column name is b_data and the field name is v1, the column identifier is $b_data.v1.
$colnumber.bson_field_name
For example, if the BSON column number is 1 and the field name is v1, the column identifier is $1.v1.

You must cast the results of the AggregateBy function on a BSON field to a TimeSeries data type that has the appropriate type of columns for the result of the expression.

The Nth expression returns the value of a column for the specified aggregation period, using the following syntax:

Nth($col, n)

$col: The column identifier.
n: A positive or negative number that indicates the position of the TimeSeries row within the aggregation period. Positive values of n begin at the first row in the aggregation period; therefore, Nth($col, 1) is equivalent to FIRST($col). Negative values of n begin with the last row in the aggregation period; therefore, Nth($col, -1) is equivalent to LAST($col).
If an aggregation period does not have a value for the nth row, then the Nth function returns a null value for that period. The Nth function is more efficient for positive values of the n argument than for negative values.

An aggregation time period is denoted by the start date and time of the period.

The origin of the aggregated output time series is the first period on or before the origin of the input time series. Each output period is the aggregation of all input periods from the start of the output period up to, but not including, the start of the next output period.

For instance, suppose you want to aggregate a daily time series that starts on Tuesday, Jan. 4, 2011, to a weekly time series. The input calendar, named “days,” starts at 12:00 a.m., and the output calendar, named “weeks,” starts at 12:00 a.m., on Monday.

The first output time is 00:00 Jan. 3, 2011; it is the aggregation of all input values from the input origin, Jan. 4, 2011, to 23:59:59.99999 Jan. 9, 2011. The second output time is 00:00 Jan. 10, 2011; it is the aggregation of all input values from 00:00 Jan 10, 2011 to 23:59:59.99999 Jan. 16, 2011.

Typically, the AggregateBy function is used to aggregate from a fine-grained regular time series to a coarser-grained one. However, the following scenarios are also supported:

Converting from a regular time series to a time series with a calendar of the same granularity. In this case, AggregateBy shifts the times back to accommodate differences in the calendar start times: for example, 00:00 from 8:00. Elements can be removed or null elements added to accommodate differences in the on/off pattern.
Converting from a regular time series to one with a calendar of finer granularity. In this case, AggregateBy replicates values.
The input time series is irregular. Because the granularity of an irregular time series does not depend on the granularity of the calendar, this case is treated like aggregation from a fine-grained time series to a coarser-grained one. This type of aggregation always produces a regular time series.

The flags argument values

The flags argument determines how data points in the off periods of calendars are handled during aggregation and how hidden elements are managed. It can have the following values.

0: (Default) Data in off periods is aggregated with the next output period.
1: Data in off periods is aggregated with the previous output period.
2: Indicates that the scan runs with the TS_SCAN_HIDDEN flag set (hidden elements are returned).
4: Indicates that the scan runs with the TS_SCAN_SKIP_HIDDEN flag set (hidden elements are not returned).

For example, consider an input time series that has a daily calendar with no off days: it has data from weekdays and weekends. If you aggregate this data by a business-day calendar (5 days on, 2 days off, starting on a Monday), a flags argument of 0 causes weekend data to be aggregated with the next Monday's data, and a flags argument of 1 causes weekend data to be aggregated with the previous Friday's data.

For another example, consider a quarterly calendar that is defined as:

'startdate(2010-1-1 00:00:00.00000), pattstart(2010-1-1 00:00:00.00000),
pattern({1 on, 2 off}, month'

If you aggregate this calendar with either a flags argument of 0 or no flags argument, all input points up to, but not including, 2010-2-1 00:00:00.00000 are aggregated into the first output element. All points from 2010-2-1 00:00:00.00000 up to, but not including, 2010-5-1 00:00:00.00000 are aggregated into the second output element, and so on.

If the flags argument is 1, all input points up to but not including 2010- 4-1 00:00:00.00000 are aggregated into the first output element. All points from 2010-4-1 00:00:00.00000 up to, but not including, 2010-7-1 00:00:00.00000 are aggregated into the second output element, and so on. The AggregateBy clause might look like this:

AggregateBy('max($high)', 'quarterlycal', ts, 1);

Returns

The aggregated time series, which is always regular, if you are aggregating to a new time interval. The resulting time series has a time stamp column plus one column for each expression in the list.

Examples: Stock data

The following query aggregates the daily_stocks time series to a weekly time series:

insert into daily_stocks( stock_id, stock_name, stock_data)
   select stock_id, stock_name,
   AggregateBy( 'max($high), min($low),last($final),sum($vol)',
   'weekcal', stock_data)::TimeSeries(stock_bar)
   from daily_stocks;

The following query clause selects the second price from each week:

AggregateBy( 'Nth($price, 2)', 'weekly', ts)

This query clause selects the second to the last price from each week:

AggregateBy( 'Nth($price, -2)', 'weekly', ts)

Examples: BSON data

This example is based on the following row type and time series definition. The TimeSeries row type contains an INTEGER column that is named v1 and the BSON column contains a field that is also named v1.

CREATE ROW TYPE rb(timestamp datetime year to fraction(5), data bson, v1 int); 

INSERT INTO tj VALUES(1,'origin(2011-01-01 00:00:00.00000), calendar(ts_15min), 
container(kontainer),threshold(0), regular,[({"v1":99},20)]');

The following statement creates a TimeSeries data type to hold the results of the aggregation on the BSON field in an INTEGER column:

CREATE ROW TYPE outrow(timestamp datetime year to fraction(5), x int);

If a column and a BSON field have the same name, the column takes precedence. The following statement returns the maximum value from the v1 INTEGER column:

SELECT AggregateBy('max($v1)','ts_1year',tsdata,0
           "2011-01-01 00:00:00.00000"::datetime year to fraction(5),
           "2012-01-01 00:00:00.00000"::datetime year to fraction(5)) 
FROM tj;

The following two equivalent statements return the maximum value from the v1 field in the data BSON column, which is column 1 in the TimeSeries row type:

SELECT AggregateBy('max($data.v1)','ts_1year',tsdata,0
           "2011-01-01 00:00:00.00000"::datetime year to fraction(5),
           "2012-01-01 00:00:00.00000"::datetime year to fraction(5))
           ::timeseries(outrow)
FROM tj; 

SELECT AggregateBy('max($1.v1)','ts_1year',tsdata,0
           "2011-01-01 00:00:00.00000"::datetime year to fraction(5),
           "2012-01-01 00:00:00.00000"::datetime year to fraction(5))
           ::timeseries(outrow)
FROM tj;

The aggregated time series that is returned has the TimeSeries data type outrow. If you do not cast the result to a row type that has appropriate columns for the results, the statement fails.