Compressed numeric time series

If your time series data is recorded at a regular frequency and all the time series values are numeric, you can define a compressed time series to store the data efficiently.

In each element, a compressed time series stores an 11-byte timestamp for the first record and a 2-byte timestamp for each of the other records. The compression ratio of the rest of the time series data varies depending on the type of data and the compression definitions. For example, you can compress an 8-byte BIGINT value down to 1 byte, with some loss of precision.
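
The timestamp sizes above can be turned into a quick estimate. The following is a minimal sketch that uses only the 11-byte and 2-byte figures stated in this topic; the function name `timestamp_bytes` is hypothetical, and the sketch does not model the storage for the column values.

```python
FIRST_TIMESTAMP_BYTES = 11  # first record in an element (figure from this topic)
OTHER_TIMESTAMP_BYTES = 2   # each of the other records in the same element


def timestamp_bytes(records_in_element: int) -> int:
    """Bytes spent on timestamps for one element that holds the given number of records."""
    if records_in_element < 0:
        raise ValueError("record count must be non-negative")
    if records_in_element == 0:
        return 0
    return FIRST_TIMESTAMP_BYTES + OTHER_TIMESTAMP_BYTES * (records_in_element - 1)


# Timestamp overhead for an element that holds 100 records.
print(timestamp_bytes(100))
```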

Time series definition

You define a compressed time series by running the TSCreateIrr function with the compression parameter.

You must include a compression definition for every column in the TimeSeries subtype, except the first timestamp column. The compression definitions are associated with the columns in the same order. If you do not want to compress a particular column, include a compression definition of no compression for that column. If you specify that none of the columns are compressed, only the first timestamp column is compressed.

Besides the first timestamp column, the TimeSeries subtype columns must have only the following data types: SMALLINT, INTEGER, BIGINT, SMALLFLOAT, and FLOAT.

Note: In addition to the data types above, a compressed time series can include columns of the following data types, but only with the no compression option, n(). The values in these columns are not compressed.
  • LVARCHAR
  • VARCHAR
  • NVARCHAR
  • CHAR
  • NCHAR

    Note: The CHAR and NCHAR data types can be treated in two ways: as variable-length strings or as fixed-length, space-padded strings.

  • BOOLEAN
  • INT8
  • DECIMAL
  • MONEY
  • DATE
  • DATETIME
  • INTERVAL
  • BSON
  • Any fixed-length UDT (with the same restrictions that apply to UDT columns in a TimeSeries subtype)
  • Any variable-length UDT (with the same restrictions that apply to UDT columns in a TimeSeries subtype)
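
The data type rules above can be expressed as a small check. This is an illustrative sketch only: the helper names are hypothetical, the type sets are copied from the lists in this topic, and UDTs are not modeled, so the sketch reports them as not allowed.

```python
# Types that can take any compression definition (from the list in this topic).
COMPRESSIBLE = {"SMALLINT", "INTEGER", "BIGINT", "SMALLFLOAT", "FLOAT"}

# Types that are accepted only with the no compression definition, n().
UNCOMPRESSED_ONLY = {
    "LVARCHAR", "VARCHAR", "NVARCHAR", "CHAR", "NCHAR", "BOOLEAN", "INT8",
    "DECIMAL", "MONEY", "DATE", "DATETIME", "INTERVAL", "BSON",
}


def base_type(sql_type: str) -> str:
    """Strip length and qualifier details, e.g. 'VARCHAR(20)' -> 'VARCHAR'."""
    return sql_type.split("(")[0].split()[0].upper()


def definition_allowed(sql_type: str, definition: str) -> bool:
    """Return True if the column type can be paired with the compression definition."""
    name = base_type(sql_type)
    no_compression = definition.replace(" ", "") == "n()"
    if name in COMPRESSIBLE:
        return True
    if name in UNCOMPRESSED_ONLY:
        return no_compression
    return False  # UDTs and unknown types are outside this sketch


print(definition_allowed("BIGINT", "q(1,1,100)"))
print(definition_allowed("VARCHAR(20)", "q(1,1,100)"))
```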

The calendar that you specify in the time series definition defines the size of the interval; however, off periods are not allowed. One record is accepted per interval, and the timestamp must be on the interval boundary. For example, if the calendar has an interval of minute, a timestamp that has a seconds value other than 00.00000, such as 2013-01-01 01:52:15.00000, is rejected.
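
The boundary rule can be checked on the client side before data is loaded. The following sketch assumes a calendar interval of one minute, as in the example above; the function name `on_minute_boundary` is hypothetical and is not part of any database API.

```python
from datetime import datetime


def on_minute_boundary(ts: datetime) -> bool:
    """True if the timestamp has no seconds or fractional seconds."""
    return ts.second == 0 and ts.microsecond == 0


# The timestamp from the example above is off the boundary.
print(on_minute_boundary(datetime(2013, 1, 1, 1, 52, 15)))  # False
print(on_minute_boundary(datetime(2013, 1, 1, 1, 52, 0)))   # True
```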

Compressed records must be stored in containers. However, a compressed time series cannot be stored in rolling window containers.

Compression types

You compress data with the following types of compression algorithms:

Quantization
The quantization compression algorithm divides continuous values into discrete grids. Each grid represents a range of values. Fewer bytes are needed to represent a grid than a numeric value. The quantization algorithm can be lossy. The quantization algorithm allows NULL values.
The quantization compression algorithm is suitable when records are frequent and the values are highly variable.
You specify the upper and lower boundaries of the values of the data and the number of bytes to store for each value. The larger the difference between the upper and lower boundaries and the smaller the compressed size that you specify, the more compact and possibly lossy the data becomes.
Linear
The linear compression algorithm represents values as line segments, which are defined by two end points. If the values are within the supplied deviation, the values are not recorded. The linear compression algorithm records a value only when a new value deviates too much from the last recorded value. The linear compression algorithm does not allow NULL values.
The linear compression algorithm is suitable when values vary little.
The larger the maximum deviation that you specify, the more compact and possibly lossy the data becomes.
Use the boxcar variant if you need fast reading and writing performance.
Use the swing door variant if you need a higher compression ratio.
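
The two ideas above can be sketched in a few lines. This is a conceptual illustration based only on the descriptions in this topic, not the database server's actual implementation: the function names are hypothetical, NULL handling is omitted, and the deviation filter models only the rule of recording a value when it deviates too much from the last recorded value.

```python
def quantize(value, lower, upper, size_bytes):
    """Map a value to a grid index that fits in size_bytes bytes."""
    if not lower <= value <= upper:
        raise ValueError("value is outside the declared boundaries")
    levels = 2 ** (size_bytes * 8)
    step = (upper - lower) / levels
    index = int((value - lower) / step)
    return min(index, levels - 1)  # the upper boundary falls in the last grid


def dequantize(index, lower, upper, size_bytes):
    """Reconstruct an approximate value as the midpoint of the grid (an assumption)."""
    levels = 2 ** (size_bytes * 8)
    step = (upper - lower) / levels
    return lower + (index + 0.5) * step


def deviation_filter(values, max_deviation):
    """Keep (position, value) pairs that differ from the last kept value by more than max_deviation."""
    kept = []
    last = None
    for position, value in enumerate(values):
        if last is None or abs(value - last) > max_deviation:
            kept.append((position, value))
            last = value
    return kept


# Quantization: a value between 1 and 100 stored in 1 byte.
index = quantize(42.7, 1, 100, 1)
print(index, dequantize(index, 1, 100, 1))

# Linear: small fluctuations around the last recorded value are dropped.
print(deviation_filter([20.0, 20.05, 19.95, 20.3, 20.35], 0.1))
```

In the quantization part, a wider range or a smaller byte size gives a coarser grid, which is why the result becomes more compact and possibly more lossy. In the linear part, a larger deviation drops more values.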

You can combine compression types and choose the quantization linear boxcar or quantization linear swing door compression algorithm.

You can choose not to compress a column. Columns that are not compressed allow NULL values.

Lossiness

The following equations describe the margin of error that is allowed between the original and the compressed values for the different compression types:

Quantization type:

margin of error = (upper_bound - lower_bound)
                  /(2^(compress_size*8))

Linear types:

margin of error = maximum_deviation

Combination of quantization and linear types:

margin of error = (upper_bound - lower_bound)
                   /(2^(compress_size*8)) + maximum_deviation

compress_size
The size of the compressed data, in bytes.
lower_bound
The lowest acceptable value.
maximum_deviation
The absolute value of the margin of error.
upper_bound
The highest acceptable value.

For example, if the compression definition for quantization is q(1,1,100), the compression size is 1 byte, the lower boundary is 1, and the upper boundary is 100. The following equation calculates the margin of error:

(100-1)/256 = 0.387

The maximum difference between the original value and the compressed value is plus or minus 0.387.

For the linear compression type, the margin of error is equal to the maximum deviation value. For example, if the original value is 20 and the maximum deviation value is 0.1, then the compressed value is in the range 19.9 to 20.1.

If the compression definition for quantization linear boxcar is qlb(1,1,100,100000), the compression size is 1 byte, the maximum deviation is 1, the lower boundary is 100, and the upper boundary is 100000. The following equation calculates the margin of error:

(100000-100)/256 + 1 = 390.234 + 1 = 391.234

The maximum difference between the original value and the compressed value is plus or minus 391.234.
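
The margin-of-error calculations can be reproduced with a short script. This is a minimal sketch: the function names are hypothetical, the argument order mirrors the q and qlb definitions in the examples, and the combined margin follows the worked example, in which the maximum deviation is added after the division.

```python
def quantization_margin(compress_size, lower_bound, upper_bound):
    """Width of one quantization grid for the given byte size and boundaries."""
    return (upper_bound - lower_bound) / 2 ** (compress_size * 8)


def linear_margin(maximum_deviation):
    """For linear types, the margin is the maximum deviation itself."""
    return abs(maximum_deviation)


def combined_margin(compress_size, maximum_deviation, lower_bound, upper_bound):
    """Quantization grid width plus the maximum deviation."""
    return (quantization_margin(compress_size, lower_bound, upper_bound)
            + linear_margin(maximum_deviation))


# q(1,1,100): 1 byte, boundaries 1 and 100.
print(f"{quantization_margin(1, 1, 100):.3f}")

# qlb(1,1,100,100000): 1 byte, maximum deviation 1, boundaries 100 and 100000.
print(f"{combined_margin(1, 1, 100, 100000):.3f}")
```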


Copyright© 2018 HCL Technologies Limited