Sharded queries

You can query a sharded table as if it is a single table on one database server. However, restrictions for distributed queries between database servers and restrictions specific to sharded queries apply.

When you run a sharded query, do not include server name qualifications for remote servers.

If the SHARD_ID configuration parameter is set to unique values on each shard server in the shard cluster, sharded queries are run in parallel on each shard server.

If you set the USE_SHARDING session environment option, insert, update, and delete operations on shard tables use the two-phase commit protocol. Otherwise, sharded insert, update, and delete operations follow the eventually consistency model where data is moved to the appropriate shard server after the transaction is committed.

If your shard servers have high-availability secondary servers, you can run sharded queries from the secondary servers.

Data types

A sharded query can return the following data types: non-opaque atomic built-in data types, LVARCHAR, Boolean, BSON, and JSON. Sharded queries cannot return distinct data types.

To run sharded queries on time series data in a TimeSeries data type, shard a virtual table that is based on the time series table.

Restrictions

You can include only one sharded table in a query block.

You cannot include the following SQL syntax elements in a query that includes a sharded table:

  • DataBlade API routines
  • Java user-defined routines
  • Triggers
  • A FOR UPDATE clause in a SELECT statement

You cannot run an EXECUTE FUNCTION or EXECUTE PROCEDURE statement for a routine to operate on a sharded table.

You cannot run a statement that contains an update to a shard key that requires the row to move to another shard server. To update the shard key of a row, delete the row and then insert it with the new values.

You cannot shard data in an XA environment.

Performance tips

You can improve the speed of sharded queries by customizing how shared memory for sharded queries is allocated. You can control shared memory allocation by setting the SHARD_MEM configuration parameter on each shard server.

If your sharded queries frequently include joins to another table, replicate that table to all the shard servers to improve query performance.

If your sharded queries included stored routines as a filter, define the routines on all the shard servers. Queries run faster when the data is filtered on each shard server before being returned.

If the SHARD_ID configuration parameter is set on all shard servers, the shard servers use server multiplexer group (SMX) connections. You can reduce latency between shard servers by increasing the number of pipes that are used for the SMX connections. Set the SMX_NUMPIPES configuration parameter to the number of pipes.