Distinct optimization

Distinct is used to compare a value with another value.

There are two methods to write a query that returns distinct values in SQL. One method uses the DISTINCT keyword:

SELECT DISTINCT COL1, COL2  	
	FROM TABLE1

The second method uses GROUP BY:

SELECT COL1, COL2  	
	FROM TABLE1  	
	GROUP BY COL1, COL2

All queries that contain a DISTINCT, and are run using SQE, rewritten into queries using GROUP BY. This rewrite enables queries using DISTINCT to take advantage of the many grouping techniques available to the optimizer.

Distinct to Grouping implementation

The following example query has a DISTINCT:

SELECT DISTINCT COL1, COL2  	
	FROM T1  	
	WHERE COL2 > 5 AND COL3 = 2

The optimizer rewrites it into this query:

SELECT COL1, COL2  	
	FROM T1  	
	WHERE COL2 > 5 AND COL3 = 2  	
	GROUP BY COL1, COL2

Distinct removal

A query containing a DISTINCT over whole-file aggregation (no grouping or selection) allows the DISTINCT to be removed. For example, look at this query with DISTINCT:

SELECT DISTINCT COUNT(C1), SUM(C1) 	
	FROM TABLE1

The optimizer rewrites this query as the following:

SELECT COUNT(C1), SUM(C1) 	
	FROM TABLE1

If the DISTINCT and the GROUP BY fields are identical, the DISTINCT can be removed. If the DISTINCT fields are a subset of the GROUP BY fields (and there are no aggregates), the DISTINCTs can be removed.