BUG #5231: SELECT DISTINCT poorly implemented vs SELECT ... GROUP BY

From: "Thomas Hamilton" <thomashamilton76(at)yahoo(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5231: SELECT DISTINCT poorly implemented vs SELECT ... GROUP BY
Date: 2009-12-03 15:56:05
Message-ID: 200912031556.nB3Fu5sv015354@wwwmaster.postgresql.org
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 5231
Logged by: Thomas Hamilton
Email address: thomashamilton76(at)yahoo(dot)com
PostgreSQL version: 8.3.8
Operating system: Ubuntu 4.2.4
Description: SELECT DISTINCT poorly implemented vs SELECT ... GROUP BY
Details:

SELECT DISTINCT does a Sort followed by Unique.

SELECT ... GROUP BY, which is logically equivalent, performs a
HashAggregate.

When run against a large dataset with a small number of distinct results,
HashAggregate is an order of magnitude more efficient!
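
To illustrate the difference between the two strategies, here is a minimal
Python sketch. It is not PostgreSQL's executor code; the function names
distinct_sort_unique and distinct_hash and the sample data are made up for
illustration. The first function sorts the whole input and then drops
adjacent duplicates, the second keeps one hash-table entry per distinct
value and never sorts.

```python
from itertools import groupby


def distinct_sort_unique(rows):
    """Sort -> Unique: sort every input row, then drop adjacent duplicates."""
    return [key for key, _ in groupby(sorted(rows))]


def distinct_hash(rows):
    """HashAggregate-like: keep one hash-table entry per distinct value."""
    seen = {}
    for row in rows:
        seen.setdefault(row, None)  # one entry per distinct value
    return list(seen)               # first-seen order, not sorted


# Hypothetical data: many rows, only five distinct values.
rows = [i % 5 for i in range(100_000)]

# Both strategies return the same set of values; only the order may differ.
assert sorted(distinct_hash(rows)) == distinct_sort_unique(rows)
```

With few distinct values the hash table stays tiny, while the sort-based
version still has to sort all 100,000 rows before removing duplicates.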

Since the spec does not require DISTINCT to return sorted results, I don't
believe Sort ... Unique will ever be more efficient than HashAggregate.

Therefore, in order to maximize performance, DISTINCT should always be
implemented as HashAggregate.
