Re: sequential scan on select distinct

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Pierre-Frédéric Caillaud <lists(at)boutiquenumerique(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Greg Stark" <gsstark(at)mit(dot)edu>, pgsql-performance(at)postgresql(dot)org
Subject: Re: sequential scan on select distinct
Date: 2004-10-07 17:08:11
Message-ID: 87pt3u78xw.fsf@stark.xeocode.com
Lists: pgsql-performance


Pierre-Frédéric Caillaud <lists(at)boutiquenumerique(dot)com> writes:

> I see this as a minor annoyance only because I can write GROUP BY
> instead of DISTINCT and get the speed boost. It probably annoys people
> trying to port applications to postgres though, forcing them to rewrite
> their queries.

Yeah, really DISTINCT and DISTINCT ON are just special cases of GROUP BY. It
seems to make more sense to put the effort into GROUP BY and just have
DISTINCT and DISTINCT ON go through the same code path, effectively rewriting
them internally as a GROUP BY.
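To make the "same code path" idea concrete, here is a minimal Python sketch, assuming a sort-based grouping strategy. It is not PostgreSQL's executor. The function names `group_aggregate` and `distinct` are hypothetical. A plain `SELECT DISTINCT a, b` is treated as `GROUP BY a, b` with no aggregates, so both forms run through one grouping routine:

```python
from itertools import groupby


def group_aggregate(rows, key, aggregates=()):
    """Sort-based GROUP BY sketch: sort on key, then apply each aggregate
    function to the list of rows in every group."""
    result = []
    for k, group in groupby(sorted(rows, key=key), key=key):
        members = list(group)
        result.append((k, *(agg(members) for agg in aggregates)))
    return result


def distinct(rows):
    # SELECT DISTINCT a, b  ==  SELECT a, b ... GROUP BY a, b (no aggregates):
    # the grouping key is the whole row, and each group yields just its key.
    return [k for (k,) in group_aggregate(rows, key=lambda r: r)]


rows = [(1, "a"), (2, "b"), (1, "a"), (2, "c")]
print(distinct(rows))                                        # [(1, 'a'), (2, 'b'), (2, 'c')]
print(group_aggregate(rows, key=lambda r: r[0], aggregates=(len,)))  # [(1, 2), (2, 2)]
```

Any improvement to the grouping routine (a hash strategy, or reuse of an index's sort order) would then benefit DISTINCT automatically, which is the point of routing it through GROUP BY.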

The really tricky part is that a DISTINCT ON needs to know about a first()
aggregate. And to make optimal use of indexes, a last() aggregate as well. And
ideally the planner/executor needs to know something is magic about
first()/last() (and potentially min()/max() at some point) and that they don't
need the complete set of tuples to calculate their results.
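A rough Python sketch of the first()/last() point follows. It assumes the input already arrives sorted by `(key, tiebreak)`, as it would from an index scan. Here `distinct_on` and `distinct_on_last` are hypothetical stand-ins for `SELECT DISTINCT ON (key) ... ORDER BY key, tiebreak` rewritten as GROUP BY with first() or last(). The first()/last() steps are plain Python, not built-in PostgreSQL aggregates. first() looks at only one tuple per group and never buffers the rest. last() read forward has to walk the whole group:

```python
from itertools import groupby


def distinct_on(sorted_rows, key):
    """DISTINCT ON (key) as GROUP BY key with a first() aggregate.
    Input must already be sorted by (key, tiebreak)."""
    out = []
    for _k, group in groupby(sorted_rows, key=key):
        out.append(next(group))  # first(): the first tuple is the whole answer
    return out


def distinct_on_last(sorted_rows, key):
    """Same grouping with a last() aggregate. Reading forward, every tuple in
    the group must be visited; scanning the index backwards would turn this
    into first() instead."""
    out = []
    for _k, group in groupby(sorted_rows, key=key):
        last = None
        for row in group:  # last(): no way to stop early going forward
            last = row
        out.append(last)
    return out


rows = [("alice", 1, "x"), ("alice", 5, "y"), ("bob", 2, "z"), ("bob", 3, "w")]
print(distinct_on(rows, key=lambda r: r[0]))       # [('alice', 1, 'x'), ('bob', 2, 'z')]
print(distinct_on_last(rows, key=lambda r: r[0]))  # [('alice', 5, 'y'), ('bob', 3, 'w')]
```

In this sketch `groupby` still reads every row to find group boundaries. An executor that knew first() was "magic" could go further and skip ahead in the index to the next key.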

--
greg
