
Re: sequential scan on select distinct

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Pierre-Frédéric Caillaud <lists(at)boutiquenumerique(dot)com>
Cc:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Greg Stark" <gsstark(at)mit(dot)edu>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: sequential scan on select distinct
Date:	2004-10-07 17:08:11
Message-ID:	87pt3u78xw.fsf@stark.xeocode.com
Lists:	pgsql-performance

Pierre-Frédéric Caillaud <lists(at)boutiquenumerique(dot)com> writes:

> I see this as a minor annoyance only because I can write GROUP BY
> instead of DISTINCT and get the speed boost. It probably annoys people
> trying to port applications to postgres though, forcing them to rewrite
> their queries.

Yeah, really DISTINCT and DISTINCT ON are just special cases of GROUP BY. It
seems to make more sense to put the effort into GROUP BY and just have
DISTINCT and DISTINCT ON go through the same code path, effectively rewriting
them internally as a GROUP BY.
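To make the "special case" point concrete, here is a minimal sketch showing that SELECT DISTINCT over some columns returns the same rows as a GROUP BY on those same columns with no aggregates. It uses SQLite only because it ships with Python, so it illustrates the semantics, not PostgreSQL's planner; the table name "t", its columns and the helper name are hypothetical.

```python
import sqlite3


def distinct_and_group_by(rows):
    """Run SELECT DISTINCT and the equivalent GROUP BY over the same rows.

    SQLite stands in for PostgreSQL here only because it ships with Python;
    the table "t" and its columns (a, b) are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
        conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)", rows)
        distinct = conn.execute("SELECT DISTINCT a, b FROM t").fetchall()
        # Grouping on every selected column, with no aggregates, is DISTINCT.
        grouped = conn.execute("SELECT a, b FROM t GROUP BY a, b").fetchall()
    finally:
        conn.close()
    # Without ORDER BY neither query promises an order, so compare sorted lists.
    return sorted(distinct), sorted(grouped)


if __name__ == "__main__":
    rows = [(1, "x"), (1, "x"), (2, "y"), (1, "z"), (2, "y")]
    distinct, grouped = distinct_and_group_by(rows)
    print(distinct == grouped)  # True
    print(distinct)             # [(1, 'x'), (1, 'z'), (2, 'y')]
```

Because the two forms are equivalent, a planner that rewrote DISTINCT into GROUP BY internally would give DISTINCT whatever plan choices GROUP BY already has.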

The really tricky part is that a DISTINCT ON needs to know about a first()
aggregate, and, to make optimal use of indexes, a last() aggregate as well. And
ideally the planner/executor would know that something is magic about
first()/last() (and potentially min()/max() at some point): they don't
need the complete set of tuples to calculate their results.
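A query like `SELECT DISTINCT ON (a) * FROM t ORDER BY a, b` keeps, for each value of a, the first row in b order. The sketch below models that as a first() aggregate per group over input that is assumed to arrive already sorted, as it might from an index scan. It also shows that when the same rows arrive in the opposite order (say, an index scanned backwards) a last() aggregate per group picks the same rows. The helper names and sample rows are hypothetical stand-ins for the first()/last() aggregates discussed above, not an existing PostgreSQL API.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on_first(rows, key):
    """Emulate DISTINCT ON (key) as a first() aggregate per group.

    Assumes rows already arrive sorted by key and then by the ORDER BY
    tie-break columns, e.g. from an index scan.
    """
    result = []
    for _, group in groupby(rows, key=key):
        # first() is satisfied by one tuple; the rest of the group is
        # passed over by groupby, not aggregated.
        result.append(next(group))
    return result


def distinct_on_last(rows, key):
    """Pick the same rows as distinct_on_first when the input comes in the
    opposite order (e.g. a backward index scan): a last() aggregate per group."""
    result = []
    for _, group in groupby(rows, key=key):
        last = None
        for row in group:  # here last() looks at every tuple in the group
            last = row
        result.append(last)
    return result


if __name__ == "__main__":
    # Hypothetical rows (a, b, payload), sorted by (a, b).
    rows = [(1, 5, "a"), (1, 7, "b"), (2, 3, "c"), (2, 9, "d"), (3, 1, "e")]
    forward = distinct_on_first(rows, itemgetter(0))
    backward = distinct_on_last(list(reversed(rows)), itemgetter(0))
    print(forward)                              # [(1, 5, 'a'), (2, 3, 'c'), (3, 1, 'e')]
    print(backward == list(reversed(forward)))  # True
```

Note that groupby still reads past the remaining tuples of each group to find the next key. An executor that knew first() was "magic" could, in principle, skip ahead in the index instead, which is the saving alluded to above.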

--
greg

In response to

Re: sequential scan on select distinct at 2004-10-06 19:38:26 from Tom Lane

Responses

Re: sequential scan on select distinct at 2004-10-08 08:54:59 from Pierre-Frédéric Caillaud
