Quick Links

Re: Querying distinct values from a large table

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gregory Stark <gsstark(at)mit(dot)edu>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Luke Lonergan <llonergan(at)greenplum(dot)com>, Chad Wagner <chad(dot)wagner(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Igor Lobanov <ilobanov(at)swsoft(dot)com>, Richard Huxton <dev(at)archonet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: Querying distinct values from a large table
Date:	2007-01-31 15:15:10
Message-ID:	25904.1170256510@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Gregory Stark <gsstark(at)mit(dot)edu> writes:
> On the other there are some common situations where you could see
> atypical increases. Consider joining a bunch of small tables to
> generate a large result set. The small tables are probably all in
> memory and the result set may only have a small number of distinct
> values. If you throw out the duplicates early you save *all* the
> I/O. If you have to do a disk sort it could be many orders slower.

Right, we already have support for doing that well, in the form of
hashed aggregation. What needs to happen is to get that to work for
DISTINCT as well as GROUP BY. IIRC, DISTINCT is currently rather
thoroughly intertwined with ORDER BY, and we'd have to figure out
some way to decouple them --- without breaking DISTINCT ON, which
makes it a lot harder :-(

regards, tom lane

In response to

Re: Querying distinct values from a large table at 2007-01-31 13:24:41 from Gregory Stark

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Sidar López Cruz	2007-01-31 15:28:05	Re: Very slow queries
Previous Message	Sidar López Cruz	2007-01-31 14:58:02	Re: Very slow queries