Re: Querying distinct values from a large table

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Chad Wagner" <chad(dot)wagner(at)gmail(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Igor Lobanov" <ilobanov(at)swsoft(dot)com>, "Richard Huxton" <dev(at)archonet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Querying distinct values from a large table
Date: 2007-01-30 14:56:57
Message-ID: C1E49CB9.199A4%llonergan@greenplum.com
Lists: pgsql-performance

Chad,

On 1/30/07 6:13 AM, "Chad Wagner" <chad(dot)wagner(at)gmail(dot)com> wrote:

> Sounds like an opportunity to implement a "Sort Unique" (sort of like a hash,
> I guess); there is no need to push 3M rows through a sort algorithm only to
> shave them down to 1848 unique records.
>
> I am assuming this optimization just isn't implemented in PostgreSQL?

Not that it helps Igor, but we've implemented single-pass sort/unique,
grouping, and limit optimizations; they reduce the work to a single seqscan
over the data, which runs 2-5 times faster than a typical external sort.
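
(As an aside, and if I recall the stock 8.2 planner correctly: GROUP BY can
already be planned as a single-pass HashAggregate, while SELECT DISTINCT is
always Sort + Unique, so rewriting the query is a common workaround. The table
and column names below are made up for illustration:

  -- Sort + Unique: all 3M rows go through the sort
  EXPLAIN SELECT DISTINCT category FROM big_table;

  -- HashAggregate: one seqscan; the hash table only holds the ~1848 distinct values
  EXPLAIN SELECT category FROM big_table GROUP BY category;
)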

I can't think of a way that indexing would help this situation given the
required visibility check of each tuple.

- Luke
