Re: Querying distinct values from a large table

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Chad Wagner" <chad(dot)wagner(at)gmail(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Igor Lobanov" <ilobanov(at)swsoft(dot)com>, "Richard Huxton" <dev(at)archonet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Querying distinct values from a large table
Date: 2007-01-30 16:34:03
Message-ID: C1E4B37B.199C4%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Chad,

On 1/30/07 7:03 AM, "Chad Wagner" <chad(dot)wagner(at)gmail(dot)com> wrote:

> On 1/30/07, Luke Lonergan <llonergan(at)greenplum(dot)com> wrote:
>> Not that it helps Igor, but we've implemented single pass sort/unique,
>> grouping and limit optimizations and it speeds things up to a single seqscan
>> over the data, from 2-5 times faster than a typical external sort.
>
> Was that integrated back into PostgreSQL, or is that part of Greenplum's
> offering?

Not yet, we will submit to PostgreSQL along with other executor node
enhancements like hybrid hash agg (fixes the memory overflow problem with
hash agg) and some other great sort work. These are all "cooked" and in the
Greenplum DBMS, and have proven themselves significant on customer workloads
with tens of terabytes already.

For now it seems that the "Group By" trick Brian suggested in this thread
combined with lots of work_mem may speed things up for this case if HashAgg
is chosen. Watch out for misestimates of stats though - hash agg may
overallocate RAM in some cases.

>> I can't think of a way that indexing would help this situation given the
>> required visibility check of each tuple.
>
> I agree, using indexes as a "skinny" table is a whole other feature that would
> be nice.

Yah - I like Hannu's ideas to make visibility less of a problem. We're
thinking about this too.

- Luke

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Gregory Stark 2007-01-30 16:38:30 Re: Querying distinct values from a large table
Previous Message Chad Wagner 2007-01-30 15:03:03 Re: Querying distinct values from a large table