Re: Querying distinct values from a large table

From: "Chad Wagner" <chad(dot)wagner(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Igor Lobanov" <ilobanov(at)swsoft(dot)com>, "Richard Huxton" <dev(at)archonet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Querying distinct values from a large table
Date: 2007-01-30 14:13:27
Message-ID: 81961ff50701300613g25ea4ce6jb357c82fb1ed6733@mail.gmail.com
Lists: pgsql-performance

On 1/30/07, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> > explain analyze select distinct a, b from tbl
> >
> > EXPLAIN ANALYZE output is:
> >
> > Unique  (cost=500327.32..525646.88 rows=1848 width=6) (actual time=52719.868..56126.356 rows=5390 loops=1)
> >   ->  Sort  (cost=500327.32..508767.17 rows=3375941 width=6) (actual time=52719.865..54919.989 rows=3378864 loops=1)
> >         Sort Key: a, b
> >         ->  Seq Scan on tbl  (cost=0.00..101216.41 rows=3375941 width=6) (actual time=16.643..20652.610 rows=3378864 loops=1)
> > Total runtime: 57307.394 ms
>
> All your time is in the sort, not in the SeqScan.
>
> Increase your work_mem.
>
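[For reference, work_mem can be raised for just the current session before re-running the query; the value below is purely illustrative, and the right setting depends on how much RAM the box can spare per sort:]

```sql
-- Illustrative only: give this session more sort memory so the
-- 3.4M-row sort is less likely to spill to disk, then re-check the plan.
SET work_mem = '256MB';
EXPLAIN ANALYZE SELECT DISTINCT a, b FROM tbl;
```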

Sounds like an opportunity to implement a hash-based unique (dedupe with a
hash table rather than a sort): there is no need to push 3.4 million rows
through a sort algorithm only to shave them down to roughly 5,390 unique
records.

I am assuming this optimization just isn't implemented in PostgreSQL?
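
[A possible workaround in PostgreSQL releases of this era: SELECT DISTINCT is always planned as Sort + Unique, but the equivalent GROUP BY query is eligible for a HashAggregate plan, which dedupes with a hash table and skips the sort, assuming the planner expects the distinct result to fit in work_mem:]

```sql
-- Equivalent result to SELECT DISTINCT a, b FROM tbl, but may be planned
-- as a HashAggregate instead of Sort + Unique, avoiding the big sort.
SELECT a, b FROM tbl GROUP BY a, b;
```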

--
Chad
http://www.postgresqlforums.com/
