From: | Brian Herlihy <btherl(at)yahoo(dot)com(dot)au> |
---|---|
To: | Postgresql Performance <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Querying distinct values from a large table |
Date: | 2007-01-30 14:38:11 |
Message-ID: | 20070130143811.85540.qmail@web52303.mail.yahoo.com |
Lists: | pgsql-performance |
As I understand it, DISTINCT has no hash-based implementation, but GROUP BY does. GROUP BY will choose between a hash and a sort (or maybe other options?) depending on the circumstances. So you can write:
SELECT a, b FROM tbl GROUP BY a,b
and the sort/unique part of the query may run faster.
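A quick way to convince yourself that the rewrite is safe is to check that both forms return the same set of rows. The sketch below does this with Python's built-in sqlite3 module and made-up data; the table contents are an assumption for illustration only. SQLite's planner is not PostgreSQL's, so this says nothing about speed. To compare plans, run EXPLAIN ANALYZE on both queries against the real table.

```python
import sqlite3

# In-memory database with a stand-in for the table from this thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a INTEGER, b INTEGER)")

# Hypothetical data: many rows, few distinct (a, b) pairs.
rows = [(i % 7, i % 11) for i in range(10_000)]
conn.executemany("INSERT INTO tbl (a, b) VALUES (?, ?)", rows)

# The original query and the suggested GROUP BY rewrite.
distinct_rows = conn.execute("SELECT DISTINCT a, b FROM tbl").fetchall()
grouped_rows = conn.execute("SELECT a, b FROM tbl GROUP BY a, b").fetchall()

# Row order is not guaranteed by either query, so compare after sorting.
same_result = sorted(distinct_rows) == sorted(grouped_rows)
print(len(distinct_rows), len(grouped_rows), same_result)
```

Neither query promises any ordering, so add an explicit ORDER BY if the caller relied on the sorted output that the Sort/Unique plan happened to produce.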
Brian
----- Original Message ----
From: Chad Wagner <chad(dot)wagner(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Igor Lobanov <ilobanov(at)swsoft(dot)com>; Richard Huxton <dev(at)archonet(dot)com>; pgsql-performance(at)postgresql(dot)org
Sent: Tuesday, 30 January, 2007 10:13:27 PM
Subject: Re: [PERFORM] Querying distinct values from a large table
On 1/30/07, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> explain analyze select distinct a, b from tbl
>
> EXPLAIN ANALYZE output is:
>
> Unique  (cost=500327.32..525646.88 rows=1848 width=6) (actual time=52719.868..56126.356 rows=5390 loops=1)
>   ->  Sort  (cost=500327.32..508767.17 rows=3375941 width=6) (actual time=52719.865..54919.989 rows=3378864 loops=1)
>         Sort Key: a, b
>         ->  Seq Scan on tbl  (cost=0.00..101216.41 rows=3375941 width=6) (actual time=16.643..20652.610 rows=3378864 loops=1)
> Total runtime: 57307.394 ms
> All your time is in the sort, not in the SeqScan.
> Increase your work_mem.
Sounds like an opportunity to implement a "Sort Unique" (sort of like a hash, I guess). There is no need to push 3M rows through a sort algorithm only to shave them down to a few thousand unique records (the plan estimated 1848; the actual count was 5390).
I am assuming this optimization just isn't implemented in PostgreSQL?
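The difference between the two strategies discussed here can be sketched in a few lines of Python. This is only an illustration of the idea, not PostgreSQL's implementation; the function names and the sample data are made up for the example. The sort-based version orders every input row before removing adjacent duplicates, while the hash-based version only ever holds one entry per distinct value.

```python
from itertools import groupby


def distinct_by_sort(rows):
    """Sort all rows, then drop adjacent duplicates (the Sort -> Unique shape)."""
    return [key for key, _group in groupby(sorted(rows))]


def distinct_by_hash(rows):
    """Keep one entry per distinct row in a hash table; the input is never sorted."""
    seen = set()
    for row in rows:
        seen.add(row)
    return seen


# Hypothetical data: 100,000 rows that collapse to a small number of distinct pairs.
rows = [(i % 7, i % 11) for i in range(100_000)]

by_sort = distinct_by_sort(rows)
by_hash = distinct_by_hash(rows)
print(len(by_sort), len(by_hash))
```

With few distinct values the hash table stays small no matter how many rows are scanned, whereas the sort has to handle the full input. That is the saving the "Sort Unique" idea above is after.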