Re: DISTINCT vs. GROUP BY

From: Dimi Paun <dimi(at)lattica(dot)com>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mihai Popa <mihai(at)lattica(dot)com>
Subject: Re: DISTINCT vs. GROUP BY
Date: 2010-02-10 01:43:47
Message-ID: 1265766227.2513.41.camel@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, 2010-02-09 at 17:38 -0500, Tom Lane wrote:
> The results certainly ought to be the same (although perhaps not with
> the same ordering) --- if they aren't, please provide a reproducible
> test case.

The results are the same, this is not a problem.

> As for efficiency, though, 8.1 didn't understand how to use hash
> aggregation for DISTINCT. Less-obsolete versions do know how to do
> that.

Indeed, this seem to be the issue:

tts_server_db=# EXPLAIN ANALYZE select userdata from tagrecord where clientRmaInId = 'CPC-RMA-00110' group by userdata;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=253.34..253.50 rows=16 width=15) (actual time=0.094..0.094 rows=0 loops=1)
-> Index Scan using idx_tagdata_clientrmainid on tagrecord (cost=0.00..252.85 rows=195 width=15) (actual time=0.091..0.091 rows=0 loops=1)
Index Cond: ((clientrmainid)::text = 'CPC-RMA-00110'::text)
Total runtime: 0.146 ms
(4 rows)

tts_server_db=# EXPLAIN ANALYZE select distinct userdata from tagrecord where clientRmaInId = 'CPC-RMA-00110';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=260.27..261.25 rows=16 width=15) (actual time=0.115..0.115 rows=0 loops=1)
-> Sort (cost=260.27..260.76 rows=195 width=15) (actual time=0.113..0.113 rows=0 loops=1)
Sort Key: userdata
-> Index Scan using idx_tagdata_clientrmainid on tagrecord (cost=0.00..252.85 rows=195 width=15) (actual time=0.105..0.105 rows=0 loops=1)
Index Cond: ((clientrmainid)::text = 'CPC-RMA-00110'::text)
Total runtime: 0.151 ms
(6 rows)

For now we are stuck with 8.1, so the easiest fix for us is to use GROUP BY.
Since this is fixed in later versions, I guess there's not much to see here... :)

Thanks for the quick reply!

--
Dimi Paun <dimi(at)lattica(dot)com>
Lattica, Inc.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Bryce Nesbitt 2010-02-10 04:04:18 512, 600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
Previous Message Tom Lane 2010-02-09 22:38:21 Re: DISTINCT vs. GROUP BY