DISTINCT vs. GROUP BY

From: Dimi Paun <dimi(at)lattica(dot)com>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: DISTINCT vs. GROUP BY
Date: 2010-02-09 21:46:16
Message-ID: 1265751976.2513.34.camel@localhost
Lists: pgsql-performance

From what I've read on the net, these should be very similar,
and should generate equivalent plans, in such cases:

SELECT DISTINCT x FROM mytable
SELECT x FROM mytable GROUP BY x
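
To make the claimed equivalence concrete, here is a minimal sketch that
runs both forms against a throwaway table. It uses SQLite through
Python's sqlite3 module rather than PostgreSQL, so it only illustrates
that the two queries return the same set of values; it says nothing
about the plans PostgreSQL picks. The helper name and the sample rows
are made up for illustration.

```python
import sqlite3


def distinct_and_group_by(values):
    """Run both query forms over a temporary table and return their results.

    Hypothetical helper: builds an in-memory SQLite table named mytable
    with a single column x, as in the two queries above.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (x TEXT)")
        conn.executemany(
            "INSERT INTO mytable (x) VALUES (?)", [(v,) for v in values]
        )
        # Form 1: duplicate elimination with DISTINCT.
        distinct = [r[0] for r in conn.execute("SELECT DISTINCT x FROM mytable")]
        # Form 2: duplicate elimination by grouping on the same column.
        grouped = [r[0] for r in conn.execute("SELECT x FROM mytable GROUP BY x")]
    finally:
        conn.close()
    return distinct, grouped


if __name__ == "__main__":
    d, g = distinct_and_group_by(["a", "b", "a", None, "b", None])
    # Row order is not guaranteed by either form, so compare as sets.
    print(set(d) == set(g))
```

Both forms collapse repeated values, including repeated NULLs, into a
single row each, which is why the results are compared as sets.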

However, in my case (postgresql-server-8.1.18-2.el5_4.1),
they generated different plans with quite different
execution times (73 ms vs. 40 ms for DISTINCT and GROUP BY,
respectively):

tts_server_db=# EXPLAIN ANALYZE select userdata from tagrecord where clientRmaInId = 'CPC-RMA-00110' group by userdata;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=775.68..775.69 rows=1 width=146) (actual time=40.058..40.058 rows=0 loops=1)
   ->  Bitmap Heap Scan on tagrecord  (cost=4.00..774.96 rows=286 width=146) (actual time=40.055..40.055 rows=0 loops=1)
         Recheck Cond: ((clientrmainid)::text = 'CPC-RMA-00110'::text)
         ->  Bitmap Index Scan on idx_tagdata_clientrmainid  (cost=0.00..4.00 rows=286 width=0) (actual time=40.050..40.050 rows=0 loops=1)
               Index Cond: ((clientrmainid)::text = 'CPC-RMA-00110'::text)
 Total runtime: 40.121 ms

tts_server_db=# EXPLAIN ANALYZE select distinct userdata from tagrecord where clientRmaInId = 'CPC-RMA-00109';
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=786.63..788.06 rows=1 width=146) (actual time=73.018..73.018 rows=0 loops=1)
   ->  Sort  (cost=786.63..787.34 rows=286 width=146) (actual time=73.016..73.016 rows=0 loops=1)
         Sort Key: userdata
         ->  Bitmap Heap Scan on tagrecord  (cost=4.00..774.96 rows=286 width=146) (actual time=72.940..72.940 rows=0 loops=1)
               Recheck Cond: ((clientrmainid)::text = 'CPC-RMA-00109'::text)
               ->  Bitmap Index Scan on idx_tagdata_clientrmainid  (cost=0.00..4.00 rows=286 width=0) (actual time=72.936..72.936 rows=0 loops=1)
                     Index Cond: ((clientrmainid)::text = 'CPC-RMA-00109'::text)
 Total runtime: 73.144 ms

What gives?

--
Dimi Paun <dimi(at)lattica(dot)com>
Lattica, Inc.
