Re: Group by more efficient than distinct?

From:	PFC <lists(at)peufeu(dot)com>
To:	"Gregory Stark" <stark(at)enterprisedb(dot)com>, "Francisco Reyes" <lists(at)stringsutils(dot)com>
Cc:	"Pgsql performance" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Group by more efficient than distinct?
Date:	2008-04-18 10:35:04
Message-ID:	op.t9sycqo6cigqcu@apollo13.peufeu.com
Lists:	pgsql-performance

On Fri, 18 Apr 2008 11:36:02 +0200, Gregory Stark <stark(at)enterprisedb(dot)com>
wrote:

> "Francisco Reyes" <lists(at)stringsutils(dot)com> writes:
>
>> Is there any disadvantage of using "group by" to obtain a unique list?
>>
>> On a small dataset the difference was about 20%.
>>
>> Group by
>> HashAggregate (cost=369.61..381.12 rows=1151 width=8) (actual
>> time=76.641..85.167 rows=2890 loops=1)

Basically:

- If the data you process fits within some fraction of your RAM, hashing
is going to be a lot faster.
- If the hash table grows larger than your RAM, hashing will perform
miserably and sorting will be much faster, since PG's on-disk sort is
really good.
- GROUP BY knows this: the planner can pick either a HashAggregate or a
sort, and acts accordingly.
- DISTINCT doesn't know this; it only knows sorting, so it always sorts.
- If you need DISTINCT x ORDER BY x, sorting may be faster too (depending
on the percentage of distinct rows), since the output has to be sorted
anyway.
- If you need DISTINCT ON, well, you're stuck with the Sort.
- So, for the time being, you can replace DISTINCT with GROUP BY.
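
To make the two strategies concrete, here is a minimal Python sketch of hash-based versus sort-based de-duplication, followed by the DISTINCT-to-GROUP BY rewrite itself. The function names, the table t and the column x are made up for illustration. The SQL part runs on an in-memory SQLite database only so that the example is self-contained: it shows that both query forms return the same unique values, and says nothing about which plan PostgreSQL would pick.

```python
import sqlite3


def distinct_by_hash(rows):
    """Hash strategy: a single pass over the input.

    Memory use grows with the number of distinct values, which is why
    it only pays off while the hash table fits in memory.
    """
    seen = set()
    out = []
    for r in rows:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out


def distinct_by_sort(rows):
    """Sort strategy: sort first, then drop adjacent duplicates.

    The result comes out ordered, which is handy when the query also
    has an ORDER BY on the same column.
    """
    out = []
    for r in sorted(rows):
        if not out or out[-1] != r:
            out.append(r)
    return out


rows = [3, 1, 3, 2, 1, 2, 3]
hashed = distinct_by_hash(rows)         # unique values in first-seen order
sorted_unique = distinct_by_sort(rows)  # unique values in ascending order

# The rewrite itself, on a hypothetical table t(x).
# SQLite is an assumption made purely to keep the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(r,) for r in rows])
via_distinct = sorted(conn.execute("SELECT DISTINCT x FROM t").fetchall())
via_group_by = sorted(conn.execute("SELECT x FROM t GROUP BY x").fetchall())
conn.close()
```

Both functions return the same set of values and differ only in order and memory behaviour. On a real PostgreSQL table, running EXPLAIN ANALYZE on each query form is the way to see which strategy the planner actually chose.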
