Re: Group by more efficient than distinct?

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Matthew Wakeling <matthew(at)flymine(dot)org>
Cc: Pgsql performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Group by more efficient than distinct?
Date: 2008-04-22 13:04:30
Message-ID: 480DE25E.4080507@mark.mielke.cc
Lists: pgsql-performance

Matthew Wakeling wrote:
> On Tue, 22 Apr 2008, Mark Mielke wrote:
>> The poster I responded to said that the memory required for a hash
>> join was relative to the number of distinct values, not the number of
>> rows. They gave an example of millions of rows, but only a few
>> distinct values. Above, you agree with me that it would include
>> the rows (or at least references to the rows) as well. If it stores
>> rows, or references to rows, then memory *is* relative to the number
>> of rows, and millions of records would require storing millions of rows
>> (or row references).
>
> Yeah, I think we're talking at cross-purposes, due to hash tables
> being used in two completely different places in Postgres. Firstly,
> you have hash joins, where Postgres loads the references to the actual
> rows, and puts those in the hash table. For that situation, you want a
> small number of rows. Secondly, you have hash aggregates, where
> Postgres stores an entry for each "group" in the hash table, and does
> not store the actual rows. For that situation, you can have a
> bazillion individual rows, but only a small number of distinct groups.

That makes sense with my reality. :-)
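
To make that distinction concrete, here is a minimal sketch. The tables and
columns (orders.status, orders.customer_id, customers.id, customers.name) are
hypothetical, just to illustrate the two plan shapes one would typically
expect:

  -- Hash aggregate: one hash-table entry per distinct "status" value,
  -- so memory stays small even with millions of input rows.
  EXPLAIN SELECT status, count(*) FROM orders GROUP BY status;

  -- Hash join: the Hash node builds its table over the inner relation's
  -- rows (or references to them), so memory grows with the number of
  -- rows hashed, not with the number of distinct join-key values.
  EXPLAIN SELECT o.id, c.name
    FROM orders o JOIN customers c ON c.id = o.customer_id;

(And if the planner estimates that a hash table won't fit in work_mem, it
will tend to fall back to a sort-based aggregate or split the join into
batches instead.)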

Thanks,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>
