Re: Group by more efficient than distinct?

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Matthew Wakeling <matthew(at)flymine(dot)org>
Cc: Pgsql performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Group by more efficient than distinct?
Date: 2008-04-22 13:04:30
Message-ID: 480DE25E.4080507@mark.mielke.cc
Lists: pgsql-performance

Matthew Wakeling wrote:
> On Tue, 22 Apr 2008, Mark Mielke wrote:
>> The poster I responded to said that the memory required for a hash
>> join was relative to the number of distinct values, not the number of
>> rows. They gave an example of millions of rows, but only a few
>> distinct values. Above, you agree with me that it would include
>> the rows (or at least references to the rows) as well. If it stores
>> rows, or references to rows, then memory *is* relative to the number
>> of rows, and millions of records would require storing millions of rows
>> (or row references).
>
> Yeah, I think we're talking at cross-purposes, due to hash tables
> being used in two completely different places in Postgres. Firstly,
> you have hash joins, where Postgres loads the references to the actual
> rows, and puts those in the hash table. For that situation, you want a
> small number of rows. Secondly, you have hash aggregates, where
> Postgres stores an entry for each "group" in the hash table, and does
> not store the actual rows. For that situation, you can have a
> bazillion individual rows, but only a small number of distinct groups.

That makes sense with my reality. :-)
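
To make that distinction concrete, here is a minimal sketch. The tables and
columns (orders.status, orders.customer_id, customers.id, customers.name) are
hypothetical, just to illustrate the two plan shapes one would typically
expect:

  -- Hash aggregate: one hash-table entry per distinct "status" value,
  -- so memory stays small even with millions of input rows.
  EXPLAIN SELECT status, count(*) FROM orders GROUP BY status;

  -- Hash join: the Hash node builds its table over the inner relation's
  -- rows (or references to them), so memory grows with the number of
  -- rows hashed, not with the number of distinct join-key values.
  EXPLAIN SELECT o.id, c.name
    FROM orders o JOIN customers c ON c.id = o.customer_id;

(And if the planner estimates that a hash table won't fit in work_mem, it
will tend to fall back to a sort-based aggregate or split the join into
batches instead.)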

Thanks,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>
