Re: Group by more efficient than distinct?

From:	Matthew Wakeling <matthew(at)flymine(dot)org>
To:	Pgsql performance <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Group by more efficient than distinct?
Date:	2008-04-22 10:34:23
Message-ID:	Pine.LNX.4.64.0804221130190.12158@aragorn.flymine.org
Lists:	pgsql-performance

On Mon, 21 Apr 2008, Mark Mielke wrote:
> This surprises me - hash values are lossy, so it must still need to confirm
> against the real list of values, which at a minimum should require references
> to the rows to check against?
>
> Is PostgreSQL doing something beyond my imagination? :-)

Not too far beyond your imagination, I hope.

The mistake is simply your assumption that the hash table is lossy. Sure,
hash values are lossy, but a hash table isn't. Postgres stores in memory not
only the hash values, but the rows they refer to as well, having read
them all from disc beforehand. Any hash collision is resolved by comparing
against those stored rows. That way, it doesn't need to look up anything
on disc for that branch of the join again, and it has a rapid in-memory
lookup for each row.
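
To make the distinction between a lossy hash value and a lossless hash table concrete, here is a minimal sketch in Python. It is not PostgreSQL's actual implementation; the names build_hash_table, probe and NUM_BUCKETS are hypothetical, and the bucket count is kept deliberately tiny so that collisions are guaranteed. The point is only that each bucket holds the full rows, so a probe can confirm the real value in memory.

```python
from collections import defaultdict

NUM_BUCKETS = 8  # assumption: deliberately small so several keys share a bucket


def build_hash_table(rows, key):
    """Store each full row in the bucket chosen by the hash of its key column."""
    table = defaultdict(list)
    for row in rows:
        table[hash(row[key]) % NUM_BUCKETS].append(row)
    return table


def probe(table, key, value):
    """Return only rows whose key really equals value.

    Rows that merely collide into the same bucket are filtered out by
    comparing against the stored row, with no further disc access.
    """
    bucket = table.get(hash(value) % NUM_BUCKETS, [])
    return [row for row in bucket if row[key] == value]


# Hypothetical "inner" side of a join, read once and kept in memory.
inner = [{"id": i, "name": f"row{i}"} for i in range(20)]
table = build_hash_table(inner, "id")

# ids 3, 11 and 19 share a bucket, but the probe still returns only id 11.
matches = probe(table, "id", 11)
print(matches)
```

The bucket index alone cannot tell ids 3, 11 and 19 apart, which is the lossy part. Keeping the rows alongside it is what makes the lookup exact.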

Matthew

--
X's book explains this very well, but, poor bloke, he did the Cambridge Maths
Tripos... -- Computer Science Lecturer

In response to

Re: Group by more efficient than distinct? at 2008-04-21 23:50:22 from Mark Mielke

Responses

Re: Group by more efficient than distinct? at 2008-04-22 12:01:20 from Mark Mielke
