Skip site navigation (1) Skip section navigation (2)

Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org Performance" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?
Date: 2010-10-27 19:56:19
Message-ID: 26391.1288209379@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-performance
Scott Carey <scott(at)richrelevance(dot)com> writes:
> Why does hashjoin behave poorly when the inner relation is not
> uniformly distributed and the outer is?

Because a poorly distributed inner relation leads to long hash chains.
In the very worst case, all the keys are on the same hash chain and it
degenerates to a nested-loop join.  (There is an assumption in the
costing code that the longer hash chains also tend to get searched more
often, which maybe doesn't apply if the outer rel is flat, but it's not
obvious that it's safe to not assume that.)

> In particular for anti-join this should be the best case scenario.

Not really.  It's still searching a long hash chain; maybe it will find
an exact match early in the chain, or maybe not.  It's certainly not
*better* than antijoin with a well-distributed inner rel.  Although the
point is moot, anyway, since if it's an antijoin there is only one
candidate for which rel to put on the outside.

			regards, tom lane

In response to

pgsql-performance by date

Next:From: Greg SmithDate: 2010-10-27 19:58:46
Subject: Re: CPUs for new databases
Previous:From: Pierre CDate: 2010-10-27 19:52:49
Subject: Re: Select count(*), the sequel

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group