Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org Performance" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?
Date: 2010-10-19 03:43:45
Message-ID: 1532.1287459825@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Scott Carey <scott(at)richrelevance(dot)com> writes:
> I consistently see HashJoin plans that hash the large table, and scan
> the small table.

Could we see a self-contained test case? And what cost parameters are
you using, especially work_mem?

> This is especially puzzling in some cases where I have 30M rows in the big table and ~ 100 in the small... shouldn't it hash the small table and scan the big one?

Well, size of the table isn't the only factor; in particular, a highly
nonuniform distribution of the key value will inflate the cost estimate
for using a table on the inner size of the hash. But the example you
show here seems a bit extreme.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message AI Rumman 2010-10-19 07:18:45 Re: how to get the total number of records in report
Previous Message Jon Nelson 2010-10-19 02:47:08 Re: unexpected query failure: ERROR: GIN indexes do not support whole-index scans