Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joshua Tolley <eggyknap(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, Bryce Cutt <pandasuit(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date: 2009-02-26 15:45:35
Message-ID: 17334.1235663135@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki's got a point here: the planner is aware that hashjoin doesn't
like skewed distributions, and it assigns extra cost accordingly if it
can determine that the join key is skewed. (See the "bucketsize" stuff
in cost_hashjoin.) If this patch is accepted we'll want to tweak that
code.
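
To make the skew penalty concrete, here is a toy model of how a
bucket-size estimate can inflate the probe cost of a hash join. It is a
sketch only, not the planner's actual C code: the function names, the
parameter names, and the "compare against half the bucket on average"
factor are all assumptions chosen for illustration.

```python
def clamp_row_est(nrows: float) -> float:
    """Round a row estimate to a whole number, never below one row."""
    return max(1.0, float(round(nrows)))


def hash_probe_cost(outer_rows: float,
                    inner_rows: float,
                    bucket_fraction: float,
                    cost_per_qual_eval: float) -> float:
    """Toy CPU cost of probing a hash table (hypothetical model).

    bucket_fraction is the estimated fraction of inner rows that share
    the bucket an outer row lands in. For a uniform key it is roughly
    1 / number_of_buckets; for a skewed key it approaches the frequency
    of the most common value.
    """
    rows_per_bucket = clamp_row_est(inner_rows * bucket_fraction)
    # Assumption: each outer row is compared against half of its
    # bucket on average before the search stops.
    return cost_per_qual_eval * outer_rows * rows_per_bucket * 0.5


# Same table sizes, different key distributions (made-up numbers).
uniform = hash_probe_cost(100_000, 10_000, 1 / 1024, 0.0025)
skewed = hash_probe_cost(100_000, 10_000, 0.2, 0.0025)
```

In this model the skewed case costs far more than the uniform one,
because a single value that makes up 20% of the inner rows puts about
2000 rows in one bucket instead of about 10. A patch that handles the
most common values separately would presumably justify a smaller
penalty here, which is the sort of tweak the estimate would then need.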

Still, that has little to do with the current gating issue, which is
whether we've convinced ourselves that the patch doesn't cause a
performance decrease for cases in which it's unable to help.

regards, tom lane
