Skip site navigation (1) Skip section navigation (2)

Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

From: Bryce Cutt <pandasuit(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joshua Tolley <eggyknap(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets
Date: 2009-02-26 20:16:42
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
The patch originally modified the cost function but I removed that
part before we submitted it to be a bit conservative about our
proposed changes.  I didn't like that for large plans the statistics
were retrieved and calculated many times when finding the optimal
query plan.

The overhead of the algorithm when the skew optimization is not used
ends up being roughly a function call and an if statement per tuple.
It would be easy to remove the function call per tuple.  Dr. Lawrence
has come up with some changes so that when the optimization is turned
off, the function call does not happen at all and instead of the if
statement happening per tuple it is run just once per join.  We have
to test this a bit more but it should further reduce the overhead.

Hopefully we will have the new patch ready to go this weekend.

- Bryce Cutt

On Thu, Feb 26, 2009 at 7:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki's got a point here: the planner is aware that hashjoin doesn't
> like skewed distributions, and it assigns extra cost accordingly if it
> can determine that the join key is skewed.  (See the "bucketsize" stuff
> in cost_hashjoin.)  If this patch is accepted we'll want to tweak that
> code.
> Still, that has little to do with the current gating issue, which is
> whether we've convinced ourselves that the patch doesn't cause a
> performance decrease for cases in which it's unable to help.
>                        regards, tom lane

In response to


pgsql-hackers by date

Next:From: Andrew DunstanDate: 2009-02-26 20:18:45
Subject: Re: xpath processing brain dead
Previous:From: Heikki LinnakangasDate: 2009-02-26 19:59:05
Subject: Re: Hot standby, recovery infra

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group