Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

From: Bryce Cutt <pandasuit(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joshua Tolley <eggyknap(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date: 2009-02-26 20:16:42
Message-ID: 1924d1180902261216t8237875t818f44280bf5e99f@mail.gmail.com
Lists: pgsql-hackers

The patch originally modified the cost function, but I removed that
part before we submitted it so that our proposed changes would stay a
bit conservative. I also didn't like that, for large plans, the
statistics were retrieved and recalculated many times while the
planner searched for the optimal query plan.

The overhead of the algorithm when the skew optimization is not used
ends up being roughly one function call and one if statement per tuple.
It would be easy to remove the per-tuple function call. Dr. Lawrence
has come up with some changes so that, when the optimization is turned
off, the function call does not happen at all, and the if statement
runs just once per join instead of once per tuple. We have to test
this a bit more, but it should further reduce the overhead.
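
As a rough illustration of the difference (this is not the patch code;
the struct, the function names, and the "key == 42" skew test are all
hypothetical stand-ins), the two approaches look something like this in
C. The flag_tests counter is there only to make the per-tuple versus
per-join cost visible:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the join state; NOT PostgreSQL's HashJoinState. */
typedef struct {
    bool skew_enabled;   /* is the skew optimization in use for this join? */
    long flag_tests;     /* how many times skew_enabled was tested */
    long skew_tuples;    /* tuples routed to the (hypothetical) skew path */
    long normal_tuples;  /* tuples routed to the regular batch path */
} JoinSketch;

/* Hypothetical skew test: pretend 42 is the only frequent join key. */
static bool is_skew_value(int key)
{
    return key == 42;
}

/* Current behaviour as described: one call and one if per tuple. */
static bool check_skew_per_tuple(JoinSketch *js, int key)
{
    js->flag_tests++;
    if (!js->skew_enabled)
        return false;
    return is_skew_value(key);
}

void run_per_tuple(JoinSketch *js, const int *keys, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (check_skew_per_tuple(js, keys[i]))
            js->skew_tuples++;
        else
            js->normal_tuples++;
    }
}

/* Proposed behaviour as described: test the flag once per join. */
void run_hoisted(JoinSketch *js, const int *keys, size_t n)
{
    js->flag_tests++;
    if (!js->skew_enabled) {
        /* Optimization off: no skew function call at all. */
        js->normal_tuples += (long) n;
        return;
    }
    for (size_t i = 0; i < n; i++) {
        if (is_skew_value(keys[i]))
            js->skew_tuples++;
        else
            js->normal_tuples++;
    }
}
```

With the optimization off, run_per_tuple tests the flag once for every
tuple, while run_hoisted tests it exactly once and never calls the skew
function.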

Hopefully we will have the new patch ready to go this weekend.

- Bryce Cutt

On Thu, Feb 26, 2009 at 7:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki's got a point here: the planner is aware that hashjoin doesn't
> like skewed distributions, and it assigns extra cost accordingly if it
> can determine that the join key is skewed.  (See the "bucketsize" stuff
> in cost_hashjoin.)  If this patch is accepted we'll want to tweak that
> code.
>
> Still, that has little to do with the current gating issue, which is
> whether we've convinced ourselves that the patch doesn't cause a
> performance decrease for cases in which it's unable to help.
>
>                        regards, tom lane
>
