Quick Links

Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bryce Cutt <pandasuit(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joshua Tolley <eggyknap(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets
Date:	2009-03-21 00:51:38
Message-ID:	603c8f070903201751h1e81ce4cvc11bd20396d016a2@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Mar 20, 2009 at 8:45 PM, Bryce Cutt <pandasuit(at)gmail(dot)com> wrote:
> On Fri, Mar 20, 2009 at 5:35 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> If the inner relation isn't fairly close to unique you shouldn't be
>> using this optimization in the first place.
> Not necessarily true. Seeing as (when the statistics are correct) we
> know each of these inner tuples will match with the largest amount of
> outer tuples it is just as much of a win per inner tuple as when they
> are unique. There is just a chance you will have to give up on the
> optimization part way through if too many inner tuples fall into the
> new "skew buckets" (formerly IM buckets) and dump the tuples back into
> the main buckets. The potential win is still pretty high though.
>
> - Bryce Cutt

Maybe I'm remembering wrong, but I thought the estimating functions
assuemd that the inner relation was unique. So if there turn out to
be 2, 3, 4, or more copies of each value, the chances of blowing out
the skew hash table are almost 100%, I would think... am I wrong?

...Robert

In response to

Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets at 2009-03-21 00:45:14 from Bryce Cutt

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andrew Gierth	2009-03-21 01:57:05	contrib function naming, and upgrade issues
Previous Message	Robert Haas	2009-03-21 00:51:00	Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets