Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Bryce Cutt" <pandasuit(at)gmail(dot)com>
Cc: "Joshua Tolley" <eggyknap(at)gmail(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date: 2008-12-23 14:22:27
Message-ID: 603c8f070812230622i57150a8ewa41ac8355604a88a@mail.gmail.com

On Tue, Dec 23, 2008 at 2:21 AM, Bryce Cutt <pandasuit(at)gmail(dot)com> wrote:
> Because there is no nice way in PostgreSQL (that I know of) to derive
> a histogram after a join (on an intermediate result),
> usingMostCommonValues is currently only enabled on a join when the
> outer (probe) side is a table scan (actually, only a seq scan). See
> getMostCommonValues (soon to be called
> ExecHashJoinGetMostCommonValues) for the logic that determines this.

It's starting to seem to me that the case where this patch provides a
benefit is so narrow that I'm not sure it's worth the extra code.
Admittedly, when it works, it is pretty dramatic, as in the numbers
that I posted previously. I'm OK with the fact that it is restricted
to hash joins on a single variable where the probe relation is a
sequential scan, because that actually happens pretty frequently, at
least in my queries. But if there's no way to consistently get any
benefit out of this when joining more than two tables, then I'm not
sure it's worth it.
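
To make the "dramatic when it works" case concrete, here is a toy model of
the probe side of a multi-batch hash join. It is only a sketch under my own
assumptions, not the patch's code: the function name, the modulo stand-in
for the hash function, and the data are all hypothetical. It counts how many
probe tuples would have to be written out to batch files, with and without
keeping the build tuples for the probe side's most common value in memory.

```python
from collections import Counter


def spilled_probe_tuples(probe_keys, num_batches, mcv_keys=frozenset()):
    """Count probe tuples that would be written to a batch file.

    Batch 0 is assumed to be joined in memory; tuples that map to any
    other batch are spilled. Keys in mcv_keys are assumed to have their
    build tuples pinned in memory, so matching probe tuples never spill.
    """
    spilled = 0
    for key in probe_keys:
        if key in mcv_keys:
            continue  # joined immediately against the in-memory MCV tuples
        batch = key % num_batches  # stand-in for a real hash function
        if batch != 0:
            spilled += 1
    return spilled


# Hypothetical skewed probe relation: one value accounts for 90% of the rows.
probe = [7] * 900 + list(range(100, 200))

# Take the single most common value, as a statistics lookup might report it.
mcv = frozenset(value for value, _ in Counter(probe).most_common(1))

without_mcv = spilled_probe_tuples(probe, num_batches=4)
with_mcv = spilled_probe_tuples(probe, num_batches=4, mcv_keys=mcv)
print(without_mcv, with_mcv)
```

In this made-up data set the hot key lands in a later batch, so nearly every
probe tuple is spilled unless that key is handled specially. With uniform
data the two counts would be close to each other.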

Is it realistic to think that the MCVs of the base relation might
still be applicable to the joinrel? It's certainly easy to think of
counterexamples, but it might be a good approximation more often than
not.
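
A small sketch of both outcomes, using made-up data and hypothetical helper
names (nothing here comes from the patch or from PostgreSQL itself). Relation
A has a join column b_id and a column c_id whose MCVs we care about. Joining
A to a B that matches every row once leaves the c_id distribution alone; a B
that only matches rows carrying a rare c_id changes it.

```python
from collections import Counter


def mcvs(values, n):
    """Return the n most common values, most frequent first."""
    return [value for value, _ in Counter(values).most_common(n)]


def join_c_ids(a_rows, b_ids):
    """Inner-join A(b_id, c_id) with B(b_id); return c_id per output row."""
    b_counts = Counter(b_ids)
    out = []
    for b_id, c_id in a_rows:
        # Each A row appears once per matching B row (zero if no match).
        out.extend([c_id] * b_counts[b_id])
    return out


# A: 60 rows with c_id 'x', 30 with 'y', 10 with 'z'; b_id is unique per row.
a_rows = (
    [(i, "x") for i in range(60)]
    + [(i, "y") for i in range(60, 90)]
    + [(i, "z") for i in range(90, 100)]
)
base_mcvs = mcvs([c_id for _, c_id in a_rows], 2)

# Case 1: B matches every A row exactly once, so the distribution survives.
preserved = mcvs(join_c_ids(a_rows, range(100)), 2)

# Case 2 (counterexample): B only matches b_ids 85..99, which are mostly 'z'.
shifted = mcvs(join_c_ids(a_rows, range(85, 100)), 2)

print(base_mcvs, preserved, shifted)
```

How often real joins look like case 1 rather than case 2 is exactly the open
question; the sketch only shows that both are possible.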

...Robert
