Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

From: Joshua Tolley <eggyknap(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bryce Cutt <pandasuit(at)gmail(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date: 2008-12-23 18:28:19
Message-ID: 20081223182818.GA5867@uber
Lists: pgsql-hackers

On Tue, Dec 23, 2008 at 10:14:29AM -0500, Robert Haas wrote:
> > It's equivalent to our assumption that distributions of values in
> > columns in the same table are independent. Making that assumption in
> > this case would probably result in occasional dramatic speed
> > improvements similar to the ones we've seen in less complex joins,
> > offset by just-as-occasional dramatic slowdowns of similar magnitude. In
> > other words, it will increase the variance of our results.
>
> Under what circumstances do you think that it would produce a dramatic
> slowdown? I'm confused. I thought the penalty for picking a bad set
> of values for the in-memory hash table was pretty small.
>
> ...Robert

I take that back :) I agree with what others have already said, that it
shouldn't cause dramatic slowdowns when we get it wrong.

- Josh
