Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

From: Joshua Tolley <eggyknap(at)gmail(dot)com>
To: Bryce Cutt <pandasuit(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date: 2008-11-10 08:57:57
Message-ID: 20081110085757.GB10915@uber
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 05, 2008 at 04:06:11PM -0800, Bryce Cutt wrote:
> The error is causes by me Asserting against the wrong variable. I
> never noticed this as I apparently did not have assertions turned on
> on my development machine. That is fixed now and with the new patch
> version I have attached all assertions are passing with your query and
> my test queries. I added another assertion to that section of the
> code so that it is a bit more vigorous in confirming the hash table
> partition is correct. It does not change the operation of the code.
>
> There are two partition counts. One holds the maximum number of
> buckets in the hash table and the other counts the number of actual
> buckets created for hash values. I was incorrectly testing against
> the second one because that was valid before I started using a hash
> table to store the buckets.
>
> The enable_hashjoin_usestatmcvs flag was valuable for my own research
> and tests and likely useful for your review but Tom is correct that it
> can be removed in the final version.
>
> - Bryce Cutt
>

Well, this version seems to work as advertised. Skewed data sets tend to
hash join more quickly with this turned on, and data sets with
deliberately bad statistics don't perform much differently than with the
feature turned off. The patch applies cleanly to CVS HEAD.

I don't consider myself qualified to do a decent code review. However I
noticed that the comments are all done with // instead of /* ... */.
That should probably be changed.

To those familiar with code review: is there more I should do to review
this?

- Josh / eggyknap

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message ITAGAKI Takahiro 2008-11-10 09:02:41 Re: auto_explain contrib moudle
Previous Message Zdenek Kotala 2008-11-10 08:21:48 Re: WIP: Page space reservation (pgupgrade)