Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

From: "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Bryce Cutt" <pandasuit(at)gmail(dot)com>, "Joshua Tolley" <eggyknap(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets
Date: 2009-02-19 16:37:05
Message-ID: 6EEA43D22289484890D119821101B1DF28B35F@exchange20.mercury.ad.ubc.ca
Lists: pgsql-hackers

________________________________

From: pgsql-hackers-owner(at)postgresql(dot)org on behalf of Robert Haas
I think what we need here is some very simple testing to demonstrate
that this patch demonstrates a speed-up even when the inner side of
the join is a joinrel rather than a baserel. Can you suggest a single
query against the skewed TPCH dataset that will result in two or more
multi-batch hash joins? If so, it should be a simple matter to run
that query with and without the patch and verify that the former is
faster than the latter.

In this query the outer relation is a joinrel rather than a baserel:

select count(*) from supplier, part, lineitem where l_partkey = p_partkey and s_suppkey = l_suppkey;

The approach collects statistics on the outer relation (not the inner relation), so the code had to be able to determine a stats tuple for a joinrel as well as for a baserel.
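
For anyone who wants to repeat the comparison Robert describes, a minimal sketch follows. It assumes a skewed TPC-H database is already loaded and analyzed, and the work_mem value is only an illustrative guess at something small enough to force the hash joins into multiple batches. Run it once on a build with the patch and once on a build without it, then compare the reported run times.

```sql
-- Assumption: the skewed TPC-H tables are loaded and ANALYZE has been run,
-- so that statistics are available to the planner.
-- Assumption: '1MB' is illustrative only; choose a value small enough that
-- the hash joins need more than one batch on your data set.
SET work_mem = '1MB';

-- Same query as above, with timing and the chosen plan reported.
EXPLAIN ANALYZE
SELECT count(*)
FROM supplier, part, lineitem
WHERE l_partkey = p_partkey
  AND s_suppkey = l_suppkey;
```

Checking the plan output first is worthwhile, since the planner may not choose hash joins for both join steps on every data set or setting.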

Joshua sent us some preliminary data with this query and others and indicated that we could post it. He wanted time to clean it up and re-run some experiments, but the data is generally good and the algorithm performs as expected. I have attached this data to the post. Note that the last set of data (although labelled as Z7) is actually an almost zero skew database and represents the worst case for the algorithm (for most queries the optimization is not even used).

--
Ramon Lawrence

Attachment            Content-Type              Size
JoshuaTolleyData.xls  application/vnd.ms-excel  26.5 KB
