Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>
Cc: Bryce Cutt <pandasuit(at)gmail(dot)com>, Joshua Tolley <eggyknap(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets
Date: 2009-02-25 03:18:02
Message-ID: 603c8f070902241918k5274a862ua8b206db145912af@mail.gmail.com
Lists: pgsql-hackers

> Joshua sent us some preliminary data with this query and others and indicated that we could post it.  He wanted time to clean it up
> and re-run some experiments, but the data is generally good and the algorithm performs as expected.  I have attached this data to the
> post.  Note that the last set of data (although labelled as Z7) is actually an almost zero skew database and represents the worst-case
> for the algorithm (for most queries the optimization is not even used).

Sadly, there seem to be a number of cases in the Z7 database where the
optimization makes things significantly worse (specifically, queries
2, 3, and 7, but especially query 3). Have you investigated what is
going on there? I had thought that we had sufficient safeguards in
place to prevent this optimization from kicking in where it doesn't
help, but it seems not. There will certainly be real-world
databases that are more like Z7 than Z1.

...Robert
