Re: HashJoin w/option to unique-ify inner rel

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: HashJoin w/option to unique-ify inner rel
Date: 2009-05-10 03:05:47
Message-ID: 603c8f070905092005i9e40572o1f217aba9a2c2c13@mail.gmail.com
Lists: pgsql-hackers

On Sat, May 9, 2009 at 7:00 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> ... So it appears to me that instead of taking an average-case correction
>> as is done in this patch and the old coding, we have to explicitly model
>> the matched-tuple and unmatched-tuple cases separately.
>
> I've applied the attached patch that does things this way.  I did not do
> anything about improving the detailed modeling of hash-bucket searching
> as Robert suggested in some later messages.  I think that's probably
> worth looking at, but it's a second-order consideration --- this patch
> already seems to bring the estimates for semi/antijoins much closer
> to reality.
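
To illustrate the distinction Tom describes, here is a minimal sketch in Python. It is not the patch or the actual planner code. The function names, parameters, and the simple "count inner rows examined" model are all assumptions made for illustration. The idea is that a semijoin probe for a matched outer row can stop at the first match, while a probe for an unmatched outer row has to look at every inner row, so folding both into one average can give a very different number.

```python
def separated_rows_examined(outer_rows, inner_rows, match_frac, matches_per_matched):
    """Expected inner rows examined for a semijoin, modelling matched and
    unmatched outer rows separately (hypothetical model, not PostgreSQL code).

    match_frac          -- fraction of outer rows with at least one match
    matches_per_matched -- average number of matches for such a row
    """
    matched = outer_rows * match_frac
    unmatched = outer_rows - matched
    # A matched probe stops at the first match. With k matches placed
    # uniformly at random among N rows, the first one is expected at
    # position (N + 1) / (k + 1).
    per_matched = (inner_rows + 1) / (matches_per_matched + 1)
    # An unmatched probe must examine every inner row.
    per_unmatched = inner_rows
    return matched * per_matched + unmatched * per_unmatched


def averaged_rows_examined(outer_rows, inner_rows, match_frac, matches_per_matched):
    """Illustrative average-case variant: spread the matches over all outer
    rows and apply a single early-stop correction to every probe."""
    avg_matches = match_frac * matches_per_matched
    return outer_rows * (inner_rows + 1) / (avg_matches + 1)


if __name__ == "__main__":
    args = (1000, 99, 0.5, 9)  # made-up numbers, not measurements
    print("separated:", separated_rows_examined(*args))
    print("averaged: ", averaged_rows_examined(*args))
```

With these made-up inputs the averaged variant comes out far lower than the separated one, because it lets the unmatched probes stop early as well.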

I'll take a look at this when I get a chance, but I'm just playing
with test cases, so I share your hope that Kevin (or someone else with
complex queries against real data) will test it out.

> I am a bit concerned about the extra time spent on repeated selectivity
> estimates.  It might not matter too much since it's only done for semi
> and anti joins which aren't that common.  It would be good though if
> someone who has a lot of such joins could test CVS HEAD and see if
> performance has gotten worse (Kevin?).  We could refactor things to
> reduce the duplication of effort but I'd prefer to leave that sort of
> thing to 8.5.

Agreed. I was worried about that when I wrote the emails to which you
refer above, but I don't know how else to get good estimates for all
the relevant cases.

...Robert
