Re: HashJoin w/option to unique-ify inner rel

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: HashJoin w/option to unique-ify inner rel
Date: 2009-05-09 23:00:37
Message-ID: 19622.1241910037@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> ... So it appears to me that instead of taking an average-case correction
> as is done in this patch and the old coding, we have to explicitly model
> the matched-tuple and unmatched-tuple cases separately.

I've applied the attached patch that does things this way. I did not do
anything about improving the detailed modeling of hash-bucket searching
as Robert suggested in some later messages. I think that's probably
worth looking at, but it's a second-order consideration --- this patch
already seems to bring the estimates for semi/antijoins much closer
to reality.
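
To make the matched/unmatched split concrete, here is a minimal sketch in C. It is not the code from the patch: the struct, the function name, and the nested-loop-style "rescan the inner side per outer tuple" setting are all assumptions chosen for illustration. The one modeling assumption is that an outer tuple with k matches, placed uniformly at random in the inner relation, finds its first match after scanning roughly 1/(k+1) of the inner tuples, while an unmatched outer tuple has to scan all of them.

```c
#include <assert.h>

/*
 * Hypothetical inputs for a semijoin cost sketch.  These names are
 * illustrative only and are not taken from costsize.c.
 */
typedef struct SemiJoinInputs
{
    double outer_rows;       /* number of outer tuples */
    double inner_rows;       /* inner tuples examined in one full pass */
    double outer_match_frac; /* fraction of outer tuples with >= 1 match */
    double match_count;      /* avg matches per matched outer tuple, >= 1 */
    double per_tuple_cost;   /* cost to examine one inner tuple */
} SemiJoinInputs;

/*
 * Estimate the inner-side scan cost of a semijoin by treating matched
 * and unmatched outer tuples as two separate populations, instead of
 * applying one average-case scan fraction to every outer tuple.
 */
double
semijoin_inner_scan_cost(const SemiJoinInputs *in)
{
    double matched_outer;
    double unmatched_outer;
    double matched_scan_frac;

    assert(in->outer_match_frac >= 0.0 && in->outer_match_frac <= 1.0);
    assert(in->match_count >= 1.0);

    matched_outer = in->outer_rows * in->outer_match_frac;
    unmatched_outer = in->outer_rows - matched_outer;

    /* Matched case: the scan stops at the first match (assumed uniform). */
    matched_scan_frac = 1.0 / (in->match_count + 1.0);

    /* Unmatched case: the whole inner side is scanned (fraction 1.0). */
    return in->per_tuple_cost * in->inner_rows *
        (matched_outer * matched_scan_frac + unmatched_outer * 1.0);
}
```

With 1000 outer tuples, 100 inner tuples, half of the outer tuples matching and one match each, the matched half scans about half the inner side and the unmatched half scans all of it. For an antijoin the same two populations apply; only the set of outer tuples that produce output differs.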

I am a bit concerned about the extra time spent on repeated selectivity
estimates. It might not matter too much, since it's only done for semi
and anti joins, which aren't that common. It would be good, though, if
someone who has a lot of such joins could test CVS HEAD and see whether
planning performance has gotten worse (Kevin?). We could refactor things
to reduce the duplication of effort, but I'd prefer to leave that sort
of thing to 8.5.

BTW, if you're reading the patch in detail, the changes outside
costsize.c are just refactoring to allow costsize.c to use some
code that was formerly buried in createplan.c. The changes in
costsize.c use the same basic selectivity calculation as in my
patch of two weeks ago, but apply the results differently as per
our discussion.

regards, tom lane

Attachment Content-Type Size
unknown_filename text/plain 25.2 KB
