Re: HashJoin w/option to unique-ify inner rel

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: HashJoin w/option to unique-ify inner rel
Date: 2009-04-17 00:28:43
Message-ID: 603c8f070904161728p60da50a0o4833fbe9ba1a4f94@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 16, 2009 at 7:26 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Upon further review, it appears that a big part of this problem is
>> that cost_hashjoin() doesn't understand that it needs cost semi-joins
>> differently from inner or left joins.
>
> Yeah, I have a note to look into that before 8.4 final.  The same is
> true for nestloops: stopping after hitting one match to the current
> outer can make a big difference, and that's not reflected in costsize.c
> yet.  I'm not sure whether cost_mergejoin needs to care.

Cool. It's worth noting that this is also a problem for anti-joins,
which can also stop after hitting one match.

For merge join, it looks like you might save something if there are
duplicates in both the inner and outer lists. If the outer side
contains 1,1,1,1,1,2 and the inner side contains 1,1,1,2,2, then an
INNER JOIN or LEFT JOIN will rescan the first three tuples of the
inner side three times, whereas a SEMI or ANTI JOIN will scan the
first tuple three times and then the next two only once. But I'm not
sure if we can estimate this accurately enough to matter.

...Robert

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kris Jurka 2009-04-17 01:02:04 Re: No hash join across partitioned tables?
Previous Message Kris Jurka 2009-04-16 23:30:51 Re: No hash join across partitioned tables?