Re: HashJoin w/option to unique-ify inner rel

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: HashJoin w/option to unique-ify inner rel
Date: 2009-04-25 00:09:10
Message-ID: 21809.1240618150@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Upon further review, it appears that a big part of this problem is
> that cost_hashjoin() doesn't understand that it needs to cost semi-joins
> differently from inner or left joins.
> ...
> The planner costs the semi-join as two orders of magnitude more
> expensive than the hash-join-over-hash-aggregate, but in reality the
> semi join is only marginally slower. The planner thinks we're going
> to end up wasting a lot of time walking down long hash chains and it's
> just not true. b contains lots of duplicates but they all hash to the
> same buckets, and when an a-value hashes to that bucket it's probably
> because we've got a match, and because it's a semi-join, finding one
> match is a sufficient excuse to skip the rest of the bucket.

I spent today poking at this problem, without much success. I had
thought that the solution was basically to re-introduce something
comparable to the "joininfactor" correction that was used in 8.3 and
previous branches. However, that attempt (attached for the archives)
crashed and burned miserably. In the attached patch, we estimate the
number of outer tuples that have some match (as opposed to no match at
all) and the average number of matches for each tuple that has some
match. These estimates seem to be pretty good, at least in the simple
test case I'm using. However, it then combines the numbers into a
single work-savings factor averaged across all outer tuples:

+ /*
+ * For an outer-rel row that has no match, the executor will have to scan
+ * the whole input relation to prove that. Remembering the meaning of
+ * jselec, we therefore estimate the average scan fraction over all
+ * outer-rel rows thus:
+ */
+ scanfrac = jselec * matchfrac + (1.0 - jselec) /* times 1.0 */ ;

and it turns out this is basically missing the point altogether. The
above comment is correct in the case where we're doing a nestloop with
simple seqscan on the inner relation: we're going to have to do the
maximum amount of work for any unmatched outer tuple. But in the
hashjoin case, as you noted, it's quite likely that an unmatched outer
will result in a probe into an empty hashbucket and thus expend minimum
rather than maximum work.

The case where we are doing a nestloop with inner indexscan (using the
join clause as index condition) also costs far less than this equation
suggests. A no-match outer tuple will result in an indexscan that
selects no tuples, not one that selects lots of tuples we then have to
ignore.

BTW, the pre-8.4 coding basically applied the equivalent of "matchfrac"
for all tuples, thus getting something like the right answer for matched
tuples but being entirely bogus for unmatched ones. However, in these
cases where unmatched tuples are cheap to process, this ended up being
a fairly reasonable estimate anyway. But it was completely wrong for
cases where unmatched tuples are expensive, which is why I'd tried to
rip it out (we have had complaints about such cases in the past).

So it appears to me that instead of taking an average-case correction
as is done in this patch and the old coding, we have to explicitly model
the matched-tuple and unmatched-tuple cases separately. For hashjoins,
what I think we should do is estimate the bucket scanning cost for
unmatched tuples assuming a perfectly uniform bucket distribution,
rather than the worst-case bucket size that's used in the current code
(and is a reasonable estimate for matched tuples). This is to reflect
the fact that an unmatched outer tuple might or might not hit a
populated bucket, and which bucket it hits is probably uncorrelated with
the distribution of the inner tuples.
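
For concreteness, a rough sketch of how cost_hashjoin might charge the two
cases separately (variable names loosely borrowed from costsize.c and the
patch; this is only an illustration of the idea, not tested code):

	/*
	 * Illustration only: charge matched and unmatched outer rows for
	 * different expected bucket sizes instead of one averaged factor.
	 * Matched rows keep the skewed (worst-case) bucket-size estimate;
	 * unmatched rows assume a perfectly uniform bucket distribution.
	 */
	matched_bucket_rows = clamp_row_est(inner_path_rows * innerbucketsize);
	unmatched_bucket_rows = clamp_row_est(inner_path_rows / virtualbuckets);

	run_cost += hash_qual_cost.per_tuple * jselec * outer_path_rows *
		matched_bucket_rows * 0.5;
	run_cost += hash_qual_cost.per_tuple * (1.0 - jselec) * outer_path_rows *
		unmatched_bucket_rows * 0.5;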

The problem seems to be a lot messier to solve for nestloop joins.
We have to detect whether the inner path is an indexscan using the
semijoin clause as an index qual --- if it is not, then the correct
thing is to model unmatched tuples as expensive. If it is, then I think
we can take the cost of unmatched tuples as about equal to the cost of
retrieving a single row from an indexscan with matches (which we can get
by interpolating in the path's given costs). The problem here is that the
current representation of these path structures doesn't make it cheap to
determine which joinclauses are being used as index quals. Currently
that's okay because we only do it once at the end (in createplan.c) but
if we need the information during path searching it's got to be fairly
cheap to get. So some rejiggering of the Path representation seems
called for.
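
For the indexscan case, the interpolation I have in mind amounts to charging
an unmatched outer tuple roughly the cost of fetching one row from the inner
path, something like this (again only a sketch; unmatched_probe_cost and
inner_run_cost are made-up names):

	/*
	 * Sketch: treat an unmatched outer row's probe as fetching one row
	 * from the inner indexscan, by interpolating between the path's
	 * startup and total costs over its estimated row count.
	 */
	inner_run_cost = inner_path->total_cost - inner_path->startup_cost;
	unmatched_probe_cost = inner_path->startup_cost +
		inner_run_cost / Max(inner_path_rows, 1.0);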

I wouldn't blink at doing this during the development phase, but it's a
larger change than I'd really like to be making during beta. OTOH,
what we have got now is a definite regression from the pre-8.4 behavior,
especially in the cases where unmatched tuples are cheap. Maybe there's
not much choice but to fix it now.

Comments?

regards, tom lane

Attachment: unknown_filename (text/plain, 12.2 KB)
