Re: HashJoin w/option to unique-ify inner rel

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: HashJoin w/option to unique-ify inner rel
Date: 2009-04-16 02:54:16
Message-ID: 603c8f070904151954v716a809dm1510da5a8a35154d@mail.gmail.com
Lists: pgsql-hackers

On Sun, Apr 12, 2009 at 12:00 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Feb 19, 2009 at 5:53 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Greg Stark <stark(at)enterprisedb(dot)com> writes:
>>> It's tempting to have Hash cheat and just peek at the node beneath it
>>> to see if it's a HashAggregate, in which case it could call a special
>>> method to request the whole hash. But it would have to know that it's
>>> just a plain uniquify and not implementing a GROUP BY.
>>
>> More to the point, it would have to check if it's unique-ifying on the
>> same columns and with the same operators as the upper hash is using.
>> If we were going to do something like this, making it a real option to
>> the Hash node and teaching the planner about that would be *much*
>> easier, and would also allow saner cost estimation.
>
> I've been thinking about this some more and would like someone else to
> check my work.
>
> The goal here is to handle a semi-join implemented as a hash join by
> unique-ifying the inner rel as we are building the hash table, rather
> than as a separate step. [long discussion of implementation methodology]
>
> Does anyone see a hole in this logic?

Upon further review, it appears that a big part of this problem is
that cost_hashjoin() doesn't understand that it needs to cost
semi-joins differently from inner or left joins. The bogus logic
looks to be right here:

    /*
     * The number of tuple comparisons needed is the number of outer tuples
     * times the typical number of tuples in a hash bucket, which is the inner
     * relation size times its bucketsize fraction.  At each one, we need to
     * evaluate the hashjoin quals.  But actually, charging the full qual eval
     * cost at each tuple is pessimistic, since we don't evaluate the quals
     * unless the hash values match exactly.  For lack of a better idea, halve
     * the cost estimate to allow for that.
     */
    startup_cost += hash_qual_cost.startup;
    run_cost += hash_qual_cost.per_tuple *
        outer_path_rows * clamp_row_est(inner_path_rows * innerbucketsize) * 0.5;

Of course, when the join type is JOIN_SEMI, we're going to stop
looking after we find the first match, so this estimate is really far
off.

rhaas=# explain analyze select * from a where i in (select b.i from b);
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=174.72..2095.53 rows=10 width=4) (actual time=33.274..285.627 rows=10 loops=1)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.023..125.307 rows=100000 loops=1)
   ->  Hash  (cost=174.60..174.60 rows=10 width=4) (actual time=33.209..33.209 rows=10 loops=1)
         ->  HashAggregate  (cost=174.50..174.60 rows=10 width=4) (actual time=33.161..33.175 rows=10 loops=1)
               ->  Seq Scan on b  (cost=0.00..148.80 rows=10280 width=4) (actual time=0.015..15.304 rows=10280 loops=1)
 Total runtime: 285.743 ms
(7 rows)
rhaas=# set enable_hashagg to false; set enable_sort to false;
SET
SET
rhaas=# explain analyze select * from a where i in (select b.i from b);
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=277.30..130573.10 rows=10 width=4) (actual time=31.447..292.823 rows=10 loops=1)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.022..125.300 rows=100000 loops=1)
   ->  Hash  (cost=148.80..148.80 rows=10280 width=4) (actual time=31.392..31.392 rows=10280 loops=1)
         ->  Seq Scan on b  (cost=0.00..148.80 rows=10280 width=4) (actual time=0.014..14.958 rows=10280 loops=1)
 Total runtime: 293.154 ms
(6 rows)

The planner costs the semi-join at two orders of magnitude more
expensive than the hash-join-over-hash-aggregate, but in reality the
semi-join is only marginally slower. The planner thinks we're going
to end up wasting a lot of time walking down long hash chains, but
that's just not true: b contains lots of duplicates, but duplicates
of a given value all hash to the same bucket, so when an a-value
hashes to one of those buckets it's probably because we've got a
match. And because it's a semi-join, finding one match is a
sufficient excuse to skip the rest of the bucket.

...Robert
