Skip site navigation (1) Skip section navigation (2)

Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Greg Stark <gsstark(at)MIT(dot)EDU>
Cc: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>,Gaetano Mendola <mendola(at)bigfoot(dot)com>, <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )
Date: 2004-09-22 17:56:13
Message-ID: 87mzzichmq.fsf@stark.xeocode.com (view raw or flat)
Thread:
Lists: pgsql-patchespgsql-performance
Greg Stark <gsstark(at)MIT(dot)EDU> writes:

> Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org> writes:
> 
> > On 22 Sep 2004, Greg Stark wrote:
> > 
> > > Actually this looks like it's arguably a bug to me. Why does the hash
> > > join execute the sequential scan at all? Shouldn't it also like the
> > > merge join recognize that the other hashed relation is empty and skip
> > > the sequential scan entirely?
> > 
> > I'm not sure you can classify that as a bug. It's just that he in one of 
> > the plans started with the empty scan and bacause of that didn't need 
> > the other, but with the hash join it started with the table that had 16 
> > rows and then got to the empty one.
> 
> No, postgres didn't do things in reverse order. It hashed the empty table and
> then went ahead and checked every record of the non-empty table against the
> empty hash table.

Alright, attached is a simple patch that changes this. I don't really know
enough of the overall code to be sure this is safe. But from what I see of the
hash join code it never returns any rows unless there's a match except for
outer joins. So I think it should be safe.

test=# create table a (a integer);
CREATE TABLE
test=# create table b (a integer);
CREATE TABLE
test=# set enable_mergejoin = off;
SET
test=# explain analyze select * from a natural join b;
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Hash Join  (cost=22.50..345.00 rows=5000 width=4) (actual time=0.022..0.022 rows=0 loops=1)
   Hash Cond: ("outer".a = "inner".a)
   ->  Seq Scan on a  (cost=0.00..20.00 rows=1000 width=4) (never executed)
   ->  Hash  (cost=20.00..20.00 rows=1000 width=4) (actual time=0.005..0.005 rows=0 loops=1)
         ->  Seq Scan on b  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.002..0.002 rows=0 loops=1)
 Total runtime: 0.089 ms
(6 rows)

By comparison, note the sequential scan doesn't show "never executed" on 7.4.3
(sorry, I didn't think to run the query against 8.0 before I compiled the
patched version):

                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Hash Join  (cost=22.50..345.00 rows=5000 width=4) (actual time=0.881..0.881 rows=0 loops=1)
   Hash Cond: ("outer".a = "inner".a)
   ->  Seq Scan on a  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.001..0.001 rows=0 loops=1)
   ->  Hash  (cost=20.00..20.00 rows=1000 width=4) (actual time=0.008..0.008 rows=0 loops=1)
         ->  Seq Scan on b  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.004..0.004 rows=0 loops=1)
 Total runtime: 1.105 ms
(6 rows)

In response to

pgsql-performance by date

Next:From: Tom LaneDate: 2004-09-22 17:56:34
Subject: Re: 7.4 vs 7.3 ( hash join issue )
Previous:From: Greg StarkDate: 2004-09-22 17:38:00
Subject: Re: 7.4 vs 7.3 ( hash join issue )

pgsql-patches by date

Next:From: Tom LaneDate: 2004-09-22 17:56:34
Subject: Re: 7.4 vs 7.3 ( hash join issue )
Previous:From: Greg StarkDate: 2004-09-22 17:38:00
Subject: Re: 7.4 vs 7.3 ( hash join issue )

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group