Re: New style of hash join proposal

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-03-18 03:41:01
Message-ID: 87myowaecy.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>>> Nested Loop (cost=5.39..198.81 rows=51 width=244)
>>> -> HashAggregate (cost=1.06..1.11 rows=5 width=4)
>>> -> Seq Scan on int4_tbl b (cost=0.00..1.05 rows=5 width=4)
>>> -> Bitmap Heap Scan on tenk1 a (cost=4.33..39.41 rows=10 width=244)
>>> Recheck Cond: (a.thousand = b.f1)
>>> -> Bitmap Index Scan on tenk1_thous_tenthous (cost=0.00..4.33 rows=10 width=0)
>>> Index Cond: (a.thousand = b.f1)
>>> (7 rows)
>
>> Sure, but that's still re-executing the bitmap index scan 51 times -- possibly
>> having to fetch the same records off disk repeatedly.

sorry 5 times

> It's not fetching any record repeatedly, because the HashAggregate
> step eliminated duplicate keys on the other side.

Only because it's an IN query. If it was a normal join which hash joins are
perfectly capable of handling then it would be. As it is it could be fetching
the same page repeatedly but not precisely the same tuples.

It happens that transforming this query to

explain analyze select * from tenk1 a where thousand = any (array(select f1 from int4_tbl b));

makes it run about 40% faster. That could just be that the hash is small
enough that a linear search is more efficient than calling hashint4 though.
(Perhaps we should be protecting against that in dynahash, actually)

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2008-03-18 03:59:45 pgsql: Add URLs for : * Speed WAL recovery by allowing more than one
Previous Message KaiGai Kohei 2008-03-18 03:16:27 Re: [0/4] Proposal of SE-PostgreSQL patches