Re: Hash vs. HashJoin nodes

From: Neil Conway <neilc(at)samurai(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash vs. HashJoin nodes
Date: 2005-03-31 04:53:51
Message-ID: 424B825F.7050904@samurai.com
Lists: pgsql-hackers

Tom Lane wrote:
> One small objection is that we'd lose the ability to separately display
> the time spent building the hash table in EXPLAIN ANALYZE output. It's
> probably not super important, but might be a reason to keep two plan
> nodes in the tree.

Hmm, true. Perhaps then just hacking the hash node so that hash join
pulls on it twice (the first time for a single tuple, the second time
for the rest) is the way to go. Since the hash node is essentially an
implementation detail of hash join, I don't feel _too_ bad about
dirtying up its API a bit...
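A minimal sketch of what "pulls on it twice" could look like. It is written as a Python toy model rather than executor C code. The class `TwoPhaseHash` and its methods `pull_first` and `build_rest` are hypothetical names, not PostgreSQL's Hash node API, and a child plan is assumed to be a plain iterator of (key, payload) pairs.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Optional, Tuple

# Toy model, not PostgreSQL's executor: a tuple is a (key, payload) pair
# and a child plan node is a plain Python iterator.
Row = Tuple[Hashable, object]


class TwoPhaseHash:
    """Hash node that the parent hash join pulls on twice (hypothetical API).

    pull_first() fetches a single inner tuple, so the join can learn
    whether the inner input is empty without hashing anything.
    build_rest() then hashes that tuple plus the rest of the inner input.
    """

    def __init__(self, inner: Iterator[Row]) -> None:
        self._inner = inner
        self._first: Optional[Row] = None
        self.table: Dict[Hashable, List[object]] = defaultdict(list)
        self.tuples_read = 0  # inner tuples fetched so far

    def pull_first(self) -> Optional[Row]:
        """First pull: one inner tuple, or None if the inner input is empty."""
        self._first = next(self._inner, None)
        if self._first is not None:
            self.tuples_read += 1
        return self._first

    def build_rest(self) -> Dict[Hashable, List[object]]:
        """Second pull: hash the saved tuple, then drain the inner input."""
        if self._first is not None:
            key, payload = self._first
            self.table[key].append(payload)
            self._first = None
        for key, payload in self._inner:
            self.tuples_read += 1
            self.table[key].append(payload)
        return self.table


node = TwoPhaseHash(iter([(1, "a"), (2, "b"), (1, "c")]))
first = node.pull_first()  # (1, "a"); the hash table is still empty
table = node.build_rest()  # table[1] == ["a", "c"], table[2] == ["b"]
```

Between the two calls the join is free to do other work, such as pulling its first outer tuple.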

> I recall having looked at related ideas (not this one exactly) and being
> discouraged by the fact that pulling a tuple from *either* input first
> is demonstrably a losing strategy, since either input might have a very
> high startup cost.

That is true, but I think this particular formulation avoids that
problem. If we look at the inner input first and find that it is non-empty, we
will *always* have to pull on the outer input at least once. The
question is merely whether we go to the trouble of building the hash
table before or after we do the initial pull on the outer relation. IOW,
I think this tweak would be universally better than the existing code.
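The ordering argument can be checked with a small toy model. This is a sketch under stated assumptions, not the executor. `join_build_first` stands in for a build-before-probe order and `join_outer_before_build` for the proposed one (both names are hypothetical). Plan nodes are plain iterators, and `Counted` only records how many tuples each input hands out. Both versions stop early on an empty inner input, so the only difference is whether the first outer pull happens before or after the hash table is built.

```python
from itertools import chain
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

# Toy model (hypothetical, not PostgreSQL code): a row is (key, payload)
# and a plan node is any iterator of rows. Inner joins only.
Row = Tuple[Hashable, object]


class Counted:
    """Wrap an input to record how many tuples were pulled from it."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._it = iter(rows)
        self.pulled = 0

    def __iter__(self) -> Iterator[Row]:
        return self

    def __next__(self) -> Row:
        row = next(self._it)  # StopIteration propagates once exhausted
        self.pulled += 1
        return row


def build(rows: Iterable[Row]) -> Dict[Hashable, List[object]]:
    table: Dict[Hashable, List[object]] = {}
    for key, payload in rows:
        table.setdefault(key, []).append(payload)
    return table


def probe(table: Dict[Hashable, List[object]], outer: Iterable[Row]) -> List[tuple]:
    return [(key, opay, ipay)
            for key, opay in outer
            for ipay in table.get(key, [])]


def join_build_first(outer: Counted, inner: Counted) -> List[tuple]:
    """Baseline order: hash the entire inner input, then pull the outer."""
    first_inner = next(inner, None)
    if first_inner is None:
        return []  # empty inner: no inner-join output, outer never pulled
    table = build(chain([first_inner], inner))
    return probe(table, outer)


def join_outer_before_build(outer: Counted, inner: Counted) -> List[tuple]:
    """Proposed order: one inner tuple, one outer tuple, then the build."""
    first_inner = next(inner, None)
    if first_inner is None:
        return []  # empty inner: the outer input is never touched
    first_outer = next(outer, None)
    if first_outer is None:
        return []  # empty outer: the hash table is never built
    table = build(chain([first_inner], inner))
    return probe(table, chain([first_outer], outer))
```

With an empty outer input, the baseline hashes every inner tuple while the reordered version reads only one. When both inputs are non-empty, the two return the same rows and pull the same tuples, just in a different order.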

> This could all get pretty hairy when you consider that it has to still
> work for left joins too ...

Right; I was planning to bail and only do this for inner joins.
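One concrete way left joins differ can be shown with a small case. This is a minimal sketch, assuming a LEFT JOIN whose preserved side is the outer (probe) input and whose hashed inner input is the nullable side. `hash_left_join` is a hypothetical toy function, not executor code. Its point is that the "inner is empty, so there is no output" shortcut that an inner join may take does not hold here.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

# Toy model (hypothetical): rows are (key, payload). The outer input is the
# preserved side of the left join; the hashed inner input is the nullable side.
Row = Tuple[Hashable, object]


def hash_left_join(outer: Iterable[Row], inner: Iterable[Row]) -> List[tuple]:
    table: Dict[Hashable, List[object]] = {}
    for key, payload in inner:
        table.setdefault(key, []).append(payload)
    # No "inner is empty, so return nothing" shortcut here: every outer
    # row must still be emitted, null-extended when it has no match.
    out: List[Tuple[Hashable, object, Optional[object]]] = []
    for key, opay in outer:
        matches = table.get(key)
        if matches:
            out.extend((key, opay, ipay) for ipay in matches)
        else:
            out.append((key, opay, None))
    return out


# Empty inner input, yet both outer rows come back null-extended.
rows = hash_left_join([(1, "o1"), (2, "o2")], [])
```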

-Neil
