pgsql: Change the implementation of hash join to attempt to avoid

From: neilc(at)svr1(dot)postgresql(dot)org (Neil Conway)
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Change the implementation of hash join to attempt to avoid
Date: 2005-06-15 07:27:45
Message-ID: 20050615072745.1E7CC52828@svr1.postgresql.org
Lists: pgsql-committers

Log Message:
-----------
Change the implementation of hash join to attempt to avoid unnecessary
work if either of the join relations is empty. The logic is:

    (1) if the inner relation's startup cost is less than the outer
        relation's startup cost and this is not an outer join, read
        a single tuple from the inner relation via ExecHash()
          - if NULL, we're done

    (2) read a single tuple from the outer relation
          - if NULL, we're done

    (3) build the hash table on the inner relation
          - if hash table is empty and this is not an outer join,
            we're done

    (4) otherwise, do hash join as usual
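
The four steps above can be sketched as a small standalone C function.
This is only an illustration of the control flow, not the code from
nodeHashjoin.c: ToyRel, ToyOutcome, toy_fetch_first() and
toy_hashjoin_startup() are hypothetical names, and a relation is modeled
as nothing more than a startup cost and a row count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a join input; not a PostgreSQL structure. */
typedef struct ToyRel {
    double startup_cost;  /* estimated cost to produce the first tuple */
    size_t nrows;         /* number of tuples the relation will yield */
    size_t reads;         /* how many single-tuple fetches were issued */
} ToyRel;

typedef enum { JOIN_DONE_EMPTY, JOIN_PROCEED } ToyOutcome;

/* Fetch one tuple; returns false when the relation is empty. */
static bool toy_fetch_first(ToyRel *rel)
{
    rel->reads++;
    return rel->nrows > 0;
}

/*
 * Decide whether the join can stop early. *hash_built reports whether
 * the (assumed expensive) hash table build in step 3 was reached.
 */
static ToyOutcome toy_hashjoin_startup(ToyRel *outer, ToyRel *inner,
                                       bool is_outer_join, bool *hash_built)
{
    assert(outer != NULL && inner != NULL && hash_built != NULL);
    *hash_built = false;

    /* (1) probe the inner side first only if it is cheaper to start */
    if (inner->startup_cost < outer->startup_cost && !is_outer_join) {
        if (!toy_fetch_first(inner))
            return JOIN_DONE_EMPTY;
    }

    /* (2) an empty outer side means no result rows at all */
    if (!toy_fetch_first(outer))
        return JOIN_DONE_EMPTY;

    /* (3) build the hash table; an empty table ends a non-outer join */
    *hash_built = true;
    if (inner->nrows == 0 && !is_outer_join)
        return JOIN_DONE_EMPTY;

    /* (4) otherwise, do hash join as usual */
    return JOIN_PROCEED;
}
```

With a cheap, empty inner relation the function returns JOIN_DONE_EMPTY
without ever reading the outer relation or building the hash table.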

The implementation uses the new MultiExecProcNode API, per a
suggestion from Tom: invoking ExecHash() now produces the first
tuple from the Hash node's child node, whereas MultiExecHash()
builds the hash table.
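
As a rough illustration of the two entry points, the sketch below models
a hash node with a cheap "first tuple" call and a separate bulk-build
call. ToyChild, ToyHashNode, toy_exec_hash() and toy_multi_exec_hash()
are hypothetical names, not the actual ExecHash()/MultiExecHash()
signatures, and tuples are reduced to integer keys. The sketch simply
re-reads the child from the start when building; how the real node
treats the already-fetched tuple is not described in this message.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NBUCKETS 8

/* Hypothetical child node: a fixed array of integer join keys. */
typedef struct ToyChild {
    const int *keys;
    size_t nkeys;
} ToyChild;

/* Hypothetical hash node: per-bucket tuple counts stand in for the table. */
typedef struct ToyHashNode {
    ToyChild child;
    size_t bucket_counts[TOY_NBUCKETS];
    size_t ntuples;
} ToyHashNode;

/*
 * Per-tuple entry point: hand back the child's first key, or NULL if the
 * child is empty. The hash table is left untouched.
 */
static const int *toy_exec_hash(const ToyHashNode *node)
{
    assert(node != NULL);
    return node->child.nkeys > 0 ? &node->child.keys[0] : NULL;
}

/*
 * Bulk entry point: read every key from the child and insert it into the
 * table. Returns the number of tuples stored.
 */
static size_t toy_multi_exec_hash(ToyHashNode *node)
{
    assert(node != NULL);
    for (size_t i = 0; i < node->child.nkeys; i++) {
        unsigned bucket = (unsigned) node->child.keys[i] % TOY_NBUCKETS;
        node->bucket_counts[bucket]++;
        node->ntuples++;
    }
    return node->ntuples;
}
```

A caller can use toy_exec_hash() as an emptiness check and invoke
toy_multi_exec_hash() only once it knows the build is worth doing.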

I had to put in a bit of a kludge to get the row count returned
for EXPLAIN ANALYZE to be correct: since ExecHash() is invoked to
return a tuple, and then MultiExecHash() is invoked, we would
return one too many tuples to EXPLAIN ANALYZE. I hacked around
this by just manually detecting this situation and subtracting 1
from the EXPLAIN ANALYZE row count.
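
The off-by-one can be reproduced in a toy model. The code below is not
the instrumentation code touched by this commit; ToyInstr and the toy_*
functions are hypothetical, and the model assumes only what is stated
above: the probe counts one tuple, the build then counts every tuple,
and the reported total is corrected when both have happened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node counters; not PostgreSQL's instrumentation. */
typedef struct ToyInstr {
    size_t tuples_counted;  /* raw count, may include the probe tuple twice */
    bool probed_first;      /* a single tuple was fetched before the build */
    bool built;             /* the full hash table build ran */
} ToyInstr;

/* Single-tuple probe: counts one tuple if the child is non-empty. */
static void toy_probe(ToyInstr *in, size_t child_rows)
{
    assert(in != NULL);
    if (child_rows > 0) {
        in->tuples_counted++;
        in->probed_first = true;
    }
}

/* Full build: counts every tuple in the child, including the first. */
static void toy_build(ToyInstr *in, size_t child_rows)
{
    assert(in != NULL);
    in->tuples_counted += child_rows;
    in->built = true;
}

/* Row count to report: subtract 1 only in the probe-then-build case. */
static size_t toy_reported_rows(const ToyInstr *in)
{
    assert(in != NULL);
    if (in->probed_first && in->built)
        return in->tuples_counted - 1;
    return in->tuples_counted;
}
```

For a five-row child, probing and then building yields a raw count of 6
and a reported count of 5, while building without a prior probe reports
5 with no adjustment.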

Modified Files:
--------------
pgsql/src/backend/executor:
nodeHash.c (r1.93 -> r1.94)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/executor/nodeHash.c.diff?r1=1.93&r2=1.94)
nodeHashjoin.c (r1.71 -> r1.72)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/executor/nodeHashjoin.c.diff?r1=1.71&r2=1.72)
pgsql/src/include/nodes:
execnodes.h (r1.133 -> r1.134)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/nodes/execnodes.h.diff?r1=1.133&r2=1.134)
