hash join: probe both inputs first

From: Neil Conway <neilc(at)samurai(dot)com>
To: pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: hash join: probe both inputs first
Date: 2005-06-14 04:39:47
Message-ID: 42AE5F93.3060403@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

Per earlier discussion on pgsql-hackers[1], this patch changes the
implementation of hash join to attempt to avoid redundant work if either
of the join relations are empty. The logic is:

(1) if the inner relation's startup cost is less than the outer
relation's startup cost and this is not an outer join, read a single
tuple from the inner relation via ExecHash()
- if NULL, we're done
(2) read a single tuple from the outer relation
- if NULL, we're done
(3) build the hash table on the inner relation
- if hash table is empty and this is not an outer join, we're done
(4) otherwise, do hash join as usual

The existing hash join code will avoid scanning the outer relation if
the inner relation is empty, but it doesn't try to avoid scanning the
outer relation.

The implementation uses the new MultiExecProcNode API, per a suggestion
from Tom: invoking ExecHash() now produces the first tuple from the Hash
node's child node, whereas MultiExecHash() builds the hash table.

I had to put in a bit of a kludge to get the row count returned for
EXPLAIN ANALYZE correct: since ExecHash() is invoked to return a tuple,
and then MultiExecHash() is invoked, we would return one too many tuples
to EXPLAIN ANALYZE. So the current patch hacks around that by just
checking for this situation and subtracting 1 from the row count.
Suggestions for a cleaner approach would be welcome.

-Neil

[1] http://archives.postgresql.org/pgsql-hackers/2005-03/msg01040.php

Attachment Content-Type Size
hash_join_probe_both_inputs-4.patch text/x-patch 12.7 KB

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Neil Conway 2005-06-14 04:44:42 Re: plpgsql raise - parameters can be expressions
Previous Message Bruce Momjian 2005-06-14 03:08:17 Re: BUG #1707: statistics collector starts with stats_start_collector