RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jie Li <jay23jack(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table)
Date: 2010-12-30 15:45:54
Message-ID: 16990.1293723954@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I had an epiphany about this topic, or actually two of them.

1. Whether or not you think there's a significant performance reason
to support hash right joins, there's a functionality reason. The
infrastructure for right join could just as easily do full joins.
And AFAICS, a hash full join would only require one hashable join
clause --- the other FULL JOIN ON conditions could be anything at all.
This is unlike the situation for merge join, where all the JOIN ON
conditions have to be mergeable or it doesn't work right. So we could
greatly reduce the scope of the dreaded "FULL JOIN is only supported
with merge-joinable join conditions" error. (Well, okay, it's not
*that* dreaded, but people complain about it occasionally.)

2. The obvious way to implement this would involve adding an extra bool
field to struct HashJoinTupleData. The difficulty with that, and the
reason I'd been resistant to the whole idea, is that it'd eat up a full
word per hashtable entry because of alignment considerations. (On
64-bit machines it'd be free because of alignment considerations, but
that's cold comfort when 32-bit machines are the ones pressed for
address space.) But we only need one bit, so what about commandeering
an infomask bit in the tuple itself? For the initial implementation
I'd be inclined to take one of the free bits in t_infomask2. We could
actually get away with overlaying the flag bit with one of the tuple
visibility bits, since it will only be used in tuples that are in the
in-memory hash table, which don't need visibility info anymore. But
that seems like a kluge that could wait until we really need the flag
space.

Comments?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-12-30 15:47:42 Re: and it's not a bunny rabbit, either
Previous Message Alvaro Herrera 2010-12-30 15:18:40 Re: SLRU API tweak