Re: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jie Li <jay23jack(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table)
Date: 2010-12-30 15:50:16
Message-ID: AANLkTikcuSc9p1XDmN3ZE5YDtWvFF+1MU-LCjEVRVyBD@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I had an epiphany about this topic, or actually two of them.
>
> 1. Whether or not you think there's a significant performance reason
> to support hash right joins, there's a functionality reason.  The
> infrastructure for right join could just as easily do full joins.
> And AFAICS, a hash full join would only require one hashable join
> clause --- the other FULL JOIN ON conditions could be anything at all.
> This is unlike the situation for merge join, where all the JOIN ON
> conditions have to be mergeable or it doesn't work right.  So we could
> greatly reduce the scope of the dreaded "FULL JOIN is only supported
> with merge-joinable join conditions" error.  (Well, okay, it's not
> *that* dreaded, but people complain about it occasionally.)

Yeah, that would be neat. It might be a lot faster in some cases, too.

> 2. The obvious way to implement this would involve adding an extra bool
> field to struct HashJoinTupleData.  The difficulty with that, and the
> reason I'd been resistant to the whole idea, is that it'd eat up a full
> word per hashtable entry because of alignment considerations.  (On
> 64-bit machines it'd be free because of alignment considerations, but
> that's cold comfort when 32-bit machines are the ones pressed for
> address space.)  But we only need one bit, so what about commandeering
> an infomask bit in the tuple itself?  For the initial implementation
> I'd be inclined to take one of the free bits in t_infomask2.  We could
> actually get away with overlaying the flag bit with one of the tuple
> visibility bits, since it will only be used in tuples that are in the
> in-memory hash table, which don't need visibility info anymore.  But
> that seems like a kluge that could wait until we really need the flag
> space.

I think that's a reasonable approach, although I might be inclined to
do the overlay sooner rather than later if it doesn't add too much
complexity.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-12-30 15:53:24 Re: Streaming replication as a separate permissions
Previous Message Robert Haas 2010-12-30 15:47:42 Re: and it's not a bunny rabbit, either