Re: 9.3 Pre-proposal: Range Merge Join

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.3 Pre-proposal: Range Merge Join
Date: 2012-04-16 06:52:50
Message-ID: 6227.1334559170@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> Proposed solution: a modified merge join that can handle ranges.

> 1. Order the ranges on both sides by the lower bound, then upper bound.
> Empty ranges can be excluded entirely.
> 2. Left := first range on left, Right := first range on right
> 3. If Left or Right is empty, terminate.
> 4. If lower(Left) > upper(Right), discard Right, goto 2
> 5. If lower(Right) > upper(Left), discard Left, goto 2
> 6. return (Left, Right) as joined tuple
> 7. Right := next range on right
> 8. goto 3

This is surely not correct in detail. As written, it will be impossible
for any tuple on the right side to be joined to more than one left-side
tuple. You will need something analogous to the mark-and-restore
rewinding logic in standard mergejoin to make this work.

I can't escape the feeling that we could do this with more-or-less
standard mergejoin logic paying attention to some fuzzy notion of
equality, with the question of whether a given pair of ranges actually
overlap being deferred to a filter condition. Not quite sure how to
make that work though.

> * Is this idea sane? -- that is, are ranges important enough that
> people are willing to maintain a new operator?

Dunno. It might be easier to sell the idea of adding support for range
joins in a couple of years, after we've seen how much use ranges get.

I will note that mergejoin is not only one of the more complicated
executor modules, but it accounts for a huge percentage of the planner's
complexity; which doesn't exactly make me sympathize with the notion of
let's-copy-and-paste-all-that. It'd be a lot better if we could build
it as a small tweak to mergejoin rather than an independent thing.

(Having said that, I have no idea how the planner support could work
at all, because its treatment of mergejoins is intimately tied to
mergejoinable equality and EquivalenceClasses and PathKeys, which don't
seem readily extensible to the range situation.)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikhil Sontakke 2012-04-16 06:56:06 Re: how to create a non-inherited CHECK constraint in CREATE TABLE
Previous Message Heikki Linnakangas 2012-04-16 06:29:22 Re: 9.3 Pre-proposal: Range Merge Join