Re: 9.3 Pre-proposal: Range Merge Join

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.3 Pre-proposal: Range Merge Join
Date: 2012-04-19 07:10:04
Message-ID: 1334819404.5487.53.camel@jdavis
Lists: pgsql-hackers

On Tue, 2012-04-17 at 14:03 -0400, Robert Haas wrote:
> I'm actually not sure these are equivalent formulations. Suppose one
> side has [i,i] where i ranges from 1 to 10000000 and the other side
> the exact same thing plus [1,10000000]. That one really big range
> will come up second on the right side, and you will not be able to
> discard it until you reach the end of the join. If you just keep
> rewinding the right side, you're going to end up with O(n^2) behavior,
> whereas if you can discard tuples "from the middle" on the right side,
> then you will get O(n) behavior, which is a big difference. In other
> words, in your original algorithm the tuples that you discard in step
> 4 or 5 need not be the first remaining tuple on whichever side of the
> join we're talking about.

To illustrate the problem (slightly modified), I'll write the sets out
verbatim rather than use range syntax:

{1,2,3}              {1,2,3,4,5,6,7,8,9}
{  2,3,4}            {  2,3,4}. . . . .
{    3,4,5}          {    3,4,5}. . . .
{      4,5,6}        {      4,5,6}. . .
{        5,6,7}      {        5,6,7}. .
{          6,7,8}    {          6,7,8}.
{            7,8,9}  {            7,8,9}

The "." are supposed to represent a "shadow" that the large range [1,9]
casts below it. This shadow prevents the discarding of [2,4] on the RHS
even when processing range [5,7] on the LHS, because we can't discard
out of the middle.

Note that, if you just have some large ranges and a lot of overlap,
that's not really a problem with the algorithm; it's just a large result
to compute. The problem comes when the ranges vary in size by quite a
lot, and there are many ranges that could be eliminated but can't be
because of the shadow.

This problem can be mitigated substantially, I believe. Let me change
the algorithm (I don't like the way the pseudocode was structured
anyway), and word it a little more loosely:

1. Look at the top ranges on each side. Choose the one with the greater
upper bound, and call it Range1, from Side1; call the other Range2, from
Side2. If either Range1 or Range2 is empty, terminate.
2. Scan down Side2, discarding ranges that are strictly before Range1
and joining with ranges that overlap it, stopping when you hit a range
that is strictly after Range1.
3. Now, discard Range1, and reset Side2 to the first non-discarded
range. Goto 1.

The benefit is that, in step 1, we choose a side whose top tuple will
*always* be discarded at the end of the pass. And because we choose the
one with the greater upper bound, we eliminate the largest shadow.

That doesn't eliminate the problem entirely, but it seems like it would
reduce it a lot.

Regarding the idea of discarding tuples in the middle, that might be an
interesting approach as well. It might be as simple as setting a flag in
the tuple header (as was done for full hash joins). That's still not
perfect, but it would make the redundant checks very cheap. Combined with
my strategy, there's a good chance that we practically eliminate the
problem.
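
Here's a minimal sketch of that flag idea, again in Python rather than
anything resembling the real tuple header (the RangeTuple wrapper and
its field names are made up):

from dataclasses import dataclass

@dataclass
class RangeTuple:
    lo: int
    hi: int
    discarded: bool = False       # stands in for a flag bit in the header

def scan_side2(side2, r1):
    """Yield Side2 join partners for Range1, flagging dead tuples in place."""
    r1_lo, r1_hi = r1
    for t in side2:
        if t.discarded:
            continue              # the redundant check, now very cheap
        if t.hi < r1_lo:
            t.discarded = True    # strictly before: mark instead of removing
            continue
        if t.lo > r1_hi:
            break                 # strictly after: stop (input sorted by lo)
        yield t                   # overlapping: a match for Range1

The buffered side never has to be physically compacted; later rescans
just skip the flagged entries.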

Regards,
Jeff Davis
