Re: 9.3 Pre-proposal: Range Merge Join

From: Jeff Davis Robert Haas Tom Lane , pgsql-hackers(at)postgresql(dot)org Re: 9.3 Pre-proposal: Range Merge Join 2012-04-19 07:10:04 1334819404.5487.53.camel@jdavis (view raw or whole thread) 2012-04-16 05:40:50 from Jeff Davis  2012-04-16 06:18:24 from Darren Duncan   2012-04-16 20:50:36 from Jeff Davis  2012-04-16 06:29:22 from Heikki Linnakangas   2012-04-16 12:22:39 from Alexander Korotkov    2012-04-16 20:12:18 from Jeff Davis     2012-04-16 21:19:53 from Alexander Korotkov     2012-04-16 21:20:03 from Simon Riggs      2012-04-16 21:52:54 from Jay Levitt      2012-04-17 02:17:51 from Jeff Davis      2012-04-17 15:04:27 from Greg Stark  2012-04-16 06:52:50 from Tom Lane   2012-04-16 07:48:22 from Simon Riggs   2012-04-16 17:43:50 from Jeff Davis    2012-04-17 18:03:12 from Robert Haas     2012-04-18 05:21:40 from Tom Lane      2012-04-19 08:10:58 from Jeff Davis       2013-01-17 20:03:28 from Stefan Keller        2013-01-18 08:37:28 from Jeff Davis         2013-01-18 11:25:18 from Stefan Keller          2013-01-18 18:24:38 from Jeff Davis     2012-04-19 07:10:04 from Jeff Davis  2012-04-16 22:20:38 from Robert Haas   2012-04-16 23:05:15 from Tom Lane    2012-04-17 18:24:02 from Robert Haas     2012-04-19 06:25:45 from Jeff Davis pgsql-hackers
```On Tue, 2012-04-17 at 14:03 -0400, Robert Haas wrote:
> I'm actually not sure these are equivalent formulations.  Suppose one
> side has [i,i] where i ranges from 1 to 10000000 and the other side
> the exact same thing plus [1,10000000].  That one really big range
> will come up second on the right side, and you will not be able to
> discard it until you reach the end of the join.  If you just keep
> rewinding the right side, you're going to end up with O(n^2) behavior,
> whereas if you can discard tuples "from the middle" on the right side,
> then you will get O(n) behavior, which is a big difference.  In other
> words, in your original algorithm the tuples that you discard in step
> 4 or 5 need not be the first remaining tuple on whichever side of the

To illustrate the problem (slightly modified), I'll write the sets out
verbatim rather than use range syntax:

{1,2,3}              {1,2,3,4,5,6,7,8,9}
{  2,3,4}            {  2,3,4}. . . . .
{    3,4,5}          {    3,4,5}. . . .
{      4,5,6}        {      4,5,6}. . .
{        5,6,7}      {        5,6,7}. .
{          6,7,8}    {          6,7,8}.
{            7,8,9}  {            7,8,9}

The "." are supposed to represent a "shadow" that the large range [1,9]
casts below it. This shadow prevents the discarding of [2,4] on the RHS
even when processing range [5,7] on the LHS, because we can't discard
out of the middle.

Note that, if you just have some large ranges and a lot of overlap,
that's not really a problem with the algorithm, it's just a large result
to compute. The problem comes when the ranges vary in size by quite a
lot, and there are many ranges that could be eliminated but can't

This problem can be mitigated substantially, I believe. Let me change
the algorithm (I don't like the way the pseudocode was structured
anyway), and word it a little more loosely:

1. Look at the top ranges on each side. Choose the one with the greater
upper bound, and call that Range1 from Side1, and the other range R2
from Side2. If either Range1 or Range2 is empty, terminate.
2. Scan down Side2, discarding ranges that are strictly before, and
joining with ranges that overlap, stopping when you hit a range that is
strictly after.
range. Goto 1.

The benefit is, in step 1, we choose a side that will *always* discard
the top tuple. And if we choose the one with the greater upper bound,
then we are going to eliminate the largest shadow.

That doesn't eliminate the problem entirely, but it seems like it would
reduce it a lot.

Regarding the idea of discarding tuples in the middle, that might be an
interesting approach as well. It might be as simple as setting a flag in
the tuple header (like was done for full hash joins). Still not perfect,
but would make redundant checks very cheap. Combined with my strategy,
there's a good chance that we practically eliminate the problem.

Regards,
Jeff Davis

```

pgsql-hackers by date

 Next: From: Simon Riggs Date: 2012-04-19 07:12:16 Subject: Re: Re: SPGiST versus hot standby - question about conflict resolution rules Previous: From: Noah Misch Date: 2012-04-19 06:55:16 Subject: Re: SPGiST versus hot standby - question about conflict resolutionrules