Re: Range Merge Join v1

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Range Merge Join v1
Date: 2017-09-17 17:24:34
Message-ID: CAMp0ubdY_6k11V_u4AHDPOySNbuTmM=kasNj90v5jyM3fa-nSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 31, 2017 at 1:52 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> Updated patch attached. Changelog:
>
> * Rebased
> * Changed MJCompare to return an enum as suggested, but it has 4
> possible values rather than 3.
> * Added support for joining on contains or contained by (@> or <@) and
> updated tests.

The current patch does not work well with multiple keys, and I believe
it's important to solve because it will make it usable for
multi-dimension spatial joins.

The problem is this: the algorithm for a single key demands that the
input is sorted by (lower(r) NULLS FIRST, upper(r) NULLS LAST). That's
easy, because that's also the sort order for the range operator class,
so everything just works.

For multiple keys, the order of the input is:
lower(r1) NULLS FIRST, lower(r2) NULLS FIRST,
upper(r1) NULLS LAST, upper(r2) NULLS LAST

But that can't match up with any opclass, because an opclass can only
order one attribute at a time. In this case, the lower bound of r2 is
more significant than the upper bound of r1.

It's easy enough to adapt the execution to make this work as long as
the input is properly sorted. The challenge is making the optimizer
choose the sort keys properly.

I have tried a few approaches. The current approach that I'm using is:
* have a new, special range operator family with no operator classes
* in check_mergejoinable(), detect the && operator and set the
mergeopfamilies to contain only the special operator family
* in try_mergejoin_path(), it will convert the pathkeys using that
operator class into pathkeys over a "lower" expression over the same
EC; and then add on additional pathkeys for the "upper" expressions
onto the end of the pathkey list

Any comments or alternative suggestions welcome. This will probably
take a few days at least, so I put the patch in "waiting on author"
state.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitriy Sarafannikov 2017-09-17 17:43:09 Improving DISTINCT with LooseScan node
Previous Message Chris Travers 2017-09-17 16:48:58 Re: Add Roman numeral conversion to to_number