Re: Patch: Range Merge Join

From: Thomas <thomasmannhart97(at)gmail(dot)com>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql(at)j-davis(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: dignoes(at)inf(dot)unibz(dot)it, boehlen(at)ifi(dot)uzh(dot)ch, p(dot)moser(at)noi(dot)bz(dot)it, gamper(at)inf(dot)unibz(dot)it, Thomas Mannhart <thomas_m(at)hotmail(dot)ch>
Subject: Re: Patch: Range Merge Join
Date: 2021-11-10 14:03:55
Message-ID: CAMWfgiDDvHqN2STsCyKj9oyBJ1yCQdDOp_6Jc0xupuJ8fn9xtQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear all,
thanks for the feedback!

We had a closer look at the previous patches and the CustomScan
infrastructure.

Compared to the previous patch, we do not (directly) focus on joins
with the overlap (&&) condition in this patch. Instead we consider
joins with containment (@>) between a range and an element, and joins
with conditions over scalars of the form "right.element BETWEEN
left.start AND left.end", and more generally left.start >(=)
right.element AND right.element <(=) left.end. We call such conditions
range conditions and these conditions can be combined with equality
conditions in the Range Merge Join.

The Range Merge Join can use (optional) equality conditions and one
range condition of the form shown above. In this case the inputs are
sorted first by the attributes used for equality and then one input by
the range (or start in the case of scalars) and the other input by the
element. The Range Merge Join is then a simple extension of the Merge
Join that in addition to the (optional) equality attributes also uses
the range condition in the merge join states. This is similar to an
index-nested loop with scalars for cases when the relation containing
the element has an index on the equality attributes followed by the
element. The Range Merge Join uses sorting and thus does not require
the index for this purpose and performs better.

The patch uses the optimizer estimates to evaluate if the Range Merge
Join is beneficial as compared to other execution strategies, but when
no equality attributes are present, it becomes the only efficient
option for the above range conditions. If a join contains multiple
range conditions, then based on the estimates the most effective
strategy is chosen for the Range Merge Join.

Although we do not directly focus on joins with the overlap (&&)
condition between two ranges, we show in [1] that these joins can be
evaluated using the union (UNION ALL) of two joins with a range
condition, where intuitively, one tests that the start of one input
falls within the range of the other and vice versa. We evaluated this
using regular (B-tree) indices and compare it to joins with the
overlap (&&) condition using GiST, SP-GiST and others, and found that
it performs better. The Range Merge Join would improve this further
and would not require the creation of an index.

We did not consider an implementation as a CustomScan, as we feel the
join is rather general, can be implemented using a small extension of
the existing Merge Join, and would require a substantial duplication
of the Merge Join code.

Kind regards,
Thomas, Anton, Johann, Michael, Peter

[1] https://doi.org/10.1007/s00778-021-00692-3 (open access)

Am Di., 5. Okt. 2021 um 02:30 Uhr schrieb Jaime Casanova <
jcasanov(at)systemguards(dot)com(dot)ec>:

> > On Mon, Oct 04, 2021 at 04:27:54PM -0500, Jaime Casanova wrote:
> >> On Thu, Jun 10, 2021 at 07:14:32PM -0700, Jeff Davis wrote:
> >> >
> >> > I'll start with the reason I set the work down before: it did not work
> >> > well with multiple join keys. That might be fine, but I also started
> >> > thinking it was specialized enough that I wanted to look into doing it
> >> > as an extension using the CustomScan mechanism.
> >> >
> >> > Do you have any solution to working better with multiple join keys?
> And
> >> > do you have thoughts on whether it would be a good candidate for the
> >> > CustomScan extension mechanism, which would make it easier to
> >> > experiment with?
> >> >
> >>
> >> Hi,
> >>
> >> It seems this has been stalled since jun-2021. I intend mark this as
> >> RwF unless someone speaks in the next hour or so.
> >>
>
> Thomas <thomasmannhart97(at)gmail(dot)com> wrote me:
>
> > Hi,
> >
> > I registered this patch for the commitfest in july. It had not been
> reviewed and moved to the next CF. I still like to submit it.
> >
> > Regards,
> > Thomas
> >
>
> Just for clarification RwF doesn't imply reject of the patch.
> Nevertheless, given that there has been no real review I will mark this
> patch as "Waiting on Author" and move it to the next CF.
>
> Meanwhile, cfbot (aka http://commitfest.cputube.org) says this doesn't
> compile. Here is a little patch to fix the compilation errors, after
> that it passes all tests in make check-world.
>
> Also attached a rebased version of your patch with the fixes so we turn
> cfbot entry green again
>
> --
> Jaime Casanova
> Director de Servicios Profesionales
> SystemGuards - Consultores de PostgreSQL
>

Attachment Content-Type Size
postgres.patch application/octet-stream 47.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2021-11-10 14:31:40 Re: Removed unused import modules from tap tests
Previous Message Daniel Gustafsson 2021-11-10 13:56:00 Re: fix warnings in 9.6, 10, 11's contrib when compiling without openssl