Re: Range Merge Join v1

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Range Merge Join v1
Date: 2017-08-31 08:52:05
Message-ID: CAMp0ube+NEJsJimo32N118F=qcq1GkHEwMpHLT9vEZC+dmGZZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 25, 2017 at 10:19 AM, Alexander Kuzmenkov
<a(dot)kuzmenkov(at)postgrespro(dot)ru> wrote:
> Hi Jeff,

Hi,

Thank you for the review and suggestions!

> * At the moment, "mergejoinable clause" and "equality clause" mean the same
> thing to the planner, and those clauses are used both to create equivalence
> classes and to perform merge joins. If we start performing merge joins for
> different kinds of clauses, such as comparison or range intersection, it
> makes sense to separate these two meanings. I tried this in my patch and it
> didn't require many changes. I use RestrictInfo.equivopfamilies to build
> equivalence classes, and RestrictInfo.mergeopfamilies for mergejoinable
> clauses.

I like the idea. I will look in more detail and I can either change my
patch or piggyback on yours.

> * Semantically, MJCompare() is a function that determines whether you should
> emit a join result or advance one of the pointers. This meaning is not
> explicit in the code and is not well encapsulated: the function returns and
> int and 'compareNoMatch' flag, and the calling code combines them in various
> ways to derive the final result. This meaning can be made explicit by making
> MJCompare return enum values {Join, NextInner, NextOuter}, and putting
> inside it all the logic that decides whether we join or advance.
> ExecMergeJoin looks intimidating already, and I think this change would help
> readability. Again, you can find an illustration in my patch.

I actually tried doing something similar in my patch, but there are
four cases to consider in EXEC_MJ_SKIP_TEST:

1. Overlaps: mark and then join the tuples
2. left-of: SKIPOUTER_ADVANCE
3. right-of: SKIPINNER_ADVANCE
4. None of the above: mark and NEXTINNER

However, you are right that the current code is ugly, and I should
refactor into 4 explicit cases. I don't think I will call them by the
same names as the join states, though, because there's not a direct
mapping.

Updated patch attached. Changelog:

* Rebased
* Changed MJCompare to return an enum as suggested, but it has 4
possible values rather than 3.
* Added support for joining on contains or contained by (@> or <@) and
updated tests.

Regards,
Jeff Davis

Attachment Content-Type Size
rangejoin-v3.diff text/plain 59.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2017-08-31 10:02:56 CurrentUserId may be invalid during the rest of a session
Previous Message Amit Khandekar 2017-08-31 08:45:37 Re: UPDATE of partition key