Re: Range Merge Join v1

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Range Merge Join v1
Date: 2017-04-11 07:17:29
Message-ID: CAMp0ubfKjj6H39iAX=zoPZ0ke_MAyTatQPwMrfw5dkSSztuV4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Version 2 attached. Fixed a few issues, expanded tests, added docs.

A simple performance test (script attached) shows about a 5X
improvement when comparing against a nested loop with an inner
index-only scan over a gist index.

Even better, this doesn't require an index, so it will work even if
the input relations are subqueries.

Regards,
Jeff Davis

On Thu, Apr 6, 2017 at 1:43 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> ========
> Example:
> ========
>
> Find different people using the same website at the same time:
>
> create table session(sessionid text, username text, during tstzrange);
> SELECT s1.username, s2.username, s1.during * s2.during
> FROM session s1, session s2
> WHERE s1.during && s2.during AND s1.username < s2.username
>
>
> =====================================
> Brief summary of previous discussion:
> =====================================
>
> http://www.postgresql.org/message-id/1334554850.10878.52.camel@jdavis
>
> - can indexes solve it already (parameterized paths, etc.)?
> - planner complexity (track path keys, mergejoinable equality, etc.)
> - spatial join algorithm choice
> - compare range merge join against existing indexes to see if any
> indexing approach is viable
>
> ==========
> Externals:
> ==========
>
> No new syntax or other externals. Just improved execution strategy for
> joining ranges using the overlaps operator (&&).
>
> New syntax is possible, but I don't see a strong reason to support
> special range join syntax at this time.
>
> =============
> Optimizer Design
> =============
>
> I took the path of least resistance, so if I am doing something wrong
> please let me know. I hardwired it to look for the range overlaps
> operator when checking if it could do a merge join, and then add
> range_ops to restrictinfo->mergeopfamilies.
>
> Then, I have to suppress adding it as an equivalence class, because
> overlaps is not equality. It still adds single-member ECs to
> restrictinfo->right_ec and left_ec, but (I think) that's OK.
>
> Also, I have to prevent generating paths for right/full join, because
> the range join algorithm can't (currently) handle those.
>
> =============
> Costing
> =============
>
> Needs more consideration. Seems to do reasonable things in the few
> examples I tried.
>
> =============
> Executor Design
> =============
>
> See detailed comments in nodeMergejoin.c
>
> =============
> Performance
> =============
>
> Seems much better than index nest loop join when the join is not
> selective. I will post detailed numbers soon.
>
> If no index is available, range merge join is the only reasonable way
> to execute a range join. For instance, if the inner is not a leaf in
> the plan.
>
> =============
> Alternatives:
> =============
>
> It was suggested that I approach it as a general spatial-join
> problem. That has merit, but I rejected it for now because the
> algorithm that I think has the most promise is range-oriented
> anyway. By that I mean we would need to extract all of the dimensions
> into ranges first, and then perform the join. With that in mind, it
> makes sense to implement range joins first; and then later provide the
> APIs to get at the spatial dimensions so that we can implement other
> spatial joins as range joins.
>
> Another suggestion was to base it on hash join, but I never understood
> that proposal and it didn't seem to map very well to a spatial join.
>
> Yet another suggestion was to use some kind of temporary index. Some
> brief experiments I did indicated that it would be fairly slow (though
> more investigation might be useful here). Also, it doesn't provide any
> alternative to the nestloop-with-inner-index we already offer at the
> leaf level today.
>
> Regards,
> Jeff Davis

Attachment Content-Type Size
perf.sql application/sql 667 bytes
rangejoin-v2.patch text/x-patch 44.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2017-04-11 07:32:29 Re: Variable substitution in psql backtick expansion
Previous Message Kuntal Ghosh 2017-04-11 07:11:47 Re: strange parallel query behavior after OOM crashes