Re: Range Merge Join v1

From: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Range Merge Join v1
Date: 2017-04-20 09:05:36
Message-ID: CAOGQiiPrEZ49KqRZnxDsLXVNXxx4HhWUgYUgK6kioy=kt78=tw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 6, 2017 at 2:13 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> ========
> Example:
> ========
>
> Find different people using the same website at the same time:
>
> create table session(sessionid text, username text, during tstzrange);
> SELECT s1.username, s2.username, s1.during * s2.during
> FROM session s1, session s2
> WHERE s1.during && s2.during AND s1.username < s2.username
>
>
> =====================================
> Brief summary of previous discussion:
> =====================================
>
> http://www.postgresql.org/message-id/1334554850.10878.52.camel@jdavis
>
> - can indexes solve it already (parameterized paths, etc.)?
> - planner complexity (track path keys, mergejoinable equality, etc.)
> - spatial join algorithm choice
> - compare range merge join against existing indexes to see if any
> indexing approach is viable
>
> ==========
> Externals:
> ==========
>
> No new syntax or other externals. Just improved execution strategy for
> joining ranges using the overlaps operator (&&).
>
> New syntax is possible, but I don't see a strong reason to support
> special range join syntax at this time.
>
> =============
> Optimizer Design
> =============
>
> I took the path of least resistance, so if I am doing something wrong
> please let me know. I hardwired it to look for the range overlaps
> operator when checking if it could do a merge join, and then add
> range_ops to restrictinfo->mergeopfamilies.
>
> Then, I have to suppress adding it as an equivalence class, because
> overlaps is not equality. It still adds single-member ECs to
> restrictinfo->right_ec and left_ec, but (I think) that's OK.
>
> Also, I have to prevent generating paths for right/full join, because
> the range join algorithm can't (currently) handle those.
>
> =============
> Costing
> =============
>
> Needs more consideration. Seems to do reasonable things in the few
> examples I tried.
>
> =============
> Executor Design
> =============
>
> See detailed comments in nodeMergejoin.c
>
> =============
> Performance
> =============
>
> Seems much better than index nest loop join when the join is not
> selective. I will post detailed numbers soon.
>
> If no index is available, range merge join is the only reasonable way
> to execute a range join. For instance, if the inner is not a leaf in
> the plan.
>
> =============
> Alternatives:
> =============
>
> It was suggested that I approach it as a general spatial-join
> problem. That has merit, but I rejected it for now because the
> algorithm that I think has the most promise is range-oriented
> anyway. By that I mean we would need to extract all of the dimensions
> into ranges first, and then perform the join. With that in mind, it
> makes sense to implement range joins first; and then later provide the
> APIs to get at the spatial dimensions so that we can implement other
> spatial joins as range joins.
>
> Another suggestion was to base it on hash join, but I never understood
> that proposal and it didn't seem to map very well to a spatial join.
>
> Yet another suggestion was to use some kind of temporary index. Some
> brief experiments I did indicated that it would be fairly slow (though
> more investigation might be useful here). Also, it doesn't provide any
> alternative to the nestloop-with-inner-index we already offer at the
> leaf level today.
>
> Regards,
> Jeff Davis
>

Looks like an interesting idea, however, in an attempt to test this patch I
found following error when compiling,
selfuncs.c: In function ‘mergejoinscansel’:
selfuncs.c:2901:12: error: ‘op_strategy’ undeclared (first use in this
function)
&op_strategy,

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2017-04-20 10:05:14 Re: Partition-wise join for join between (declaratively) partitioned tables
Previous Message Paresh More 2017-04-20 08:32:04 Patch - Tcl 8.6 version support for PostgreSQL