Merge join for GiST

From: Andrew Borodin <borodin(at)octonica(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Sergey Mirvoda <sergey(at)mirvoda(dot)com>
Subject: Merge join for GiST
Date: 2017-04-10 11:53:17
Message-ID: CAJEAwVE5z5_mE-0SUBjNjNY=CPYf8vq0N9pGHiJckWqE5-8H3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, hackers!

==Spatial joins==
Scientific papers from the dawn of R-trees and multidimensional
indexes feature a lot of algorithms for spatial joins.
I.e. you have two sets of geometries s1 and s2, you need to produce
all colliding pairs (p1,p2) where p1 in s1 and p2 in s2. For 2 R-trees
of equal heights with roots R and S most straightforward[1] algorithm
look like:

SpatialJoin (R,S: Node)
{
FOR (all E in S)
FOR (all ER in R with ER.rect intersecting E.rect)
IF (R is a leaf page)
{
Output ER,ES
}
ELSE
{
SpatialJoin (ER.ref, ES.ref)
}
}

==Motivation==
I’m mostly interested in OLAP-style aggregates like:

SELECT rows.Id, sum(data.attribute1), sum(data.attribute2), sum(data.attribute3)
FROM rows LEFT JOIN data ON rows.IndexedAttribute && data.IndexedAttribute
GROUP BY 1

When we have two GiST for rows.IndexedAttribute and data.IndexedAttribute.
Currently, this query produces plans with Nested Loop Join, so for
every row from “rows” there is one GiST scan into “data”.

==Proposal==
Obviously, for this scenario, we can use GiST-based join algorithm
identical to the SpatialJoin algorithm above.

This algorithm will work iif (key1 && key2) ==> (union(key1,key3) &&
key2 ) and (union(key2,key3) && key1 ) for any key3. This is quite
straightforward for && operator, while I’m not sure this will work for
<@ and @> and others.

I can try to implement this kind of join as an extension or try to
embed this into the planner.

I think this idea is somewhat related to this patch [2], but as for
now cannot describe how exactly GiST merge and Range Merge features
relate.

How do you think:
1. Has this idea been considered before? I had not found anything on
actual spatial join of GiSTS.
2. What is the easiest way to implement this feature?
3. What kind of operators may be spatial-joined without intervention
to existing GiST scan strategies API?

Many thanks for reading.

Best regards, Andrey Borodin.

[1] Brinkhoff, Thomas, Hans-Peter Kriegel, and Bernhard Seeger.
Efficient processing of spatial joins using R-trees. Vol. 22. No. 2.
ACM, 1993.
[2] https://www.postgresql.org/message-id/flat/CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg%40mail(dot)gmail(dot)com#CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg(at)mail(dot)gmail(dot)com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-04-10 12:05:23 Re: Compiler warning in costsize.c
Previous Message Kuntal Ghosh 2017-04-10 11:39:50 Re: strange parallel query behavior after OOM crashes