Re: [PATCH] we have added support for box type in SP-GiST index

From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-02-29 20:17:39
Message-ID: CAF4Au4zxd2XOV0A__FU7xoHxSiwJzm1z2xhs-FFaT1DzB9ub3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 15, 2016 at 6:29 PM, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:

> It's very pity but author is not able to continue work on this patch, and
> I would like to raise this flag.
>
> I'd like to add some comments about patches:
>
> traversalValue patch adds arbitrary value assoсiated with branch in
> SP-GiST tree walk. Unlike to recostructedValue it could be just pointer, it
> havn't to be a regular pgsql type. Also, it could be used independly from
> reconstructedValue. This patch is used both following attached patches.
>
> range patch just switchs using reconstructedValue to traversalValue in
> range opclasses. reconstructedValue was used just because it was an
> acceptable workaround in case of range type. Range opclase stores a full
> value in leafs and doesn't need to use reconstructedValue to return tuple
> in index only scan.
> See http://www.postgresql.org/message-id/5399.1343512250@sss.pgh.pa.us
>
> q4d patch implements index over boxes using SP-GiST. Basic idea was an
> observation, range opclass thinks about one-dimensional ranges as 2D points.
> Following this idea, we can think that 2D box (what is 2 points or 4
> numbers) could represent a 4D point. We hoped that this approach will be
> much more effective than traditional R-Tree in case of many overlaps in
> box's collection.
> Performance results are coming shortly.
>

I made some benchmarks using two real-life datasets. Streets data set is
consists of 3691620 MBRs of russian streets with not much overlaps. As
expected, spgist is slower than rtree (gist), because of much bigger
number of blocks readed. Build time, however, is faster for spgist than
rtree (~ 1.3 times).

Bounds data set is consists of 7803499 MBRs of some objects and are highly
overlapped, see attached picture (crop with center in Moscow) of boxes.
Create index time (ms):
spgist: 47036.051
rtree: 68491.242

Size:
spgist: 505 MB
rtree: 663 MB
heap: 871 MB

Let's consider a small box in downtown of Moscow: box
'(37.607164,55.756489), (37.607797,55.756725)'.
&& оператор (returns 23958 boxes) :
spgist: 14.649
rtree: 23.715
seqscan: 924.344

@> operator (returns 23868 boxes):

spgist: 14.113
rtree: 26.853
seqscan: 909.147

<@ operator (returns 0 boxes):
spgist: 0..268 ms
rtree: 12.563

My conclusion is that for "spagetti" data new opclass for spgist is good to
have - it's smaller and faster (both, created index and search) that rtree
(gist based).

>
> In near future (say, tommorrow) I hope to suggest a opclass over polygons
> based on this approach.
>
>
Yes, it'd be interested.

>
> Alexander Lebedev wrote:
>
>> Hello, Hacker.
>>
>> * [PATCH] add a box index to sp-gist
>>
>> We have extended sp-gist with an index that keeps track of boxes
>>
>> We have used ideas underlying sp-gist range implementation to
>> represent 2D boxes as points in 4D space. We use quad tree
>> analogue, but in 4-dimensional space. We call this tree q4d. Each
>> node of this tree is a box (a point in 4D space) which splits space
>> in 16 hyperrectangles.
>>
>> Rationale: r-tree assumes that boxes we're trying to index don't
>> overlap much. When this assumption fails, r-tree performs badly,
>> while our proposal to represent a rectangle as a point in 4D space
>> solves this problem.
>>
>> NB: the index uses traversalValue introduced in a separate patch.
>>
>> * [PATCH] add traversalValue in sp-gist
>>
>> During implementation of box index for sp-gist we saw that we only
>> keep rectangles, but to determine traversal direction we may need
>> to know boundaries of a hyperrectangle. So we calculate them
>> on the fly and store them in traversalValue, available
>> when traversing child nodes, because otherwise we would't be able to
>> calculate them from inside the inner_consistent function call.
>>
>> This patch was written by Teodor Sigaev.
>>
>> * [PATCH] change reconstructValue -> traversalValue in range_spgist.c
>>
>> During traversal, local context is
>> insufficient to pick traversal direction. As of now, this is worked
>> around with the help of reconstructValue. reconstructValue only
>> stores data of the same type as a tree node, that is, a range.
>>
>> We have written a patch that calculates auxillary values and makes
>> them accessible during traversal.
>>
>> We then use traversalValue in a new box index and change
>> range_spgist.c to use it in place of reconstructValue.
>>
>> NB: apply this patch after traversalValue patch.
>>
>>
>>
>>
>>
> --
> Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
> WWW:
> http://www.sigaev.ru/
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

Attachment Content-Type Size
11148387_10207592076787467_3791931596034486026_o.jpg image/jpeg 488.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2016-02-29 21:10:05 Re: Proposal: Generic WAL logical messages
Previous Message Alvaro Herrera 2016-02-29 19:42:09 Re: [COMMITTERS] pgsql: Add isolationtester spec for old heapam.c bug