Re: [PATCH] we have added support for box type in SP-GiST index

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: emre(at)hasegeli(dot)com
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Lebedev <a(dot)lebedev(at)postgrespro(dot)ru>
Subject: Re: [PATCH] we have added support for box type in SP-GiST index
Date: 2016-03-21 16:57:10
Message-ID: 56F027E6.2070303@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>> + * implementation of quad-4d tree over boxes for SP-GiST.
> Isn't the whole thing actually 3D?
No. The idea if this work is a representation of 2d box as 4d point. Quad means
quadrant of 2d plane. Originally such kind of tree was developed for 2d point,
and each node of tree splits plane on 4 quadrant. For 3d tree each node splits
space for 8 "quadrants", 4d - 16 ones.

>> + * For example consider the case of intersection.
> No need for a new line after this.
fixed

>> + * A quadrant has bounds, but sp-gist keeps only 4-d point (box) in inner nodes.
> I couldn't get the term 4D point. Maybe it means that we are using
> box datatype as the prefix, but we are not treating them as boxes.
exactly, we treat boxes as 4-dimentional points.

>
>> + * We use traversalValue to calculate quadrant bounds from parent's quadrant
>> + * bounds.
> I couldn't understand this sentence. We are using the parent's prefix
> and the quadrant to generate the traversalValue. I think this is the
> crucial part of the opclass. It would be nice to explain it more
> clearly.

I'll try to explain with two-dimensional example over points. ASCII-art:
|
|
1 | 2
|
-----------+-------------
|P
3 | 4
|
|

'+' with 'A' represents a centroid or, other words, point which splits plane for
4 quadrants and it strorend in parent node. 1,2,3,4 are labels of quadrants,
each labeling will be the same for all pictures and all centriods, and i will
not show them in pictures later to prevent too complicated images. Let we add C
- child node (and again, it will split plane for 4 quads):

| |
----+---------+---
X |B |C
| |
-----------+---------+---
|P |A
| |
|
|

A and B are points of intersection of lines. So, box PBCAis a bounding box for
points contained in 3-rd (see labeling above). For example X labeled point is
not a descendace of child node with centroid C because it must be in branch of
1-st quad of parent node. So, each node (except root) will have a limitation in
its quadrant. To transfer that limitation the traversalValue is used.

To keep formatting I attached a txt file with this fragment.

>
>> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
> 2016.
fixed

>
>> + * src/backend/utils/adt/boxtype_spgist.c
> Maybe we should name this file as geo_spgist.c to support other types
> in the future.
done

>> + #include "utils/builtins.h";
> Extra ;.
fixed

>
>> + #include "utils/datum.h"
> I think this is unnecessary.
removed

>
>> + /* InfR type implements doubles and +- infinity */
>> + typedef struct
>> + {
>> + int infFlag;
>> + double val;
>> + } InfR;
> Why do we need this? Can't we store infinity in "double"?
Hmm, you are right. fixed.

> This isn't returning the value.
removed

>
>> + typedef struct
>> + {
>> + Range range_x;
>> + Range range_y;
>> + } Rectangle;
>
> Wouldn't it be simpler to just using BOX instead of this?
Too much the same names in RectBox

>
>> + static void
>> + evalIRangeBox(const IRangeBox *range_box, const Range *range, const int half1,
>> + const int half2, IRangeBox *new_range_box)
>>
>> + static void
>> + evalIRectBox(const IRectBox *rect_box, const Rectangle *centroid,
>> + const uint8 quadrant, IRectBox * new_rect_box)
>>
>> + inline static void
>> + initializeUnboundedBox(IRectBox * rect_box)
>
> Wouldn't it be simpler to palloc and return the value on those functions?
evalRangeBox() initializes part of RectBox, so, it could not palloc its result.
Other methods use the same signature just for consistency.

>
>> + static int
>> + intersect2D(const Range * range, const IRangeBox * range_box)
>
> Wouldn't it be better to return "bool" on those functions.
done

>
>> + return ((p1 >= 0) && (p2 <= 0));
> Extra parentheses.
removed

>
>> + i const spgChooseIn *in = (spgChooseIn *) PG_GETARG_POINTER(0);
>> + i spgChooseOut *out = (spgChooseOut *) PG_GETARG_POINTER(1);
>
> Many variables are defined with "const". I am not sure they are all
> that doesn't change. If it applies to the pointer, "out" should also
> not change in here. Am I wrong?
No, changes

>
>> + /*
>> + * Begin. This block evaluates the median of coordinates of boxes
>> + */
> I would rather explain what the function does on the function header.
fixed

>
>> + memcpy(new_rect_box, rect_box, sizeof(IRectBox));
> Do we really need to copy the traversalValues on allTheSame case.
> Wouldn't it work if just the same value is passed for all of them.
> The search shouldn't continue after allTheSame case.
Seems, yes. spgist tree could contain a locng branches with allTheSame.

>
>> + for (i = 0; flag && i < in->nkeys && flag; i++)
> It checks flag two times.
fixed

>
>> + boxPointerToRectangle(
>> + DatumGetBoxP(in->scankeys[i].sk_argument),
>> + p_query_rect
>> + );
> I don't think this matches the project coding style.
fixed

>
>> + int flag = 1,
> Wouldn't it be better as "bool"?
fixed.

> The regression tests for the remaining operators are still missing.
Ooops, will be soon.

Attached all patches to simplify review. Thank you very much!

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
art.txt text/plain 1.1 KB
traversalValue-2.patch.gz application/x-gzip 1.9 KB
q4d-3.patch.gz application/x-gzip 6.9 KB
range-1.patch.gz application/x-gzip 1.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2016-03-21 17:10:55 Re: Proposal: Generic WAL logical messages
Previous Message Anastasia Lubennikova 2016-03-21 16:53:49 Re: WIP: Covering + unique indexes.