Skip site navigation (1) Skip section navigation (2)

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SP-GiST for ranges based on 2d-mapping and quad-tree
Date: 2012-07-28 21:33:16
Message-ID: 50145A9C.7080400@enterprisedb.com (view raw or flat)
Thread:
Lists: pgsql-hackers
On 23.07.2012 10:37, Alexander Korotkov wrote:
> On Fri, Jul 20, 2012 at 3:48 PM, Heikki Linnakangas<
> heikki(dot)linnakangas(at)enterprisedb(dot)com>  wrote:
>
>> It would be nice to have an introduction, perhaps as a file comment at the
>> top of rangetypes_spgist.c, explaining how the quad tree works. I have a
>> general idea of what a quad tree is, but it's not immediately obvious how
>> it maps to SP-GiST. What is stored on a leaf node and an internal node?
>> What is the 'prefix' (seems to be the centroid)? How are ranges mapped to
>> 2D points? (the function comment of getQuadrant() is a good start for that
>> last one)
>
> I've added some comments at the top of rangetypes_spgist.c.

Thanks, much better.

I think the handling of empty ranges needs some further explanation. If 
I understand the code correctly, the root node can contain a centroid 
like usual, and empty ranges are placed in the magic 5th quadrant. 
Alternatively, the root node has no centroid, and contains only two 
subnodes: all empty ranges are placed under one subnode, and non-empty 
ranges under the other.

It seems it would be simpler if we always stored empty nodes the latter 
way, with a no-centroid root node, and nodes with a centroid would 
always only have 4 subnodes. When the first empty range is added to an 
index that already contains non-empty values, the choose-function could 
return spgSplitTuple to create a new root node that divides the space 
into empty and non-empty values. Then again, I don't think it would 
actually simplify the code much. Handling the 5th quadrant doesn't 
require much code in spg_range_quad_inner_consistent() as it is. So 
maybe it's better to leave it the way it is.

Or perhaps we should stipulate that a root node with no centroid can 
only contain empty ranges. When you add the first non-empty range, have 
choose function return spgSplitTuple, and create a new root node with a 
centroid, and the empty ranges in the 5th quadrant.

>> In spg_range_quad_inner_**consistent(), if in->hasPrefix == true, ISTM that
>> in all cases where 'empty' is true, 'which' is set to 0, meaning that there
>> can be no matches in any of the quadrants. In most of the case-branches,
>> you explicitly check for 'empty', but even in the ones where you don't, I
>> think you end up setting which=0 if empty==true. I'm not 100% sure about
>> the RANGESTRAT_ADJACENT case, though. Am I missing something?
>
> Ops., it was a bug: RANGESTRAT_ADJACENT shoud set which=0 if empty==true,
> while RANGESTRAT_CONTAINS and RANGESTRAT_CONTAINED_BY not. Corrected.

Ok. I did some copy-editing, rewording some comments, and fixing 
whitespace. Patch attached, hope I didn't break anything.

I think the most difficult part of the patch is the 
spg_range_quad_inner_consistent() function. It's hard to understand how 
the various strategies are implemented. I started to expand the comments 
in for each strategy, explaining how each strategy maps to a bounding 
box in the 2D space, but I'm not sure how much that actually helps. 
Perhaps it would help to also restructure the code so that you first 
have a switch-case statement that maps the scan key to bounding box, 
without worrying about the centroid yet, and then check which quadrants 
you need to descend to to find points within the box. The 
adjacent-strategy is more complicated than a simple bounding-box search, 
though.

I'm having trouble understanding how exactly the RANGESTRAT_ADJACENT 
works. The geometric interpretation of 'adjacent' is that the scan key 
defines two lines, one horizontal and one vertical. Any points that lie 
on either of the lines match the query. Looking at the implementation, 
it's not clear how the code implements the search for along those two lines.

Also, I wonder if we really need to reconstruct the "previous" value in 
a RANGESTRAT_ADJACENT search. ISTM we only need to remember which of the 
two lines we are chasing. For example, if you descend to quadrant 2 
because there might be a point there that lies on the horizontal line, 
but we already know that there can't be any points there lie on the 
vertical line, you only need to remember that, not the whole centroid 
from the previous level. Does the SP-GiST API require the 
"reconstructed" values stored by inner_consistent to be of the correct 
datatype, or can it store any Datums in the array?

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment: range_spgist_quad-0.6-heikki.patch
Description: text/x-diff (48.3 KB)

In response to

Responses

pgsql-hackers by date

Next:From: Jeff JanesDate: 2012-07-28 21:46:56
Subject: Re: Covering Indexes
Previous:From: Jeff JanesDate: 2012-07-28 20:09:39
Subject: effective_io_concurrency

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group