Re: WIP: 2d-mapping based indexing for ranges

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: WIP: 2d-mapping based indexing for ranges
Date: 2012-05-31 04:57:15
Message-ID: 1338440235.14941.33.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2012-05-28 at 23:42 +0400, Alexander Korotkov wrote:
> Hackers,
>
>
> attached patch implements another approach to indexing of ranges:
> mapping lower and upper bounds as 2d-coordinates and using same
> indexing approaches as for 2d points. Patch provides range_ops2
> operator class which implements this approach.

...

> "Contained by" queries with small selectivity are dramatically faster
> as expected.

That's good news, but that seems like a less-common query. Overlaps and
contains are probably the two most important. It would be good to show a
significant benefit in at least one of those two queries if the build
time increases.

> 2) Limitations of "penalty" expressive power. GiST penalty function
> returns just one float value. It is a limitation even for geometrical
> datatypes. Some papers on R-trees recommends insert new entry into box
> which have minimal area when this boxes have same extension area (very
> possible if extension area is zero). We can't express it using single
> float value.
> For 2d-mapping ranges indexing situation is even harder. Imagine,
> some set of ranges could have same lower or upper bound. On 2d space
> such ranges will be united into lines. But, we can't express
> increasing of line length in penalty, because area of line is always
> zero independently on it's length. Actually, similar problem exists
> for geometrical datatypes, but it seems to me this problem is more
> urgent for ranges, because I believe in real life ranges, large set of
> ranges could really have same lower or upper bound.
> Probably, we would like to rank key extension cases as following or
> similar:
> a) extension of finite dimension to infinite
> b) extension in dimension perpendicular to infinite
> c) extension with finite area extension
> d) extension with zero area increment (extension of line length)
> So, it's impossible to fil all of these into one float. Ans it's hard
> to choose what to neglect and how.
> I think there are at least two ways to make GiSTinferface sensitive
> enough to these things:
> a) Make penalty function return more than one float. The question is
> how many floats we allow?
> b) Create alternative "choose" interface function which would return
> set of most preferred keys for insert. Usage of such function could
> lead to slowdown because currently GiST stop scan when met zero
> penalty.

I agree that we should consider improving the interface for the penalty
function. In your last patch improving range indexing, it was quite
awkward to express the penalty in one dimension.

Regards,
Jeff Davis
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Rijkers 2012-05-31 05:06:48 Re: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
Previous Message Tom Lane 2012-05-31 04:18:17 Re: pg_dump and thousands of schemas