Skip site navigation (1) Skip section navigation (2)

Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-12-20 09:22:41
Message-ID: CAPpHfduBWP15Ufc6F_W=+VYxQmBdSdXbjz3EygM-Uc-5i60W8Q@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-hackers
Hi!

Studying this question little more I found that current approach of range
indexing can be dramatically inefficient in some cases. It's not because of
penalty or split implementation, but because of approach itself. Mapping
intervals to two-dimensional space produce much better results in case of
high-overlapping ranges and "@>", "<@" operators with low selectivity.

There is a simple test case for proof of concept.

create table source as (select l, (l + s) as r from (select
(random()*10000)::int as l, (random()*1000 + 1)::int s from
generate_series(1,1000000) g) x);
create table range_test as (select int4range(l,r) as x from source);
create table point_test as (select point(l,r) as x from source);
create index range_test_idx on range_test using gist (x);
create index point_test_idx on point_test using gist (x);

test=# explain (analyze, buffers) select * from range_test where x <@
int4range(5000,5010);
                                                         QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on range_test  (cost=40.31..2585.65 rows=1000 width=32)
(actual time=37.304..37.310 rows=2 loops=1)
   Recheck Cond: (x <@ '[5000,5010)'::int4range)
   Buffers: shared hit=767
   ->  Bitmap Index Scan on range_test_idx  (cost=0.00..40.06 rows=1000
width=0) (actual time=37.288..37.288 rows=2 loops=1)
         Index Cond: (x <@ '[5000,5010)'::int4range)
         Buffers: shared hit=765
 Total runtime: 37.385 ms
(7 rows)


test=# explain (analyze, buffers) select * from point_test where x <@
box(point(5000,5000),point(5010,5010));
                                                        QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on point_test  (cost=44.36..2589.69 rows=1000 width=16)
(actual time=0.197..0.206 rows=2 loops=1)
   Recheck Cond: (x <@ '(5010,5010),(5000,5000)'::box)
   Buffers: shared hit=5
   ->  Bitmap Index Scan on point_test_idx  (cost=0.00..44.11 rows=1000
width=0) (actual time=0.182..0.182 rows=2 loops=1)
         Index Cond: (x <@ '(5010,5010),(5000,5000)'::box)
         Buffers: shared hit=3
 Total runtime: 0.265 ms
(7 rows)

test=# explain (analyze, buffers) select * from range_test where x @>
int4range(5000,5990);
                                                        QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on range_test  (cost=40.31..2585.65 rows=1000 width=32)
(actual time=4.578..4.603
rows=5 loops=1)
   Recheck Cond: (x @> '[5000,5990)'::int4range)
   Buffers: shared hit=52
   ->  Bitmap Index Scan on range_test_idx  (cost=0.00..40.06 rows=1000
width=0) (actual time=4.561..4.561 rows=5 loops=1)
         Index Cond: (x @> '[5000,5990)'::int4range)
         Buffers: shared hit=47
 Total runtime: 4.669 ms
(7 rows)


test=# explain (analyze, buffers) select * from point_test where x <@
box(point('-inf'::float,5990),point(5000,'+inf'::float));
                                                        QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on point_test  (cost=44.36..2589.69 rows=1000 width=16)
(actual time=0.328..0.353 rows=5 loops=1)
   Recheck Cond: (x <@ '(5000,inf),(-inf,5990)'::box)
   Buffers: shared hit=8
   ->  Bitmap Index Scan on point_test_idx  (cost=0.00..44.11 rows=1000
width=0) (actual time=0.312..0.312 rows=5 loops=1)
         Index Cond: (x <@ '(5000,inf),(-inf,5990)'::box)
         Buffers: shared hit=3
 Total runtime: 0.419 ms
(7 rows)

If you like to learn more information about such mapping you can start from
here: http://www.comsis.org/ComSIS/Vol7No4/RegularPapers/paper16.pdf

Any thoughts?

-----
With best regards,
Alexander Korotkov.

In response to

Responses

pgsql-hackers by date

Next:From: Magnus HaganderDate: 2011-12-20 10:13:27
Subject: Re: JSON for PG 9.2
Previous:From: Nikhil SontakkeDate: 2011-12-20 06:14:29
Subject: Re: Review: Non-inheritable check constraints

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group