Re: Fix for seg picksplit function

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Yeb Havinga <yebhavinga(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix for seg picksplit function
Date: 2010-11-20 13:15:10
Message-ID: AANLkTi=cM_RQ47nPyVoj8x0pkX25zWFKzC6DdMPFBc9u@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Nov 20, 2010 at 6:46 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> Well, the problem with just comparing on < is that it takes very
> little account of the upper bounds. I think the cases where a simple
> split would hurt you the most are those where examining the upper
> bound is necessary to to get a good split.

Yes, also such asymmetric solution seems not beautiful enough for me :).
It's easy to sort segs by their center, in this case lower and upper bound
will be used equally. New patch is attached. I checked it on various data
distributions.

1) Uniform distribution
test=# insert into seg_test (select (a || ' .. ' || a + 0.00005*b)::seg from
(select random() as a, random() as b from generate_series(1,1000000)) x);
INSERT 0 1000000
Time: 79121,830 ms
test=# create index seg_test_idx on seg_test using gist (a);
CREATE INDEX
Time: 176409,434 ms
test=# explain (buffers, analyze) select * from seg_test where a @> '0.5 ..
0.5'::seg;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on seg_test (cost=28.19..2500.32 rows=1000 width=12)
(actual time=0.251..0.886 rows=27 loops=1)
Recheck Cond: (a @> '0.5'::seg)
Buffers: shared hit=3 read=27
-> Bitmap Index Scan on seg_test_idx (cost=0.00..27.94 rows=1000
width=0) (actual time=0.193..0.193 rows=27 loops=1)
Index Cond: (a @> '0.5'::seg)
Buffers: shared hit=3
Total runtime: 1.091 ms
(7 rows)

Time: 41,884 ms

2) Natural distribution (Box–Muller transform was used for data generation)
test=# insert into seg_test (select ( a - 0.00005*abs(b) || ' .. ' || a +
0.00005*abs(b))::seg from (select
cos(2.0*pi()*random())*sqrt(-2.0*ln(random())) as a,
cos(2.0*pi()*random())*sqrt(-2.0*ln(random())) as b from
generate_series(1,1000000)) x);
INSERT 0 1000000
Time: 98614,305 ms
test=# create index seg_test_idx on seg_test using gist(a);
CREATE INDEX
Time: 212513,540 ms
test=# explain (buffers, analyze) select * from seg_test where a @> '0.3 ..
0.3'::seg;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on seg_test (cost=28.18..2500.31 rows=1000 width=12)
(actual time=0.132..0.428 rows=27 loops=1)
Recheck Cond: (a @> '0.3'::seg)
Buffers: shared hit=3 read=27
-> Bitmap Index Scan on seg_test_idx (cost=0.00..27.93 rows=1000
width=0) (actual time=0.103..0.103 rows=27 loops=1)
Index Cond: (a @> '0.3'::seg)
Buffers: shared hit=3
Total runtime: 0.504 ms
(7 rows)

Time: 0,967 ms

3) Many distinct values
test=# insert into seg_test (select (a||'..'||(a+1))::seg from (select
(random()*13000)::integer as a from generate_series(1,1000000)) x);
INSERT 0 1000000
Time: 90775,952 ms
test=# create index seg_test_idx on seg_test using gist(a);
CREATE INDEX
Time: 200960,758 ms
test=# explain (buffers, analyze) select * from seg_test where a @> '700.0
.. 700.0'::seg;
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on seg_test (cost=28.19..2500.33 rows=1000 width=12)
(actual time=0.358..3.531 rows=138 loops=1)
Recheck Cond: (a @> '700.0'::seg)
Buffers: shared hit=3 read=135
-> Bitmap Index Scan on seg_test_idx (cost=0.00..27.94 rows=1000
width=0) (actual time=0.270..0.270 rows=138 loops=1)
Index Cond: (a @> '700.0'::seg)
Buffers: shared hit=3
Total runtime: 3.882 ms
(7 rows)

Time: 5,271 ms

----
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
seg_picksplit_fix-0.5.patch text/x-patch 7.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-11-20 15:38:32 Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
Previous Message Yeb Havinga 2010-11-20 12:36:20 Re: Fix for seg picksplit function