Re: Fix for seg picksplit function

From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix for seg picksplit function
Date: 2010-11-05 15:53:04
Message-ID: 4CD42860.4070706@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Alexander,

Here follows a review of your patch.
> Hackers,
>
> Seg contrib module contains the same bug in picksplit function as
> cube contrib module.
Good catch! :-)
> Also, Guttman's split algorithm is not needed in unidimensional case,
> because sorting based algorithm is good in this case.
I had some doubts whether this is true in the general case, instead of
the given example. I increased the interval width in your example to
0.25*b instead of 0.00005*b, with the purpose to increase overlaps
between intervals. Though the performance gain was less, it was still
faster than Guttmans algorithm. To make things worse I also tested with
an interval with of 1*b, resulting in a lot of overlaps and compared
several overlap queries. The sorting algorithm was 25% to 40% faster on
searches. Index creation time with the sorting algorithm is also a
fraction of the original creation time.

Since this testing could be part of a review, I looked at the code as
well and listed myself as reviewer on the commitfest.

Comparing with gbt_num_picksplit reveals some differences with sort
array intialization and size, the former's sort array starts at index 1
(FirstOffsetNumber), your implementation starts at 0 for sorting and
hence the size of the sorting array can be one element less. I prefer
your way of sort array initialization; gbt_num_pickplits's use of
FirstOffsetNumber of the qsort array seems to mix a define from the
gist/btree namespace for no reason and might even lead to confusion.

The remaining part of the new picksplit function puts the segs into left
or right, I think the code is easier to understand if there was only one
for loop from i=1 to 1 < maxoff, for the current code I had to verify
that all sort array entries were really used with the two seperate loops
that also skipped the first value. I edited the code a bit, and also
used seg_union to initialize/palloc the datum values. Finally, waste and
firsttime variables were initialized but not used anymore, so removed.

Attached is a revised patch.

regards,
Yeb Havinga

PS: when comparing with gbt_num_picksplit, I noticed that that one does
not update v->spl_ldatum and spl_rdatum to the union datums, but
initializes these to 0 at the beginning and never seems to update them.
Not sure if this is a problem since the num_picksplit stuff seems to
work well.

Attachment Content-Type Size
seg_picksplit_fix-0.2.patch text/x-patch 5.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2010-11-05 16:03:38 Re: ALTER OBJECT any_name SET SCHEMA name
Previous Message Tom Lane 2010-11-05 15:44:14 Re: ALTER TABLE ... IF EXISTS feature?