Re: Double sorting split patch

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Double sorting split patch
Date: 2011-10-04 08:51:49
Message-ID: CAPpHfds333E_V28nRkDY2CBSDC+TDHykZ1HBdPpYkGAjb9w6dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 4, 2011 at 12:12 PM, Heikki Linnakangas <
heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> Can you elaborate the consider-split algorithm? The criteria to select the
> new split over the previously selected one is this:
>
>> ! /*
>> ! * If ratio is acceptable, we should compare current split
>> with
>> ! * previously selected one. If no split was selected then
>> we select
>> ! * current anyway. Between splits of one dimension we
>> search for
>> ! * minimal overlap (allowing negative values) and minimal
>> ration
>> ! * (between same overlaps. We switch dimension if find
>> less overlap
>> ! * (non-negative) or less range with same overlap.
>> ! */
>> ! range = diminfo->upper - diminfo->lower;
>> ! overlap = ((leftUpper) - (rightLower)) / range;
>> ! if (context->first ||
>> ! (context->dim == dimNum &&
>> ! (overlap < context->overlap ||
>> ! (overlap == context->overlap && ratio >
>> context->ratio))) ||
>> ! (context->dim != dimNum &&
>> ! ((range > context->range &&
>> ! non_negative(overlap) <=
>> non_negative(context->overlap)**) ||
>> ! non_negative(overlap) <
>> non_negative(context->overlap)**))
>> ! )
>> ! {
>>
>
> Why are negative overlaps handled differently across dimensions and within
> the same dimension? Your considerSplit algorithm in the SYRCoSE 2011 paper
> doesn't seem to make that distinction.

Yes, I've changed this behaviour after experiments on real-life datasets. On
the datasets where data don't overlap themselfs (points are always such
datasets), non-overlapping splits are frequently possible. In this case
splits tends to be along one dimension, because most distant non-overlapping
splits (i.e. having lowest negative overlap) appears to be in the same
dimension as previous split. Therefore MBRs appear to be very prolonged
along another dimension, that's bad for search. In order to prevent this
behavour I've made transition to another dimension by non-nagative part of
overlap and range. Using range as the split criteria makes MBRs more
quadratic, and using non-negative part of overlap as priority criteria give
to range chance to matter.

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2011-10-04 09:05:59 Re: Separating bgwriter and checkpointer
Previous Message Alex Hunsaker 2011-10-04 08:34:14 Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.