Re: Double sorting split patch

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Double sorting split patch
Date: 2011-10-04 09:46:43
Message-ID: 4E8AD603.80506@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04.10.2011 11:51, Alexander Korotkov wrote:
> On Tue, Oct 4, 2011 at 12:12 PM, Heikki Linnakangas<
> heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>
>> Can you elaborate the consider-split algorithm? The criteria to select the
>> new split over the previously selected one is this:
>>
>>> ! /*
>>> ! * If ratio is acceptable, we should compare current split
>>> with
>>> ! * previously selected one. If no split was selected then
>>> we select
>>> ! * current anyway. Between splits of one dimension we
>>> search for
>>> ! * minimal overlap (allowing negative values) and minimal
>>> ration
>>> ! * (between same overlaps. We switch dimension if find
>>> less overlap
>>> ! * (non-negative) or less range with same overlap.
>>> ! */
>>> ! range = diminfo->upper - diminfo->lower;
>>> ! overlap = ((leftUpper) - (rightLower)) / range;
>>> ! if (context->first ||
>>> ! (context->dim == dimNum&&
>>> ! (overlap< context->overlap ||
>>> ! (overlap == context->overlap&& ratio>
>>> context->ratio))) ||
>>> ! (context->dim != dimNum&&
>>> ! ((range> context->range&&
>>> ! non_negative(overlap)<=
>>> non_negative(context->overlap)**) ||
>>> ! non_negative(overlap)<
>>> non_negative(context->overlap)**))
>>> ! )
>>> ! {
>>>
>>
>> Why are negative overlaps handled differently across dimensions and within
>> the same dimension? Your considerSplit algorithm in the SYRCoSE 2011 paper
>> doesn't seem to make that distinction.
>
> Yes, I've changed this behaviour after experiments on real-life datasets. On
> the datasets where data don't overlap themselfs (points are always such
> datasets), non-overlapping splits are frequently possible. In this case
> splits tends to be along one dimension, because most distant non-overlapping
> splits (i.e. having lowest negative overlap) appears to be in the same
> dimension as previous split. Therefore MBRs appear to be very prolonged
> along another dimension, that's bad for search. In order to prevent this
> behavour I've made transition to another dimension by non-nagative part of
> overlap and range. Using range as the split criteria makes MBRs more
> quadratic, and using non-negative part of overlap as priority criteria give
> to range chance to matter.

Ok. Could you phrase that as a code comment?

Here's a version of the patch I've been working on. There's no
functional changes, just a lot of moving things around, comment changes,
etc. to hopefully make it more readable.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
double-sorting-split-0.4-heikki.patch text/x-diff 23.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Pflug 2011-10-04 10:02:21 Re: restoring an object to a different name
Previous Message Amit Khandekar 2011-10-04 09:09:44 Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.