Quick Links

Re: Double sorting split patch

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Double sorting split patch
Date:	2011-10-04 09:46:43
Message-ID:	4E8AD603.80506@enterprisedb.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 04.10.2011 11:51, Alexander Korotkov wrote:
> On Tue, Oct 4, 2011 at 12:12 PM, Heikki Linnakangas<
> heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>
>> Can you elaborate the consider-split algorithm? The criteria to select the
>> new split over the previously selected one is this:
>>
>>> ! /*
>>> ! * If ratio is acceptable, we should compare current split
>>> with
>>> ! * previously selected one. If no split was selected then
>>> we select
>>> ! * current anyway. Between splits of one dimension we
>>> search for
>>> ! * minimal overlap (allowing negative values) and minimal
>>> ration
>>> ! * (between same overlaps. We switch dimension if find
>>> less overlap
>>> ! * (non-negative) or less range with same overlap.
>>> ! */
>>> ! range = diminfo->upper - diminfo->lower;
>>> ! overlap = ((leftUpper) - (rightLower)) / range;
>>> ! if (context->first ||
>>> ! (context->dim == dimNum&&
>>> ! (overlap< context->overlap ||
>>> ! (overlap == context->overlap&& ratio>
>>> context->ratio))) ||
>>> ! (context->dim != dimNum&&
>>> ! ((range> context->range&&
>>> ! non_negative(overlap)<=
>>> non_negative(context->overlap)**) ||
>>> ! non_negative(overlap)<
>>> non_negative(context->overlap)**))
>>> ! )
>>> ! {
>>>
>>
>> Why are negative overlaps handled differently across dimensions and within
>> the same dimension? Your considerSplit algorithm in the SYRCoSE 2011 paper
>> doesn't seem to make that distinction.
>
> Yes, I've changed this behaviour after experiments on real-life datasets. On
> the datasets where data don't overlap themselfs (points are always such
> datasets), non-overlapping splits are frequently possible. In this case
> splits tends to be along one dimension, because most distant non-overlapping
> splits (i.e. having lowest negative overlap) appears to be in the same
> dimension as previous split. Therefore MBRs appear to be very prolonged
> along another dimension, that's bad for search. In order to prevent this
> behavour I've made transition to another dimension by non-nagative part of
> overlap and range. Using range as the split criteria makes MBRs more
> quadratic, and using non-negative part of overlap as priority criteria give
> to range chance to matter.

Ok. Could you phrase that as a code comment?

Here's a version of the patch I've been working on. There's no
functional changes, just a lot of moving things around, comment changes,
etc. to hopefully make it more readable.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
double-sorting-split-0.4-heikki.patch	text/x-diff	23.2 KB

In response to

Re: Double sorting split patch at 2011-10-04 08:51:49 from Alexander Korotkov

Responses

Re: Double sorting split patch at 2011-10-04 12:10:40 from Alexander Korotkov

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Florian Pflug	2011-10-04 10:02:21	Re: restoring an object to a different name
Previous Message	Amit Khandekar	2011-10-04 09:09:44	Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.