Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-27 19:11:28
Message-ID: CAPpHfdvkC23+hTzw+E7LwG23r0dbCo-zT8zeYa47ycbo-XmqCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 27, 2011 at 10:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> 1. penalty is using both hard-wired penalty values (1.0, 2.0, etc) and
> values obtained from subtype_diff. This is not good, because you have
> no idea what scale the subtype differences will be expressed on. The
> hard-wired values could be greatly larger than range widths, or greatly
> smaller, resulting in randomly different index behavior.
>
Current GiST code only compare penalty values of inserting same tuple. And
don't see why it may alters. So, values obtained from subtype_diff
and hard-wired values would be never compared each other.

> 2. It's too large/complicated. You're proposing to add nearly a
> thousand lines to rangetypes_gist.c, and I do not see any reason to
> think that this is so much better than what's there now as to justify
> that kind of increment in the code size. I saw your performance
> results, but one set of results on an arbitrary (not-real-world) test
> case doesn't prove a lot to me; and in particular it doesn't prove that
> we couldn't do as well with a much smaller and simpler patch.
>
I've tested double sorting split algorithm itself pretty much on synthetic
datasets. See paper for details. Strategy of separation of different
classes of ranges really need more testing. But obtaining large enough
real-life datasets is pretty *problematic for me.*

> There are a lot of garden-variety coding problems, too, for instance here:
>
> + *penalty = Max(DatumGetFloat8(FunctionCall2(
> + subtype_diff, orig_lower.val, new_lower.val)), 0.0);
>
> which is going to uselessly call the subtype_diff function twice most of
> the time (Max() is only a macro), plus you left off the collation
> argument. But I don't think it's worth worrying about those until the
> big picture is correct, which I feel it isn't yet.
>
Oh, I see. It will be fixed.

I propose to pull out and apply the changes related to the
> RANGE_CONTAIN_EMPTY flag, and also remove the <> opclass entry,
> because I think these are uncontroversial and in the nature of
> "must fix quickly". The redesign of the penalty and picksplit
> functions should be discussed separately.
>
I think the same.

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2011-11-27 19:18:31 Re: Feature proposal: www_fdw
Previous Message Peter Eisentraut 2011-11-27 18:47:21 information schema/aclexplode doesn't know about default privileges