Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-12-22 07:46:46
Message-ID: 1324540006.7608.79.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2011-12-14 at 01:04 +0400, Alexander Korotkov wrote:
> Hi!
>
Thank you! Attached a few changes:

* Change "ordinal" to "normal" for clarity (at least to me).
* Some comment cleanup
* Change classes_groups to be an enum of SPLIT_LEFT and SPLIT_RIGHT,
rather than using 1 and 2.
* Changed the "bounds_lower" and "bounds_upper" variables into
"by_lower" and "by_upper" to indicate that the arrays are distinguished
by sort order. It was confusing me to read it otherwise.

A few comments:

* In range_gist_picksplit, it would be nice to have a little bit more
intuitive description of what's going on with the nonEmptyCount and
nonInfCount numbers. For instance, it appears to depend on the fact that
a range must either be in nonEmptyCount or in nonInfCount. Also, can you
describe the reason you're multiplying by two and taking the absolute
value? It seems to work, but I am missing the intuition behind those
operations.

* The penalty function is fairly hard to read still. At a high level, I
think we're trying to accomplish a few things (in order from most to
least important):
- Keep normal ranges separate.
- Avoid broadening the class of the original predicate (e.g. turning
single-infinite into double-infinite).
- Avoid broadening (as determined by subtype_diff) the original
predicate.
- Favor adding ranges to narrower original predicates.

Do you agree? If so, perhaps we can attack those more directly and it
might be a little more readable.

Additionally, the arbitrary numbers might become a problem. Can we
choose better constants there? They would still be arbitrary when
compared with real numbers derived from subtype_diff, but maybe we can
still do better than what's there.

* Regarding the leftover "common" entries that can go to either side:
what is the "delta" measure trying to accomplish? When I work through
some examples, it seems to favor putting larger common ranges on the
left (small delta) and smaller common ranges on the right (smaller
delta). Why is that good? Or did I misread the code?

Intuitively, I would think that we'd want to push the ranges with lower
upper bounds to the left and higher lower bounds to the right -- in
other words, recurse. Obviously, we'd need to make sure it terminated at
some point, but splitting the common entries does seem like a smaller
version of the original problem. Thoughts?

Thank you for the helpful comments! It took me a while to work through
the logic, but I would have been lost completely without the comments
around the double sorting split.

Regards,
Jeff Davis

Attachment Content-Type Size
range_gist_changes.diff text/x-patch 22.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2011-12-22 07:52:18 Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Previous Message Heikki Linnakangas 2011-12-22 07:44:03 Re: Page Checksums + Double Writes