Re: Range Types - typo + NULL string constructor

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types - typo + NULL string constructor
Date: 2011-10-16 22:09:13
Message-ID: 1318802953.10757.28.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 2011-10-15 at 01:46 +0300, Heikki Linnakangas wrote:
> On 07.10.2011 06:41, Jeff Davis wrote:
> > The repo is available here:
> >
> > http://git.postgresql.org/gitweb/?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes
>
> I took a look at this, fixed a bunch of minor issues, and pushed to my
> git repo: git://git.postgresql.org/git/users/heikki/postgres.git.
> Attached is an updated patch including those changes, for the archives.

Thank you for the review!

Updated patch attached (and pushed to repo).

> A few issues that I didn't fix straight away:
>
> * Why do you need to be a superuser to create a range type? In
> DefineRange, the comment explaining why superuser privilege is required
> seems bogus, copy-pasted from base types. So what is the reason?

No particular reason. I thought someone might ask me to include such as
restriction. I have removed it to allow non-superuser creation.

> * The binary i/o format includes the length of the lower and upper
> bounds twice, once explicitly in range_send, and second time within the
> send-function of the subtype. Seems wasteful.

Any ideas how to fix that? How else do I know how much the underlying
send function will consume?

> * In findRangeSubOpclass, I think it's OK to hard-code "btree" into the
> error message, no need to look that up from pg_am.

Done.

> * Do we really need non_empty(anyrange) ? You can just do "NOT empty(x)"

To make it a searchable (via GiST) condition, I need an operator. I
could either remove that operator (as it's not amazingly useful), or I
could just not document the function but leave the operator there.

> * Range_before and range_after: I think "input range is empty" would
> sound better than "undefined for empty ranges".

Done.

> * In range_hash, rotating the hash of the initial flags byte seems pointless

Fixed.

> * range_constructor_internal - I think it would be better to move logic
> to figure out the the arguments into the callers.

Done.

> * The gist support functions frequently call range_deserialize(), which
> does catalog lookups. Isn't that horrendously expensive?

Yes, it was. I have introduced a cached structure that avoids syscache
lookups when it's the same range as the last lookup (the common case).

> * In range_serialize and range_deserialize, reduce the number of catalog
> lookups by combining the get_range_subtype, get_typlen, get_typalign,
> get_typbyval and get_typstorage calls
> with just one catalog lookup, and fetch all those fields at once.
> Something like get_typlenbyvalalign()

Done in conjunction with the cache structure I mentioned above.

> * Have you tested this on an architecture with strict alignment? I don't
> see any alignment bugs, but I think there's a lot of potential for them..

I don't have such an architecture readily available.

> Documentation
> -------------
> * Section "Built-in Range Types". How about "PostgreSQL comes with the
> following built-in range types:" <examples> "In addition, you can define
> your own"

Done.

> * In section "Inclusive and Exclusive Bounds", it is said "An inclusive
> lower bound is represented by ...". That begs the question: where is it
> represented like that? That doesn't become clear until you read the
> section on Input/Output format. Reordering the sections might help. Are
> the double-quotes around [ and ( necessary?

I added a forward reference. Rearranging the sections seemed to create
the opposite problem. And I removed the extra double-quotes.

> * It would be good to have some examples on the input formats in the
> Input/Output section. There's one in the Examples section, but it's not
> very clear that it's related. Perhaps the whole Examples section should
> be moved to the end, or the examples be dispersed into the other sections.

Done.

> * Likewise it would be nice to have some examples in the Inclusive and
> Exclusive Bounds section.

I think the new examples in the I/O section cover that, and there's a
forward link from the inclusivity/exclusivity section.

> * subtype_float, how is it supposed to work? I couldn't find any
> explanation in the docs. Can we come up with a better name for the function?

It's only for the gist penalty function. It's difficult to know how to
penalize without being able to do some math on the boundaries, so
subtype_float it meant to return a float that might be useful for that
purpose.

It's likely that this will change based on Alexander Korotkov's comments
here:

http://archives.postgresql.org/message-id/CAPpHfdsVow
+Yeoq5Be=sxPFPsbYG3Gj3QyVvVdt8dg7XqNyHpg(at)mail(dot)gmail(dot)com

So I will defer the documentation updates until we work that out.

> * Constructing Ranges section: should describe the 0-3 argument forms of
> the constructors, not only in the examples but in the main text too.

Done.

> * What exactly is canonical function supposed to return? It's not clear
> what format one should choose as the canonical format when writing a
> custom range type. And even for the built-in types, it would be good to
> explain which types use which canonical format (I saw the discussions on
> that, so I understand that might still be subject to change).

The canonical function is just supposed to return a new range such that
two equal values will have equal representations. I have listed the
built-in discrete range types and their canonical form.

As far as describing what a custom range type is supposed to use for the
canonical form, I don't know which part is exactly unclear. There aren't
too many rules to defining one -- the only guideline is that ranges of
equal value going in should be put in one canonical representation.

> * Defining New RangeTypes section: A more general description in
> addition to the example would be good.

Done.

> * "GiST Indexing" title: how about making it just "Indexing". Also, it
> would be nice to list exactly which operators can be sped up by gist.

Done.

Regards,
Jeff Davis

Attachment Content-Type Size
rangetypes-20111016.gz application/x-gzip 48.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-10-16 22:14:03 Re: Underspecified window queries in regression tests
Previous Message Tom Lane 2011-10-16 22:03:13 Re: Underspecified window queries in regression tests