Re: Range Types, constructors, and the type system

From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-26 07:57:55
Message-ID: 4E06E683.2070009@darrenduncan.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

To eludicate my earlier comments on this subject ...

I believe that the best general solution here is for every ordered base type to
just have a single total order, which is always used with that type in any
generic order-sensitive operation, including any ranges defined over it, or any
ORDER BY or any <,>,etc. The built-in generic text type would have exactly 1
system-defined collation that can't be changed, and it would be something simple
and generic, such as simply sorting on the codepoint as integers.

When we want to have some other "native" ordering for an existing type, such as
when we want to use a different text collation, we do this by creating a *new
base type*, using CREATE TYPE or some shorthand thereof, and this new type
defines its own ordering, such as that a particular text collation is used.

For example:

CREATE TYPE text__en_US AS (v text);

CREATE TYPE text__C AS (v text);

These will not compare equal to each other or to text, and that is good, because
having a different text collation implies that we consider 'foo'::text__en_US
and 'foo'::text__C to be different values.

I believe that any other approach is worse, and in particular I believe that
creating DOMAIN over text is worse, because DOMAIN are supposed to be subtypes
of some other type, whose set of member values is a subset of the other type's
values, and that have the same ordering. Multiple CREATE type over the same
base type don't interfere with each other like multiple DOMAIN could.

Assuming that what CREATE TYPE produces is actually a base type, I believe there
is no better solution using the facilities that SQL provides.

If there is concern about performance related to CREATE TYPE being a composite
type, I'm sure it is possible to engineer an optimization for when the type has
just 1 attribute so that performance isn't an issue. The main point I'm trying
to raise here is about semantics and good type systems.

Likewise, don't let concern about syntax for using values of such composite
types. Once again, there can be shorthands if necessary.

In fact, Postgres could provide a general shorthand for creating a composite
type of 1 attribute whose purpose is to make one type that is like but unequal
to another, and using this shorthand could also cause the composite type to
overload/polymorph all the operators of it's attribute type, so that the syntax
to define one is very short.

For example:

CREATE TYPE text__en_US WRAPS text COLLATE en_US;

... and I assume the name of that attribute would just be system-defined.

Note that the above is specific to wrapping text, and the COLLATE is just
shorthand for defining an ordering function for text__en_US. A more general
form could be:

CREATE TYPE bar WRAPS foo ORDER USING FUNCTION baz (lhs foo, rhs foo) ...;

And then we can say:

RANGE OF text__en_US

RANGE OF text

... similarly to how we declare array types with ARRAY.

One can also just define range values as they do array values, such as like this:

range('foo','bar') # default collation

range('foo'::text__en_US, 'bar'::text__en_US) # en_us collation

If that seems verbose, I have a few words for you:

1. Users should in practice name their wrapper types over their intended
meaning, not their mechanics, such as like this (not using text for variety),
and that may be more terse:

CREATE TYPE acct_num WRAPS integer; # inherits integer ordering by default

2. If the wrapper types overload the base operators, either automatically or
selectively (does it make sense to multiply an acct_num?), one doesn't have to
keep unpacking and packing them to use them in most cases. For example, I'd
expect many text wrappers to polymorph catenation or substring etc, so no extra
syntax.

3. In practice, most literal values come from applications and are given to SQL
code either as function parameters or bind parameter arguments. While lots of
example code may have literal values in it, I would think that most real-work
code would hardly have any, and hence you'd rarely see any 'foo'::text__en_US
for example. You'd more likely see the less common var::text__en_US or such.

So that's my position, CREATE TYPE on the regular types or the like is the best
solution, and anything else is an inferior solution.

Such a design is also how I do collations and ranges in my Muldis D language.

-- Darren Duncan

Jeff Davis wrote:
> Different ranges over the same subtype make sense when using different
> total orders for the subtype. This is most apparent with text collation,
> but makes sense (at least mathematically, if not practically) for any
> subtype.
>
> For instance:
> [a, Z)
> is a valid range in "en_US", but not in "C", so it makes sense to have
> multiple ranges over the same subtype with different collations.
>
> But what if you have a function (like a constructor), of the form:
> (anyelement, anyelement) -> anyrange
> ? To work with the type system, you need to be able to figure out the
> return type from the arguments; which means to support functions like
> this we need a mapping from the subtype to the range type.
> Unfortunately, that restricts us to one range type per subtype (this
> isn't a problem for ARRAYs, because there is only one useful array type
> for a given element type).
>
> This problem first came up a while ago:
> http://archives.postgresql.org/pgsql-hackers/2011-01/msg02788.php
>
> My workaround was to use domains, but that's not a very clean solution
> (you have to add a bunch of casts to make sure the right domain is
> chosen). It became entirely unworkable with collations, because people
> would be using different text collations a lot more frequently than,
> say, a different ordering for timestamptz. Tom mentioned that here:
>
> http://archives.postgresql.org/message-id/24831.1308579443@sss.pgh.pa.us
>
> I think Florian proposed the most promising line of attack here:
>
> http://archives.postgresql.org/message-id/AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org
>
> by suggesting that functions of the form:
> (anyelement, [other non-anyrange arguments]) -> anyrange
> might be expendable. After all, they are only useful for constructors as
> far as we can tell. Other range functions will have an anyrange
> parameter, and we can use the actual type of the argument to know the
> range type (as well as the subtype).
>
> Although it's very nice to be able to say:
> range(1,10)
> and get an int4range out of it, it's not the only way, and it's not
> without its problems anyway. For instance, to get an int8range you have
> to do:
> range(1::int8, 10::int8)
> or similar.
>
> So, we could just make functions like:
> int4range(int4, int4)
> int8range(int8, int8)
> ...
> when creating the range type, and it would actually be a usability
> improvement.
>
> There are at least a few constructors that would need to be made for
> each rangetype: the constructor above, the singleton constructor,
> constructors that have infinite bounds, the empty constructor, and all
> of the permutations for inclusivity/exclusivity. That adds up to quite a
> few catalog entries per range type.
>
> We could reduce some of the permutations by using extra arguments
> somehow, but that seems like it adds to the ugliness. This might also be
> a time to revisit whether there is a better way to present all of these
> constructors (rather than the _[co][co] suffixes to names, etc.).
>
> Even if we're willing to put up with a bunch of catalog entries, it will
> take a little creativity to figure out how to run the functions
> generically from a fixed set of C functions.
>
> Are there other thoughts or ideas about eliminating the need for generic
> constructors like range()?
>
> Another idea Florian suggested (in the same email) was the ability to
> declare the return type of a function, and then use the declared type to
> infer the argument types. That would be nice because you would just have
> to do:
> range(1,10)::int8range
> However, that's kind of backwards from how our type inference system
> works now, and sounds like a big change.
>
> Maybe we could consider a syntax addition for constructing range values?
> That was kicked around briefly, but perhaps we should revisit the
> possibilities there.
>
> Regards,
> Jeff Davis
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2011-06-26 08:58:12 proposal: global temp tables
Previous Message Jeff Davis 2011-06-26 07:18:52 Range Types and length function