Re: Prefered Types

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Зотов Роман <zotov(at)oe-it(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Prefered Types
Date: 2011-05-04 19:41:46
Message-ID: BANLkTimFgyKn6CRo82WDpTxSFxQ9iFSPRw@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 3, 2011 at 3:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>>> The interesting discussion is what happens next.  To me, this is all
>>> related to this previous discussion:
>>> http://archives.postgresql.org/pgsql-hackers/2010-09/msg00232.php
>
>> Yeah, there doesn't seem like much point unless we have a clear idea
>> what we're going to do with the change.
>
> BTW, it occurs to me to wonder whether, instead of making types be more
> or less preferred, we should attack the issue from a different direction
> and assign preferred-ness ratings to casts.  That seems to be more or
> less the direction that Robert was considering in the above-linked
> thread.  I'm not sure it's better than putting the ratings on types ---
> in particular, neither viewpoint seems to offer a really clean answer
> about what to do when trying to resolve a multiple-argument function
> in which one possible resolution offers a more-preferred conversion for
> one argument but a less-preferred conversion for another one.  But it's
> an alternative we ought to think about before betting all the chips on
> generalizing typispreferred.
>
> Personally I've always felt that the typispreferred mechanism was a bit
> of a wart; changing it from a bool to an int won't improve that, it'll
> just make it a more complicated wart.  Casts have already got a
> standards-blessed notion that some are more equal than others, so
> maybe attaching preferredness ratings to them will be less of a wart.
> Not sure about it though.

I think this is a pretty good analysis. One of the big, fat problems
with typispreferred is that it totally falls apart when more than two
types are involved. For example, given a call f(int2), we can't
decide between f(int4) and f(int8), but it seems pretty clear (to me,
at least) that we should prefer to promote as little as possible and
should therefore pick f(int4). The problem is less acute with
string-like data types because there are only two typcategory-S data
types that get much use: text and varchar. But add a third type to
the mix (varchar2...) or start playing around with functions that are
defined for name and bpchar but not text or some such thing, and
things get sticky.
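
To make the f(int2) case concrete, here is a minimal sketch in Python. It is a toy model and not PostgreSQL's actual resolver: the WIDENING list, the PREFERRED set, and all function names are assumptions made for illustration. It contrasts a boolean preferred flag with the "promote as little as possible" rule described above.

```python
# Toy model only; none of these names come from PostgreSQL's source.
WIDENING = ["int2", "int4", "int8", "numeric"]  # assumed promotion order
PREFERRED = {"float8"}  # stand-in for the boolean typispreferred flag


def viable(arg, candidates):
    """Candidates that match arg exactly or are wider than it."""
    start = WIDENING.index(arg)
    return [c for c in candidates if c in WIDENING[start:]]


def resolve_with_flag(arg, candidates):
    """Boolean rule: an exact match wins, else keep preferred types if any."""
    ok = viable(arg, candidates)
    if arg in ok:
        return [arg]
    preferred = [c for c in ok if c in PREFERRED]
    return preferred or ok  # more than one survivor means "ambiguous"


def resolve_min_promotion(arg, candidates):
    """Alternative rule: pick the candidate needing the smallest promotion."""
    ok = viable(arg, candidates)
    return min(ok, key=WIDENING.index)


print(resolve_with_flag("int2", ["int4", "int8"]))      # two survivors
print(resolve_min_promotion("int2", ["int4", "int8"]))  # a single winner
```

With the flag alone, both int4 and int8 survive because neither is marked preferred; the distance-based rule settles on int4.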

Generalizing typispreferred to an integer definitely helps with these
cases, assuming anyway that you are dealing mostly with built-in
types, or that the extensions you are using can somehow agree among
themselves on reasonable weighting values. But it is not a perfect
solution either, because it can really only handle pretty linear
topologies. It's reasonable to suppose that the integer types are
ordered int2 - int4 - int8 - numeric and that the floating point types
are ordered float4 - float8 (- numeric?), but I think the two
hierarchies are pretty much incomparable, and an integer
typispreferred won't handle that very well. We could make the two
groups separate categories, but arguably numeric belongs in both
groups, so that doesn't really seem to work very well either.
Certainly from a theoretical perspective there's no reason why you
couldn't have A - B - X and C - D - X, with A-C, A-D, B-C, and B-D
incomparable. It almost feels like you need a graph to model it
properly, which perhaps argues for your idea of attaching weights to
the casts.
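
The A - B - X and C - D - X shape can be sketched as a small graph. This is only an illustration in Python: the EDGES table and the helper names are hypothetical, and the types are the placeholder letters from the paragraph above. It lists the pairs that a graph leaves unordered but that a single integer rank per type would be forced to order.

```python
# Hypothetical types from the text: two chains that meet only at X.
EDGES = {"A": {"B"}, "B": {"X"}, "C": {"D"}, "D": {"X"}, "X": set()}


def reachable(src):
    """Return every type reachable from src by following edges."""
    seen, stack = set(), [src]
    while stack:
        for nxt in EDGES[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def comparable(a, b):
    """Two types are comparable if one can be reached from the other."""
    return a == b or b in reachable(a) or a in reachable(b)


# Pairs with no ordering between them in the graph.
incomparable_pairs = sorted(
    (a, b) for a in EDGES for b in EDGES if a < b and not comparable(a, b)
)
print(incomparable_pairs)
```

Any assignment of one integer per type would rank A against C one way or the other, which is exactly the information the graph declines to provide.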

But there are some problems with that, too. In particular, it would
be nice to be able to "hook in" new types with a minimum of fuss. For
example, say we add a new string type, like citext, via an extension.
Right now, we need to add casts not only from citext to text, but also
from citext to all the things to which text has casts, if we really
want citext to behave like text. That solution works OK for the first
extension type we load in, but as soon as you add any nonstandard
casts from text to other things (perhaps yet another extension type of
some kind), it starts to get a bit leaky. In some sense it feels like
it'd be nice to be able to "walk the graph" - if an implicit cast from
A to B is OK, and an implicit cast from B to C is OK, perhaps an
implicit cast from A to C is also OK. But that seems awfully
expensive to do at runtime, and it'd introduce some strange behavior
particularly with the way we have the reg* -> oid and oid -> reg*
casts set up.

select a.castsource::regtype, a.casttarget::regtype, b.casttarget::regtype
  from pg_cast a, pg_cast b
 where a.casttarget = b.castsource
   and a.castcontext = 'i'
   and b.castcontext = 'i'
   and not exists (select 1 from pg_cast x
                    where x.castsource = a.castsource
                      and x.casttarget = b.casttarget
                      and x.castcontext = 'i')
   and a.castsource <> b.casttarget;
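
The query above looks for two-step implicit cast paths that have no direct implicit cast. The same check can be sketched in Python over a hand-written toy table. The IMPLICIT_CASTS set is an assumption for illustration and not a dump of pg_cast, and the function name is hypothetical.

```python
# Hand-written toy subset of implicit casts as (source, target) pairs.
# This is an assumption for illustration, not a dump of pg_cast.
IMPLICIT_CASTS = {
    ("int2", "int4"), ("int4", "int8"), ("int2", "int8"),
    ("regproc", "oid"), ("oid", "regproc"),
    ("regclass", "oid"), ("oid", "regclass"),
}


def missing_two_step_casts(casts):
    """Return (A, B, C) where A->B and B->C exist but A->C does not."""
    return sorted(
        (a, b, c)
        for (a, b) in casts
        for (b2, c) in casts
        if b == b2 and a != c and (a, c) not in casts
    )


gaps = missing_two_step_casts(IMPLICIT_CASTS)
for a, b, c in gaps:
    print(f"{a} -> {b} -> {c}")
```

On this toy table the only gaps run through oid, so walking the graph would let one reg* type reach another implicitly. That is one plausible reading of the "strange behavior" mentioned above.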

It's not clear to me whether in any of this there is a solution to the
problem of int2 being a second-class citizen. Perhaps we could add
casts from int4 and int8 back to int2, and make it less-preferred than
all of the other integer types, but I'm not sure what else that would
break.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
