Proposal for resolving casting issues

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Proposal for resolving casting issues
Date: 2002-09-15 17:09:07
Message-ID: 29539.1032109747@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

We've been discussing this stuff in fits and starts for months now, but
nothing satisfactory has been arrived at. I've concluded that part of
the problem is that we are trying to force the system's behavior into
a model that is too limiting: we need more than an implicit/explicit cast
distinction. Accordingly, I suggest we bite the bullet and make it happen.
(Note that I've resigned myself to having to do an initdb for 7.3beta2.)

I think we must extend pg_cast's castimplicit column to a three-way value:
* okay as implicit cast in expression (or in assignment)
* okay as implicit cast in assignment only
* okay only as explicit cast

"In expression" refers to cases where we have (or potentially have) multiple
possible interpretations; essentially, anytime a value is being fed to a
function or operator, there can be ambiguity due to overloading, and so we
need to restrict the set of possible implicit casts to limit ambiguity and
ensure a reasonable choice of function is made.

"In assignment only" actually means any case where the destination datatype
is known with certainty. For example CoerceTargetExpr is currently used to
coerce an array subscript expression to integer, and I think it's okay to
treat that context like store assignment.

Question: what shall we call these alternatives in CREATE CAST? The SQL99
phrase AS ASSIGNMENT looks like it should mean the second, but I think
the spec semantics require it to mean the first. Ugh. Perhaps AS
ASSIGNMENT ONLY for the second case?

Also, I think we should allow cast functions to take an optional boolean
second argument "isExplicit", so that explicit casts can be distinguished
from implicit at runtime. We'll use this to get spec-compliant semantics
for char/varchar truncation (there shouldn't be an error if you explicitly
cast to a shorter length).

We'll need to add fields to Func and RelabelType nodes so that we can tell
whether a node was generated due to an explicit function call, implicit
cast, or explicit cast; we'll use these for better reverse-listing. (In
particular this will let us hide the boolean second argument from being
reverse-listed, when present.)

Now, as to just what to do with it --- Peter posted a list of questions
awhile back that weren't ever resolved, but I think we can make some
progress with this scheme in mind:

> From looking at the set of implicit or not casts, I think there are two
> major issues to discuss:
>
> 1. Should truncating/rounding casts be implicit? (e.g., float4 -> int4)
>
> I think there's a good argument for "no", but for some reason SQL99 says
> "yes", at least for the family of numerical types.

We can make this work cleanly if "down" casts are assignment-only while
"up" casts are fully implicit. I think that the spec requires implicit
casting only in the context of store assignment.

> 2. Should casts from non-character types to text be implicit? (e.g., date
> -> text)
>
> I think this should be "no", for the same reason that the other direction
> is already disallowed. It's just sloppy programming.

I agree with this in principle, but in practice we probably have to allow
implicit casts to text, at least for awhile yet. Seems that too many
people depend on stuff like
SELECT 'Meeting time is ' || timestamp_var
Since this is an expression context we don't get any help from the notion
of store assignment :-(

> I also have a few individual cases that look worthy of consideration:
>
> abstime <-> int4: I think these should not be implicit because they
> represent different "kinds" of data. (These are binary compatible casts,
> so changing them to not implicit probably won't have any effect. I'd have
> to check this.)

I believe that as of current sources we can mark a binary cast non-implicit,
and I agree with marking these two explicit-only.

> date -> timestamp[tz]: I'm suspicious of this one, but it's hard to
> explain. The definition to fill in the time component with zeros is
> reasonable, but it's not the same thing as casting integers to floats
> because dates really represent a time span of 24 hours and timestamps an
> indivisible point in time. I suggest making this non-implicit, for
> conformance with SQL and for general consistency between the date/time
> types.

I disagree here; promoting date to timestamp seems perfectly reasonable,
and I think it's something a lot of people rely on.

> time -> interval: I'm not even sure this cast should exist at all.
> Proper arithmetic would be IntervalValue = TimeValue - TIME 'midnight'.
> At least make it non-implicit.

I'd go along with marking it assignment-only.

> timestamp -> abstime: This can be implicit AFAICS.

This is lossy (abstime doesn't preserve fractional seconds) so I'd vote
for making it assignment-only.

In a later message Peter wrote:

> Since almost every cast to "text" is implicit, then I believe so should
> inet -> text
> macaddr -> text
> int4 -> varchar
> int8 -> varchar
> which are currently not.

I'd go along with making the inet->text and macaddr->text cases implicit,
since as you note all the other casts to text are. However, those two
casts to varchar must not be implicit (or at most assignment-only) else
they will create ambiguity against the implicit casts to text for the same
source datatype.

In summary: I haven't yet gone through the existing casts in detail, but
I propose the following general rules for deciding how to mark casts:

* Casts across datatype categories should be explicit-only, with the
exception of casts to text, which we will allow implicitly for backward
compatibility's sake.

* Within a category, "up" (lossless) conversions are implicit, "down"
(potentially lossy) conversions should be assignment-only.

Comments?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2002-09-16 01:03:58 Re:
Previous Message Shridhar Daithankar 2002-09-15 16:02:37 Re: Physical sites handling large data