Re: Proposal for fixing numeric type-resolution issues

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal for fixing numeric type-resolution issues
Date: 2000-06-13 07:41:56
Message-ID: 200006130741.DAA23502@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Again, anything to add to the TODO here?

> We've got a collection of problems that are related to the parser's
> inability to make good type-resolution choices for numeric constants.
> In some cases you get a hard error; for example "NumericVar + 4.4"
> yields
> ERROR: Unable to identify an operator '+' for types 'numeric' and 'float8'
> You will have to retype this query using an explicit cast
> because "4.4" is initially typed as float8 and the system can't figure
> out whether to use numeric or float8 addition. A more subtle problem
> is that a query like "... WHERE Int2Var < 42" is unable to make use of
> an index on the int2 column: 42 is resolved as int4, so the operator
> is int24lt, which works but is not in the opclass of an int2 index.
>
> Here is a proposal for fixing these problems. I think we could get this
> done for 7.1 if people like it.
>
> The basic problem is that there's not enough smarts in the type resolver
> about the interrelationships of the numeric datatypes. All it has is
> a concept of a most-preferred type within the category of numeric types.
> (We are abusing the most-preferred-type mechanism, BTW, because both
> FLOAT8 and NUMERIC claim to be the most-preferred type in the numeric
> category! This is in fact why the resolver can't make a choice for
> "numeric+float8".) We need more intelligence than that.
>
> I propose that we set up a strictly-ordered hierarchy of numeric
> datatypes, running from least preferred to most preferred:
> int2, int4, int8, numeric, float4, float8.
> Rather than simply considering coercions to the most-preferred type,
> the type resolver should use the following rules:
>
> 1. No value will be down-converted (eg int4 to int2) except by an
> explicit conversion.
>
> 2. If there is not an exact matching operator, numeric values will be
> up-converted to the highest numeric datatype present among the operator
> or function's arguments. For example, given "int2 + int8" we'd up-
> convert the int2 to int8 and apply int8 addition.
>
> The final piece of the puzzle is that the type initially assigned to
> an undecorated numeric constant should be NUMERIC if it contains a
> decimal point or exponent, and otherwise the smallest of int2, int4,
> int8, NUMERIC that will represent it. This is a considerable change
> from the current lexer behavior, where you get either int4 or float8.
>
> For example, given "NumericVar + 4.4", the constant 4.4 will initially
> be assigned type NUMERIC, we will resolve the operator as numeric plus,
> and everything's fine. Given "Float8Var + 4.4", the constant is still
> initially numeric, but will be up-converted to float8 so that float8
> addition can be used. The end result is the same as in traditional
> Postgres: you get float8 addition. Given "Int2Var < 42", the constant
> is initially typed as int2, since it fits, and we end up selecting
> int2lt, thereby allowing use of an int2 index. (On the other hand,
> given "Int2Var < 100000", we'd end up using int4lt, which is correct
> to avoid overflow.)
>
> A couple of crucial subtleties here:
>
> 1. We are assuming that the parser or optimizer will constant-fold
> any conversion functions that are introduced. Thus, in the
> "Float8Var + 4.4" case, the 4.4 is represented as a float8 4.4 by the
> time execution begins, so there's no performance loss.
>
> 2. We cannot lose precision by initially representing a constant as
> numeric and later converting it to float. Nor can we exceed NUMERIC's
> range (the default 1000-digit limit is more than the range of IEEE
> float8 data). It would not work as well to start out by representing
> a constant as float and then converting it to numeric.
>
> Presently, the pg_proc and pg_operator tables contain a pretty fair
> collection of cross-datatype numeric operators, such as int24lt,
> float48pl, etc. We could perhaps leave these in, but I believe that
> it is better to remove them. For example, if int42lt is left in place,
> then it would capture cases like "Int4Var < 42", whereas we need that
> to be translated to int4lt so that an int4 index can be used. Removing
> these operators will eliminate some code bloat and system-catalog bloat
> to boot.
>
> As far as I can tell, this proposal is almost compatible with the rules
> given in SQL92: in particular, SQL92 specifies that an operator having
> both "approximate numeric" (float) and "exact numeric" (int or numeric)
> inputs should deliver an approximate-numeric result. I propose
> deviating from SQL92 in a single respect: SQL92 specifies that a
> constant containing an exponent (eg 1.2E34) is approximate numeric,
> which implies that the result of an operator using it is approximate
> even if the other operand is exact. I believe it's better to treat
> such a constant as exact (ie, type NUMERIC) and only convert it to
> float if the other operand is float. Without doing that, an assignment
> like
> UPDATE tab SET NumericVar = 1.234567890123456789012345E34;
> will not work as desired because the constant will be prematurely
> coerced to float, causing precision loss.
>
> Comments?
>
> regards, tom lane
>

--
Bruce Momjian | http://www.op.net/~candle
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2000-06-13 07:47:38 Re: RE: PostgreSQL and Unicode
Previous Message Bruce Momjian 2000-06-13 07:38:54 Re: COPY BINARY to STDOUT