Re: Have I found an interval arithmetic bug?

From: Bryn Llewellyn <bryn(at)yugabyte(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Tom Lane PostgreSQL <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Isaac Morland <isaac(dot)morland(at)gmail(dot)com>, John W Higgins <wishdev(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Have I found an interval arithmetic bug?
Date: 2021-07-25 18:56:54
Message-ID: AC1B8865-F73A-450F-92AF-4B6070D7F671@yugabyte.com
Lists: pgsql-general pgsql-hackers

> On 23-Jul-2021, bruce(at)momjian(dot)us wrote:
>
> On Fri, Jul 23, 2021 at 10:55:11AM -0700, Bryn Llewellyn wrote:
>> SELECT
>> '1.2345 months 1.2345 days 1.2345 seconds'::interval =
>> '1 month 1 day 1 second'::interval*1.2345;
>>
> In 13.3, the result is TRUE. (I know that this doesn’t guarantee that the
> internal representations of the two compared interval values are the same. But
> it’s a necessary condition for the outcome that I’m referring to and serves to
> indicate the point I’m making. A more careful test can be made.)
>
> So you are saying fractional unit output should match multiplication
> output? It doesn't now for all units:
>
> SELECT interval '1.3443 years';
> interval
> ---------------
> 1 year 4 mons
>
> SELECT interval '1 years' * 1.3443;
> ?column?
> ---------------------------------
> 1 year 4 mons 3 days 22:45:07.2
>
> It is true this patch is further reducing that matching. Do people
> think I should make them match as part of this patch?

Summary:
--------

It seems to me that the best thing to do is fix the coarse bug about which there is unanimous agreement and to leave everything else (quirks and all) untouched.

Detail:
-------

My previous email (to which Bruce replies here) was muddled. Sorry. The challenge is that there are a number of mechanisms at work. Their effects are conflated. And it’s very hard to unscramble them.

The underlying problem is that the internal representation of an interval value is a [mm, dd, ss] tuple. This fact is documented. The representation is key to understanding the definitions of these operations:

— defining an interval value from a text literal that uses real number values for its various fields.

— defining an interval value from make_interval(). (This one is easy because the API requires integral values for all but the seconds argument. It would be interesting to know why this asymmetrical definition was implemented. It seems to imply that somebody thought that spilldown was a bad idea and should be prevented before it might happen.)

— creating the text typecast of an extant interval value.

— creating an interval value by adding/subtracting an extant interval value to/from another.

— creating an interval value by multiplying or dividing an extant interval value by a (real) number.

— creating an interval value by subtracting a pair of moments of the same data type (timestamptz, plain timestamp, or time).

— creating a new moment value by adding or subtracting an extant interval value to an extant moment value of the same data type.

— creating an interval value by applying justify_hours(i), justify_days(i), and justify_interval(i) to an extant interval value, i.

— creating a double precision value by applying extract(epoch from i)
to an extant interval value, i.

— evaluating inequality and equality tests to compare two extant interval values.
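To make the discussion concrete, here is a minimal Python model of the [mm, dd, ss] tuple and of interval-by-number multiplication. The spilldown factors (30 days per month, 24 hours per day) are my assumptions, inferred from observed results such as Bruce’s '1 year' * 1.3443 example above, not taken from a documented specification:

```python
# Hypothetical model of the internal [mm, dd, ss] interval representation
# and of multiplication by a real number. The 30-days-per-month and
# 24-hours-per-day factors are assumptions inferred from observed output.
from dataclasses import dataclass
import math

DD_PER_MM = 30.0        # assumed days per month
SS_PER_DD = 24 * 3600   # assumed seconds per day

@dataclass
class Interval:
    mm: int    # months
    dd: int    # days
    ss: float  # seconds

def mul(i: Interval, f: float) -> Interval:
    """Multiply each field by f, spilling fractional months to days
    and fractional days to seconds."""
    mm_real = i.mm * f
    mm = math.trunc(mm_real)
    dd_real = i.dd * f + (mm_real - mm) * DD_PER_MM
    dd = math.trunc(dd_real)
    ss = i.ss * f + (dd_real - dd) * SS_PER_DD
    return Interval(mm, dd, ss)

# Reproduces the observed result of: interval '1 years' * 1.3443
r = mul(Interval(12, 0, 0), 1.3443)
print(r.mm, r.dd, round(r.ss, 1))  # 16 3 81907.2, i.e. 1 year 4 mons 3 days 22:45:07.2
```

Under this model, 16.1316 months truncates to 16 months, the 0.1316-month remainder spills to 3.948 days, and the 0.948-day remainder spills to 81907.2 seconds — matching the quoted output.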

Notice that, for example, this test:

select
interval '1.3443 years' as i1,
interval '1 years' * 1.3443 as i2;

conflates three things: converting an interval literal to a [mm, dd, ss] tuple; multiplying a [mm, dd, ss] tuple by a real number; and converting a [mm, dd, ss] tuple to a text representation. Similarly, this test:

select
interval '1.3443 years' <
interval '1 years' * 1.3443;

conflates three things: converting an interval literal to a [mm, dd, ss] tuple; multiplying a [mm, dd, ss] tuple by a real number; and inequality comparison of two [mm, dd, ss] tuples.
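If my reading of the observed behavior is right, comparison first flattens each tuple to a single seconds value using the same assumed factors (30 days per month, 24 hours per day) and then compares the flattened values. A minimal sketch, under those assumptions:

```python
# Hypothetical model of interval comparison: flatten the [mm, dd, ss]
# tuple to seconds using assumed factors (30 days/month, 24 h/day),
# then compare the flattened values.
def flatten(mm: int, dd: int, ss: float) -> float:
    return (mm * 30 + dd) * 24 * 3600 + ss

# Under this model, '1 month' and '30 days' compare equal even though
# their internal tuples differ — which is exactly why an equality test
# cannot prove that two internal representations are the same.
print(flatten(1, 0, 0) == flatten(0, 30, 0))  # True
```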

As far as I’ve been able to tell, the PG documentation doesn’t do a good job of defining the semantics of any of these operations. Some (like the “justify” functions) are sketched reasonably well. Others, like interval multiplication, are entirely undefined.

This makes discussion of simple tests like the two I showed immediately above hard. It also makes any discussion of correctness, possible bugs, and proposed implementation changes very difficult.

Further, it also makes it hard to see how tests for application code that uses any of these operations can be designed. The normal approach relies on testing that you get what you expect. But here, you don't know what to expect—unless (as I’ve done) you first assert hypotheses for the undefined operations and test them with programmed simulations. Of course, this is, in general, an unreliable approach. The only way to know what code is intended to do is to read the prose specification that informs the implementation.

I had forgotten one piece of the long history of this thread. Soon after I presented the testcase that folks agree shows a clear bug, I asked about the rules for creating the internal [mm, dd, ss] tuple from a text literal that uses real numbers for the fields. My tests showed two things: (1) an intuitively clear model for the spilldown of nonintegral months to days and then, in turn, of nonintegral days to seconds; and (2) a quirky rule for deriving intermediate months from fractional years and fractional months before then using the more obvious rules to spill to days. (This defies terse expression in prose. I copied my PL/pgSQL implementation below.)

There was initially some discussion about changing the implementation of the spill-down from [years, months] in the interval literal to the ultimate [mm, dd, ss] representation. This is what Bruce is asking about. And it’s what I was muddled about.

As I’ve said, my conclusion is that the only safe approach is to create and use only “pure” interval values (where just one of the internal fields is non-zero). For this reason (and having seen what I decided were impossibly unmemorable rules in my modeled implementation) I didn’t look at the rules for the other fields that the interval literal allows (weeks, centuries, millennia, and so on).

--------------------------------------------------------------------------------
-- Fragment of the DECLARE section of my PL/pgSQL simulation. The input
-- record p holds the user-supplied field values: p.yy (years), p.mm
-- (months), p.dd (days), p.hh (hours), p.mi (minutes), p.ss (seconds).
-- The constants mm_per_yy, dd_per_mm, ss_per_dd, ss_per_hh, and
-- ss_per_mi hold the conversion factors (presumably 12, 30, 86400,
-- 3600, and 60).

mm_trunc constant int not null := trunc(p.mm);
mm_remainder constant double precision not null := p.mm - mm_trunc::double precision;

-- This is the quirk: whole months are derived from the years and months
-- fields together, and the sub-month remainder of the years field is
-- discarded rather than spilled to days.
mm_out constant int not null := trunc(p.yy*mm_per_yy) + mm_trunc;

dd_real_from_mm constant double precision not null := mm_remainder*dd_per_mm;

dd_int_from_mm constant int not null := trunc(dd_real_from_mm);
dd_remainder_from_mm constant double precision not null := dd_real_from_mm - dd_int_from_mm::double precision;

dd_int_from_user constant int not null := trunc(p.dd);
dd_remainder_from_user constant double precision not null := p.dd - dd_int_from_user::double precision;

dd_out constant int not null := dd_int_from_mm + dd_int_from_user;

dd_remainder constant double precision not null := dd_remainder_from_mm + dd_remainder_from_user;

ss_out constant double precision not null := dd_remainder*ss_per_dd +
p.hh*ss_per_hh +
p.mi*ss_per_mi +
p.ss;
--------------------------------------------------------------------------------
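For anyone who wants to experiment outside the database, here is a Python transliteration of the fragment above. The conversion factors (12 months/year, 30 days/month, 86400 s/day, 3600 s/hour, 60 s/minute) are my assumptions about what the constants hold; the function reproduces the quirk that '1.3443 years' yields 16 whole months with nothing spilled to days:

```python
# Python transliteration of the PL/pgSQL fragment above. The conversion
# factors are assumptions, not documented values.
import math

def literal_to_tuple(yy=0.0, mm=0.0, dd=0.0, hh=0.0, mi=0.0, ss=0.0):
    mm_per_yy, dd_per_mm = 12, 30
    ss_per_dd, ss_per_hh, ss_per_mi = 86400, 3600, 60

    mm_trunc = math.trunc(mm)
    mm_remainder = mm - mm_trunc

    # The quirk: the sub-month remainder of the years field is discarded.
    mm_out = math.trunc(yy * mm_per_yy) + mm_trunc

    dd_real_from_mm = mm_remainder * dd_per_mm
    dd_int_from_mm = math.trunc(dd_real_from_mm)
    dd_remainder_from_mm = dd_real_from_mm - dd_int_from_mm

    dd_int_from_user = math.trunc(dd)
    dd_remainder_from_user = dd - dd_int_from_user

    dd_out = dd_int_from_mm + dd_int_from_user
    dd_remainder = dd_remainder_from_mm + dd_remainder_from_user

    ss_out = dd_remainder * ss_per_dd + hh * ss_per_hh + mi * ss_per_mi + ss
    return mm_out, dd_out, ss_out

# interval '1.3443 years' under this model: 16 months, nothing spilled,
# matching Bruce's quoted output of '1 year 4 mons'.
print(literal_to_tuple(yy=1.3443))  # (16, 0, 0.0)
```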
