Re: Have I found an interval arithmetic bug?

From: Bryn Llewellyn <bryn(at)yugabyte(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Isaac Morland <isaac(dot)morland(at)gmail(dot)com>, John W Higgins <wishdev(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane PostgreSQL <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Have I found an interval arithmetic bug?
Date: 2021-04-05 20:06:36
Message-ID: 579B7353-CE03-4F2A-AA69-895BC66F9E24@yugabyte.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

> On 05-Apr-2021, at 11:37, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> On Mon, Apr 5, 2021 at 01:15:22PM -0500, Justin Pryzby wrote:
>> On Mon, Apr 05, 2021 at 02:01:58PM -0400, Bruce Momjian wrote:
>>> On Mon, Apr 5, 2021 at 11:33:10AM -0500, Justin Pryzby wrote:
>>>>> https://www.google.com/url?q=https://www.postgresql.org/docs/current/datatype-datetime.html%23DATATYPE-INTERVAL-INPUT&source=gmail-imap&ust=1618252677000000&usg=AOvVaw34LnV9DlK4pcYY5NJGQe-m
>>>>> « …field values can have fractional parts; for example '1.5 week' or '01:02:03.45'. Such input is converted to the appropriate number of months, days, and seconds for storage. When this would result in a fractional number of months or days, the fraction is added to the lower-order fields using the conversion factors 1 month = 30 days and 1 day = 24 hours. For example, '1.5 month' becomes 1 month and 15 days. Only seconds will ever be shown as fractional on output. »
>>>
>>> I see that. What is not clear here is how far we flow down. I was
>>> looking at adding documentation or regression tests for that, but was
>>> unsure. I adjusted the docs slightly in the attached patch.
>>
>> I should have adjusted the quote to include context:
>>
>> | In the verbose input format, and in SOME FIELDS of the more compact input formats, field values can have fractional parts[...]
>>
>> I don't know what "some fields" means - more clarity here would help indicate
>> the intended behavior.
>
> I assume it is comparing the verbose format to the ISO 8601 time
> intervals format, which I have not looked at. Interesting I see this as
> a C comment at the top of DecodeISO8601Interval();
>
> * A couple exceptions from the spec:
> * - a week field ('W') may coexist with other units
> --> * - allows decimals in fields other than the least significant unit.
>
> I don't actually see anything in our code that doesn't support factional
> values, so maybe the docs are wrong and need to be fixed.
>
> Actually, according to our regression tests, this fails:
>
> SELECT '5.5 seconds 3 milliseconds'::interval;
> ERROR: invalid input syntax for type interval: "5.5 seconds 3 milliseconds"
>
> but that is the verbose format, I think.
>
>>> The interaction of months/days/seconds is so imprecise that passing it
>>> futher down doesn't make much sense, and suggests a precision that
>>> doesn't exist, but if people prefer that we can do it.
>>
>> I agree on its face that "months" is imprecise (30, 31, 27, 28 days),
>> especially fractional months, and same for "years" (leap years), and hours per
>> day (DST), but even minutes ("leap seconds"). But the documentation seems to
>> be clear about the behavior:
>>
>> | .. using the conversion factors 1 month = 30 days and 1 day = 24 hours
>>
>> I think the most obvious/consistent change is for years and greater to "cascade
>> down" to seconds, and not just months.
>
> Wow, well, that is _an_ option. Would people like that? It is certainly
> easier to explain.

It seems to me that this whole business is an irrevocable mess. The original design could have brought three overload-distinguishable types, "interval month", "interval day", and "interval second"—each represented internally as a scalar. There could have been built-ins to convert between them using conventionally specified rules. Then interval arithmetic would have been clear. For example, an attempt to assign the difference between two timestamps to anything but "interval second" would cause an error (as it does in Oracle database, even though there there are only two interval kinds). But we can only deal with what we have and accept the fact that the doc will inevitably be tortuous.

Givea this, I agree that fractional years should simply convert to fractional months (to be then added to verbetim-given fractional months) just before representing the months as the trunc() of the value and cascading the remainder down to days. Units like century would fall out naturally in the same way.

ABOUT LEAP SECONDS

Look at this (from Feb 2005):

«
PostgreSQL does not support leap seconds
https://www.postgresql.org/message-id/1162319515.20050202141132@mail.ru
»

I don't know if the title reports a state of affairs in the hope that this be changed to bring such support—or whether it simply states what obtains and always will. Anyway, a simple test (below) shows that PG Version 13.2 doesn't honor leap seconds.

DETAIL

First, it helps me to demonstrate, using leap years, that this is a base phenomenon of the proleptic Gregorian calendar that PG uses—and has nothing to do with time zones. (If it did, the then leap year notion could be time zone dependent. Do this

select
'2020-02-29'::date as "date",
'2020-02-29 23:59:59.99999'::timestamp as "plain timestamp";

This is the result:

date | plain timestamp
------------+---------------------------
2020-02-29 | 2020-02-29 23:59:59.99999

Changing the year to 2021 brings the 22008 error "date/time field value out of range". (Of course, you have to split the test into two pieces to be sure that you get the same error with both data types.)

This suggests a test that uses '23:59:60.000000' for the time. However, try this first:

select
'23:59:60.000000'::time as "time",
'2021-04-05 23:59:60.000000'::timestamp as "plain timestamp";

time | plain timestamp
----------+---------------------
24:00:00 | 2021-04-06 00:00:00

This is annoying. It reflects what seems to me to be an unfortunate design choice. Anyway, this behavior will never change. But it means that a precise discussion needs more words than had one minute been taken as a closed-open interval—[0,60) seconds—(with 59.99999 legal and 60.000000 illegal). It's too boring to type all those words here. Just do this:

select '2021-04-05 23:59:60.5'::timestamp as "plain timestamp";

This is the result:

plain timestamp
-----------------------
2021-04-06 00:00:00.5

Of course, there was no leap second (on the planet—never, mind databases) at this moment. The most recent leap second was 2016-12-31 at 23:59:60 (UTC) meaning that '60.000000' through '60.999999' were all meaningful times on 31-Dec that year. So try this:

select '2016-12-31 23:59:60.5'::timestamp as "should be leap second";

This is the result:

should be leap second
-----------------------
2017-01-01 00:00:00.5

This tells me that the subject line of the email from 2005 remains correct: PostgreSQL does not support leap seconds. Given this, we can safely say that one minute is exactly 60 seconds (and that one hour is exactly 60 minutes) and never mention leap seconds ever again. I assume that it's this that must have informed the decision to represent an interval value as the three fields months, days, and seconds.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2021-04-05 20:35:24 Re: Have I found an interval arithmetic bug?
Previous Message Bruce Momjian 2021-04-05 18:41:31 Re: Is replacing transactions with CTE a good idea?

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2021-04-05 20:35:24 Re: Have I found an interval arithmetic bug?
Previous Message Mats Kindahl 2021-04-05 19:57:12 Table AM and DROP TABLE [ Was: Table AM and DDLs]