Re: = t1 - t0 but t0 + i <> t1 when t1 and t2 timestamptz values and i is an interval value

From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: Bryn Llewellyn <bryn(at)yugabyte(dot)com>
Cc: Tom Lane PostgreSQL <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general list <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: = t1 - t0 but t0 + i <> t1 when t1 and t2 timestamptz values and i is an interval value
Date: 2021-03-27 08:42:51
Message-ID: CA+bJJbyVjs-RLmRQRcA+qUiNRi_dzW_3_abDdEmMVH_-As9v+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Bryn:

( 1st, sorry if I misquote something, but i use text-only for the list )

On Fri, Mar 26, 2021 at 10:16 PM Bryn Llewellyn <bryn(at)yugabyte(dot)com> wrote:

> I’d deduced the conceptual background that both Tom and Francisco referred to. And I’ve coined the terms “horological interval” and “cultural interval” to capture the distinction. I’d also noticed that it seems that, to first order, the “interval day to second” flavor maps to “horological” and the "interval year to month” flavor maps to “cultural”. However, as my testcase shows, “day” is an oddity because subtracting two timestamptz values treats “day” in the “horological” sense but adding an interval value to a timestamptz treats “day” in the cultural sense. This was the central point of my question. Neither of you referred to this.

I, personally, did not refer to that on purpose, as I did not fully
understand what you were trying to prove, and as your code defaulted
the timezones I could not easily reproduce result. I saw intervals
working as they are dessigned, thought you might be trying to use them
for a purpose they are not dessigned and pointed that.

> I’m going to conclude just that it is what it is and it won’t change.

Intervals have a behaviour. Many people do not like it/consider it
wrong. My advice is normally "do not use them". That is what I do,
except for quick and dirty reports I rarely ever use them .

...

> create table t(k int primary key, i interval day to second not null);
> insert into t(k, i) values(1, '1 day 1 hour'), (2, '25 hours');
> select k, i from t order by k;
>
> The “select” shows that the difference in spelling the two values is preserved in what’s persisted. This is consistent with months, days, and seconds being persisted separately. And it’s consistent with the different semantic effect that the two values can have on addition. I therefore find it confusing that wrong semantics are imposed on the represented value here:
>
> create unique index on t(i);
>
> It causes the “could not create unique index” error. This is the same effect that the “assert” for equality in my testcase shows. The two values can be defined to be equal only if “day” is given a strict horological meaning. But the present discussion shows that it’s taken culturally on interval addition.

Interval are a hairy datatype. I'm not sure they even have a proper
order defined (i.e, is 30 days 6 hours more or less than 1 month ).
They have some normalization conversions, but IMHO they are not good
to use in a btree, which needs to order them.

> This brings me to another strange observation. Do this:
.. Skipping intermediates...
> select (
> ('2021-11-15 12:00:00'::timestamptz - '2021-02-15 12:00:00'::timestamptz)::interval month
> )::text as i;
> It silently produces this result:
> 00:00:00

You are right, the substraction produces a days interval (273 here),
whcih when truncated to months just go to zero.

It surprised me a bit, so I dug into the docs and found this:

postgres=> select('2021-11-15'::timestamptz - '2021-02-15'::timestamptz);
?column?
----------
273 days
(1 row)

postgres=> select age('2021-11-15'::timestamptz, '2021-02-15'::timestamptz);
age
--------
9 mons
(1 row)

I just stored it in the "why you should always look at the manual
before using interval" slots and went on. Note if you alter months to
cross a single DST jump ( in here ) and repeat you would see more
strange results.

postgres=> set timezone TO 'Europe/Madrid';
SET
postgres=> select age('2021-09-15'::timestamptz,
'2021-02-15'::timestamptz), '2021-09-15'::timestamptz -
'2021-02-15'::timestamptz;
age | ?column?
--------+-------------------
7 mons | 211 days 23:00:00
(1 row)

It seems substraction is trying to preserve extract(epoch from
(tza-tzb)) = extract(epoch from tza) - extract(epoch from tzb), but
"beautifying" it a bit by using days. Docs should have it somewhere,
but those chapters are a dense read.

> In plain English, you can’t produce a cultural interval value by subtraction; subtraction can only populate a “day to second” flavor interval. Moreover, the computed “hours” component never exceeds 24 and the computed “days” component is always the horological value.

Probably true. That seems to be the purpose of the age() function and
is one of the reasons I normally avoid intervals.

> I failed to find an explanation of this in the doc—but I dare say that my searching skills are too feeble.

You could find some things like "SQL STD mandates it"·. It does
mandate some really weird things.

....

> I can guess the rules for the outcome when such a hybrid is added to a timestamptz value. It’s possible to design edge case tests where you’d get different outcomes if: (a) the cultural component is added first and only then the horological component is added; or (b) the components are added in the other order. It seems to me that the outcome is governed by rule (a). Am I right?

I'm completely lost with the horological/cultural things. My
understanding of the postgres intervals are:

- 3 (internal) fields, months, days, seconds. Note they may have
different signs!
- months are sometimes printed as years, months because years have 12 months
- seconds printed as hhmmss because minutes always have 60 seconds and
hours 60 minutes ( disregarding leap seconds )
- When adding, IIRC, first add the months, then add the days, then add
the seconds, rollig over the date as needed in each step.

The very few cases where I've used it, mainly for calendaring, or for
partitioning ( which is calendaring ), has worked well. Things like
"meeting scheduled '3 months' from previous, confirmation mails sent
'3 months, -14 days' from previous'. I've never tried to make some
thing as complex as what you seem to be trying, I'll probably just
roll my own datatype in that case justo to be sure the semantics are
what I want.

Regards.
Francisco Olarte.

>
> B.t.w., I think that the specific complexities of the proleptic Gregorian calendar are cleanly separable from the basic idea that (considering only the requirements statement space) there is a real distinction to be drawn between “horological” and “cultural”—no matter what calendar rules might be used.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message FOUTE K. Jaurès 2021-03-27 10:51:29 ERROR: could not start WAL streaming: ERROR: replication slot "XXX" does not exist
Previous Message Adrian Klaver 2021-03-26 22:27:21 Re: Hello - About how to install PgAdmin4 on Debian 10