Re: truncating timestamps on arbitrary intervals

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: truncating timestamps on arbitrary intervals
Date: 2020-11-23 17:44:57
Message-ID: CAFBsxsGUDP2jky8SLk_zyWXFzj_A0G6BLR7R2kVjOLc5vNROQw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 12, 2020 at 9:56 AM Peter Eisentraut <
peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:
>
> On 2020-06-30 06:34, John Naylor wrote:
> > In v9, I've simplified the patch somewhat to make it easier for future
> > work to build on.
> >
> > - When truncating on month-or-greater intervals, require the origin to
> > align on month. This removes the need to handle weird corner cases
> > that have no straightforward behavior.
> > - Remove hackish and possibly broken code to allow origin to be after
> > the input timestamp. The default origin is Jan 1, 1 AD, so only AD
> > dates will behave correctly by default. This is not enforced for now,
> > since it may be desirable to find a way to get this to work in a nicer
> > way.
> > - Rebase docs over PG13 formatting changes.
>
> This looks pretty solid now. Are there any more corner cases and other
> areas with unclear behavior that you are aware of?

Hi Peter,

Thanks for taking a look!

I believe there are no known corner cases aside from not throwing an error
if origin > input, but I'll revisit that when we are more firm on what
features we want support.

> A couple of thoughts:
>
> - After reading the discussion a few times, I'm not so sure anymore
> whether making this a cousin of date_trunc is the right way to go. As
> you mentioned, there are some behaviors specific to date_trunc that
> don't really make sense in date_trunc_interval, and maybe we'll have
> more of those.

As far as the behaviors, I'm not sure exactly what you what you were
thinking of, but here are two issues off the top of my head:

- If the new functions are considered variants of date_trunc(), there is
the expectation that the options work the same way, in particular the
timezone parameter. You asked specifically about that below, so I'll
address that separately.
- In the "week" case, the boundary position depends on the origin, since a
week-long interval is just 7 days.

> Also, date_trunc_interval isn't exactly a handy name.
> Maybe something to think about. It's obviously fairly straightforward
> to change it.

Effectively, it puts timestamps into bins, so maybe date_bin() or something
like that?

> - There were various issues with the stride interval having months and
> years. I'm not sure we even need that. It could be omitted unless you
> are confident that your implementation is now sufficient.

Months and years were a bit tricky, so I'd be happy to leave that out if
there is not much demand for it. date_trunc() already has quarters,
decades, centuries, and millenia.

> - Also, negative intervals could be prohibited, but I suppose that
> matters less.

Good for the sake of completeness. I think they happen to work in v9 by
accident, but it would be better not to expose that.

> - I'm curious about the origin being set to 0001-01-01. This seems to
> work correctly in that it sets the origin to a Monday, which is what we
> wanted, but according to Google that day was a Saturday. Something to
> do with Julian vs. Gregorian calendar?

Right, working backwards from our calendar today, it's Monday, but at the
time it would theoretically be Saturday, barring leap year miscalculations.

> Maybe we should choose a date
> that is a bit more recent and easier to reason with.

2001-01-01 would also be a Monday aligned with centuries and millenia, so
that would be my next suggestion. If we don't care to match with
date_trunc() on those larger units, we could also use 1900-01-01, so the
vast majority of dates in databases are after the origin.

> - Then again, I'm thinking that maybe we should make the origin
> mandatory. Otherwise, the default answers when having strides larger
> than a day are entirely arbitrary, e.g.,
>
> => select date_trunc_interval('10 year', '0196-05-20 BC'::timestamp);
> 0190-01-01 00:00:00 BC
>
> => select date_trunc_interval('10 year', '0196-05-20 AD'::timestamp);
> 0191-01-01 00:00:00

Right. In the first case, the default origin is also after the input, and
crosses the AD/BC boundary. Tricky to get right.

> Perhaps the origin could be defaulted if the interval is less than a day
> or something like that.

If we didn't allow months and years to be units, it seems the default would
always make sense?

> - I'm wondering whether we need the date_trunc_interval(interval,
> timestamptz, timezone) variant. Isn't that the same as
> date_trunc_interval(foo AT ZONE xyz, value)?

I based this on 600b04d6b5ef6 for date_trunc(), whose message states:

date_trunc(field, timestamptz, zone_name)

is the same as

date_trunc(field, timestamptz at time zone zone_name) at time zone zone_name

so without the shorthand, you need to specify the timezone twice, once for
the calculation, and once for the output.

--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Corbit, Dann 2020-11-23 17:49:01 Re: Connection using ODBC and SSL
Previous Message Heikki Linnakangas 2020-11-23 17:36:06 Re: Online checksums patch - once again