SQL:2011 application time

From: Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: SQL:2011 application time
Date: 2021-06-30 17:39:00
Message-ID: CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

Here is a set of patches to add SQL:2011 application-time support (aka
valid-time).
Previous discussion was on
https://www.postgresql.org/message-id/20200930073908.GQ1996@paquier.xyz
but I thought I should update the email subject.

There are four patches here:

- Add PERIODs.
- Add temporal PRIMARY KEY and UNIQUE constraints.
- Add UPDATE/DELETE FOR PORTION OF.
- Add temporal FOREIGN KEYs.

The PERIOD patch is mostly Vik Fearing's work (submitted here a few
years ago), so he should get credit for that!

All patches have tests & documentation. I do have a few more tests I
plan to write, and there are some questions for reviewers embedded in
patches (mostly about when to lock and/or copy data structures). I've
tried to format these as C++ comments to indicate they should be
removed before committing.

Throughout I've made sure that wherever SQL:2011 accepts a PERIOD, we
also accept a range column. So in all these examples valid_at could be
either one:

PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
FOREIGN KEY (id, PERIOD valid_at)
REFERENCES too (id, PERIOD valid_at)
FOR PORTION OF valid_at FROM t1 TO t2

Range types are superior to PERIODs in many ways, so I think we should
support both. For example you can SELECT them, WHERE them, GROUP BY
them, pass them to functions, return them from functions, do
arithmetic on them, index them, etc.

In fact whether you use a PERIOD or a range, the implementation uses
ranges a lot, since they are such a good fit. A temporal PK is really
an exclusion constraint, etc. When you define a PERIOD, we find a
matching range type and store its oid on the period record. If there
are more than one range type we raise an error, but you can give a
rangetype option to remove the ambiguity. This means we support
PERIODs of any type (basically), not just dates & timestamps.

According to SQL:2011 we should automatically set any columns used by
a PERIOD to NOT NULL. I've ignored that requirement, since permitting
nullable columns is strictly greater functionality: you can always
make the columns NOT NULL if you like. Interpreting NULLs as unbounded
fits better with our range types, and it means you don't have to use
sentinels. (Timestamp has +-Infinity, but many types don't.) Oracle
also accepts null PERIOD columns and treats them the same way. I don't
think it would break anything though to force PERIOD columns to NOT
NULL. If you hate sentinels you can just use range columns. But still
I see no reason to force this on our users.

In the FOR PORTION OF bounds I accept MINVALUE and MAXVALUE as special
tokens. I chose the names to be consistent with partition syntax. This
isn't part of the standard but seems nice.

Here are a few other things to discuss:

- My patch only adds application time. There is a separate patch to
add system time: https://commitfest.postgresql.org/33/2316/ I don't
foresee any serious conflicts between our work, and in general I think
each patch implements its functionality at an appropriate (but
different) level of abstraction. But I haven't looked at that patch
recently. I'll try to give some comments during this commitfest. The
one place they probably overlap is with defining PERIODs. Since
system-time periods *must* be named SYSTEM_TIME, even that overlap
should be slight, but it still might be worth accepting the PERIOD
patch here before adopting either. Even SYSTEM_TIME ought to be
recorded in information_schema.periods IIRC.

- The biggest thing remaining to do is to add support for partitioned
tables. I would love some help with that if anyone is interested.

- Since temporal PKs are implemented with exclusion constraints they
use GiST indexes, so you can't really use them without the btree_gist
extension (unless *all* your key parts are ranges---which is how we
test exclusion constraints). Personally I'm okay with this, since even
exclusion constraints are pretty useless without that extension. But
it seems like something to talk about.

- At PgCon 2020 Vik suggested a different way of querying for FK
checks, which he used in his own temporal tables extension. It is more
complicated but he thinks it may be faster. I plan to try both and run
some benchmarks. I'm not sure whether his approach will work with
CASCADE/SET NULL/SET DEFAULT---but I haven't looked at it in a while.

- It is hard to avoid a shift/reduce conflict in FOR PORTION OF
<period_or_range> FROM <expr> TO <expr> because expressions may
contain date INTERVALs that also may contain TO. So this is an error:

FOR PORTION OF valid_at
FROM '2018-03-01' AT TIME ZONE INTERVAL '2' HOUR
TO '2019-01-01'

but this works:

FOR PORTION OF valid_at
FROM ('2018-03-01' AT TIME ZONE INTERVAL '2' HOUR)
TO '2019-01-01'

I'm personally satisfied with that, but if anyone thinks it can be
improved please let me know. It would be nice if the parser were smart
enough to see that without a second TO, it must belong to FOR PORTION
OF, not the interval. But *I'm* not smart enough to teach it that. :-)
If only it could have a greater lookahead. . . .

- Normally we return the number of rows affected by an UPDATE/DELETE.
What do you think we should do when a FOR PORTION OF causes extra rows
to be inserted? I'm not doing anything special here today. After all
foreign keys don't do anything extra when they CASCADE/SET (to my
knowledge). Also I think adding info about the inserted rows might be
annoying, since I'd have to communicate it from within the trigger
function. I'm really hoping no one asks for this.

- Since PERIODs are a weird neither-fish-nor-foul thing (parsed a lot
like a column, but also behaving like a constraint), they add a lot of
tedious if-statements when they are used by an index or constraint. In
many places I've used a zero attnum to signal that a component is
really a PERIOD. (Range columns are easy since they really are a
column.) I feel this approach is pretty ugly, so I will probably
experiment a bit with a different way. If anyone else wants to take
this on though, I'm grateful for the help.

- It would be really cool if ON CONFLICT DO UPDATE had a temporal
variant so it would INSERT the missing durations and UPDATE the
existing ones. That's what Tom Johnston said the standard should have
required in *Bitemporal Data*, and it does make things a lot easier on
the client side. But that is something to do in a later patch. . . .

Yours,
Paul

Attachment Content-Type Size
v4-0001-Add-PERIODs.patch application/octet-stream 117.9 KB
v4-0003-Add-UPDATE-DELETE-FOR-PORTION-OF.patch application/octet-stream 126.4 KB
v4-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch application/octet-stream 83.4 KB
v4-0004-Add-temporal-FOREIGN-KEYs.patch application/octet-stream 306.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Antonin Houska 2021-06-30 17:41:16 Re: POC: Cleaning up orphaned files using undo logs
Previous Message Peter Eisentraut 2021-06-30 17:24:26 Re: Preventing abort() and exit() calls in libpq