Re: SQL:2011 PERIODS vs Postgres Ranges?

From: Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Vik Fearing <vik(dot)fearing(at)protonmail(dot)com>
Cc: Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: SQL:2011 PERIODS vs Postgres Ranges?
Date: 2020-03-11 23:27:53
Message-ID: CA+renyW6SYCc_vmN2HavjOx74m7KgPU4gzHimMq-F4hPnsNtTA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here is a patch rebasing on master (meant to be applied on top of my
other multirange patch) and newly including UPDATE/DELETE FOR PORTION
OF. FOR PORTION OF works on any table with a temporal primary key. It
restricts the UPDATE/DELETE to the given time frame, and then if the
affected row(s) had any "leftovers" above or below the targeted range,
it INSERTs new rows to preserve the untouched intervals.

I put the implementation into execModifyTable.c (mostly), which I
think is the preferred approach. (There was a great message on
-hackers a year or two ago lamenting how new contributors want to do
all the work in the parse/analysis phase, so I tried to avoid that. I
wish I could find the thread. I want to say Robert Haas wrote it, but
it could have been anyone.)

The executor is new territory for me, so even though this is WIP I'd
love to have someone look at it now. I'm sure I'm doing all kinds of
bad things re locking, transaction isolation, snapshots, etc. (There
are some comments in the .patch for specific worries.)

Also, since I have to do range calculations to handle the leftovers,
this adds knowledge about a specific type (well a specific type of
types) to the executor. I felt like that was mixing abstraction layers
a bit, so perhaps someone will have an opinion/suggestion there. I
could probably build an Expr earlier in the pipeline if I had a way to
feed it the pre-UPDATE values of the row (suggestions anyone?), and
then the executor could just evaluate it without knowing anything
about ranges.

I'm also not yet handling FDW tables, updatable views, or partitioned
tables. Perhaps we don't support FDWs at all here, or leave it up to
the FDW implementation to decide.

I haven't thought much about updatable views yet....

For partitioned tables, I think I can add support without too much
trouble; I'll give it a try soon.

Possibly the temporal PK requirement rules out using this for all
three (FDWs, updatable views, partitioned tables), at least for now.

This is mostly "happy path" so there is probably some error handling to add.

Previously a DELETE never updated indexes, and now I *do* update
indexes if a DELETE has a FOR PORTION OF clause, so that they see the
potential INSERTs. I don't know if that adds any risks around deadlock
etc.

I have a test verifying that we do fire triggers on the implicit
INSERTs (which I think is what the spec requires and is what MariaDB
and IBM DB2 do). Right now my triggers are firing in this order:

BEFORE UPDATE/DELETE
BEFORE INSERT
BEFORE INSERT
AFTER INSERT
AFTER INSERT
AFTER UPDATE/DELETE

In MariaDB they fire in this order:

BEFORE UPDATE/DELETE
BEFORE INSERT
AFTER INSERT
BEFORE INSERT
AFTER INSERT
AFTER UPDATE/DELETE

I haven't yet tested DB2 to see which order it uses. (It does fire the
INSERT triggers though.) I don't know if the spec has an opinion (I've
never found anything explicit, but it talks about "primary" vs
"secondary" operations), and I'm not actually sure how to get
MariaDB's order if we wanted to.

Instead of implementing so much in the executor, I could *almost* have
built FOR PORTION OF based on hidden triggers (sort of like how we
implement FKs). Probably AFTER ROW triggers. Then I'd hardly have to
touch the executor at all, and I wouldn't need to worry as much about
locking/isolation/snapshots. I would just need to add something to the
TriggerData struct giving the FOR PORTION OF bounds. (For an UPDATE I
could pull this out of NEW.valid_at (or whatever you call your
column), but for a DELETE that is NULL.) Basically a trigger needs to
know (1) if the query had a FOR PORTION OF clause (2) what the bounds
were. My triggers would be in C, so I think adding a new field to the
struct is sufficient. Of course we could still expose the values to
user-defined triggers if we wanted with e.g. TG_TARGET_INTERVAL.

With a trigger-based implementation, I'd add the update/delete
triggers whenever someone adds a temporal primary key. That means you
can't use FOR PORTION OF on arbitrary ranges, which is a little sad,
but no worse than the spec. And for now I was only supporting it on PK
columns anyway.

Also with a trigger-based implementation I don't think there is a way
to force our hidden trigger to fire before user-defined triggers,
which might be nice. But if that's not a problem with FKs then it's
probably not a problem here.

A trigger-based implementation also is less flexible in how it
interacts with GENERATED columns, but I think an AFTER trigger would
still do the right thing there.

A trigger-based implementation should give us the same firing order as
MariaDB (IIUC), which might be nice.

So let me know if anyone thinks this would be better implemented as
triggers instead. I'm kind of leaning that way myself to be honest.

One weird issue I noticed concerns dealing with unbounded endpoints.
According to the spec a PERIOD's endpoints must be not-NULL, so you
have to use sentinel values like 3000-JAN-01. Oracle ignores this
though, and since our own ranges already interpret a null bound to
mean infinite, I think we should accept a non-sentinel way of saying
"from now on" or "since the beginning". Right now I accept 'Infinity'
and '-Infinity' in FOR PORTION OF, because those are familiar and
valid values for timestamps/dates/floats. But I translate them to
`NULL` to get proper range behavior. That's because to ranges,
'Infinity' is "right before" an upper null, and '-Infinity' is "right
after" a lower null. For example:

=# select tsrange('2020-01-01', null) - tsrange('2020-01-01', 'Infinity');
?column?
-------------
[infinity,)

That will leave a lot of ugly records in your table if you don't take
care to avoid it.

I guess alternately I could let people say FOR PORTION OF FROM
'2020-01-01' TO NULL. But that is less obvious, and it leaves the
footgun still there. So right now I'm translating +/-Infinity into
NULL to prevent the problem.

Anyway, now that FOR PORTION OF works, I want to add CASCADE support
to temporal FKs. I don't think that will take long.

Then I'd like to bring in Vik Fearing's old patch adding PERIODs, and
start supporting those too. Vik, do you have any objection or advice
about that?

Yours,

Attachment Content-Type Size
v0003-temporal-pk-fk-update-delete.patch application/octet-stream 193.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2020-03-11 23:42:25 Re: Use compiler intrinsics for bit ops in hash
Previous Message Bruce Momjian 2020-03-11 23:13:44 Re: Internal key management system