Re: SQL:2011 application time

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: SQL:2011 application time
Date: 2021-09-19 00:46:16
Message-ID: CADkLM=ekdXZceMyggb6Lqj2447VfiS-Bw22Gdgd5tBotfKmi6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

In IBM DB2 you can only have one because application-time periods must
> be named "business_time" (not joking).
>

I saw that as well, and it made me think that someone at IBM is a fan of
Flight Of The Conchords.

> Personally I feel like it's a weird limitation and I wouldn't mind
> supporting more, but my current implementation only allows for one,
> and I'd have to rethink some things to do it differently.
>

I'm satisfied that it's not something we need to do in the first MVP.

>
> Yes. Even though the name "SYSTEM_TIME" is technically enough, I'd
> still include a pertype column to make distinguishing system vs
> application periods easier and more obvious.
>

SYSTEM_TIME seems to allow for DATE values in the start_time and end_time
fields, though I cannot imagine how that would ever be practical, unless it
were somehow desirable to reject subsequent updates within a 24 hour
timeframe. I have seen instances where home-rolled application periods used
date values, which had similar problems where certain intermediate updates
would simply have to be discarded in favor of the one that was still
standing at midnight.

>
> > 2. The system versioning effort has chosen 'infinity' as their end-time
> value, whereas you have chosen NULL as that makes sense for an unbounded
> range. Other databases seem to leverage '9999-12-31 23:59:59' (SQLServer,
> IIRC) whereas some others seem to used '2999-12-31 23:59:59' but those
> might have been home-rolled temporal implementations. To further add to the
> confusion, the syntax seems to specify the keyword of MAXVALUE, which
> further muddies things. The system versioning people went with 'infinity'
> seemingly because it prescribe and end to the world like SQLServer did, but
> also because it allowed for a primary key based on (id, endtime) and that's
> just not possible with NULL endtime values.
>
> I think it's a little weird that our system-time patch mutates your
> primary key. None of the other RDMBSes do that. I don't think it's
> incompatible (as long as the system time patch knows how to preserve
> the extra period/range data in an application-time temporal key), but
> it feels messy to me.
>

Per outline below, I'm proposing an alternate SYSTEM_TIME implementation
that would leave the PK as-is.

> I would prefer if system-time and application-time used the same value
> to mean "unbounded". Using null means we can support any type (not
> just types with +-Infinity). And it pairs nicely with range types. If
> the only reason for system-time to use Infinity is the primary key, I
> think it would be better not to mutate the primary key (and store the
> historical records in a separate table as other RDMSes do).
>

The two "big wins" of infinity seemed (to me) to be:

1. the ability to add "AND end_time = 'infinity'" as a cheap way to get
current rows
2. clauses like "WHERE CURRENT_DATE - 3 BETWEEN start_time AND end_time"
would work. Granted, there's very specific new syntax to do that properly,
but you know somebody's gonna see the columns and try to do it that way.

>
> Btw Oracle also uses NULL to mean "unbounded".
>

Huh, I missed that one. That is good in that it gives some precedence to
how you've approached it.

>
> We presently forbid PKs from including expressions, but my patch lifts
> that exception so it can index a rangetype expression built from the
> period start & end columns. So even if we must include the system-time
> end column in a PK, perhaps it can use a COALESCE expression to store
> Infinity even while using NULL to signify "currently true" from a user
> perspective.
>

Either way seems viable, but I understand why you want to leverage ranges
in this way.

>
> > 3. I noticed some inconsistency in the results from various "SELECT *
> FROM portion_of_test" examples. In some, the "valid_at" range is shown but
> not columns that make it up, and in some others, the "valid_from" and
> "valid_to" columns are shown, with no mention of the period. From what I've
> seen, the period column should be invisible unless invoked, like ctid or
> xmin.
>
> In most cases the tests test the same functionality with both PERIODs
> and rangetype columns. For FKs they test all four combinations of
> PERIOD-referencing-PERIOD, PERIOD-referencing-range,
> range-referencing-PERIOD, and range-referencing-range. If valid_at is
> a genuine column, it is included in SELECT *, but not if it is a
> PERIOD.
>

Ok, I'll have to look back over the test coverage to make sure that I
understand the behavior now.

>
> > 4. The syntax '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE
> simply confounded me.
>
> Me too! I have no idea what that is supposed to mean. But that
> behavior predates my patch. I only had to deal with it because it
> creates a shift-reduce conflict with `FOR PORTION OF valid_at FROM x
> TO y`, where x & y are expressions. I asked about this syntax at my
> PgCon 2020 talk, but I haven't ever received an answer. Perhaps
> someone else knows what this kind of INTERVAL means (as a modifier of
> a time value).
>

I think I'll open this as a separate thread, because it would simplify
matters if we can reject this nonsense syntax.

This was the alternative method of system versioning I proposed recently in
the system versioning thread

1. The regular table remains unchanged, but a pg_class attribute named
"relissystemversioned" would be set to true

2. I'm unsure if the standard allows dropping a column from a table while
it is system versioned, and the purpose behind system versioning makes me
believe the answer is a strong "no" and requiring DROP COLUMN to fail
on relissystemversioned = 't' seems pretty straightforward.
3. The history table would be given a default name of $FOO_history (space
permitting), but could be overridden with the history_table option.
4. The history table would have relkind = 'h'
5. The history table will only have rows that are not current, so it is
created empty.
6. As such, the table is effectively append-only, in a way that vacuum can
actually leverage, and likewise the fill factor of such a table should
never be less than 100.
7. The history table could only be updated only via system defined triggers
(insert,update,delete, alter to add columns), or row migration similar to
that found in partitioning. It seems like this would work as the two tables
working as partitions of the same table, but presently we can't have
multi-parent partitions.
8. The history table would be indexed the same as the base table, except
that all unique indexes would be made non-unique, and an index of pk +
start_time + end_time would be added
9. The primary key of the base table would remain the existing pk vals, and
would basically function normally, with triggers to carry forth changes to
the history table. The net effect of this is that the end_time value of all
rows in the main table would always be the chosen "current" value
(infinity, null, 9999-12-31, etc) and as such might not actually _need_ to
be stored.
10. Queries that omit the FOR SYSTEM_TIME clause, as well as ones that use
FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, would simply use the base table
directly with no quals to add.
11. Queries that use FOR SYSTEM_TIME and not FOR SYSTEM_TIME AS
OF CURRENT_TIMESTAMP, then the query would do a union of the base table and
the history table with quals applied to both.
12. It's a fair question whether the history table would be something that
could be queried directly. I'm inclined to say no, because that allows for
things like SELECT FOR UPDATE, which of course we'd have to reject.
13. If a history table is directly referenceable, then SELECT permission
can be granted or revoked as normal, but all insert/update/delete/truncate
options would raise an error.
14. DROP SYSTEM VERSIONING from a table would be quite straightforward -
the history table would be dropped along with the triggers that reference
it, setting relissystemversioned = 'f' on the base table.

The benefits to your effort here would be:

1. No change to the primary key except for the ones dictated by application
period
2. The INFORMATION_SCHEMA view need merely take into account The
pg_class.relkind = 'h' entries
3. system versioning is no longer mutating (trigger on X updates X), which
eliminates the possibility that application period triggers get into a loop
4. DROP SYSTEM VERSIONING would be entirely transparent to application
versioning.

Thoughts?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2021-09-19 01:28:49 Undocumented AT TIME ZONE INTERVAL syntax
Previous Message Corey Huinker 2021-09-19 00:15:52 Re: WIP: System Versioned Temporal Table