Re: WIP: System Versioned Temporal Table

From: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, vignesh C <vignesh21(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, Vik Fearing <vik(at)postgresfriends(dot)org>, Rémi Lapeyre <remi(dot)lapeyre(at)lenstra(dot)fr>, Ryan Lambert <ryan(at)rustprooflabs(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Eli Marmor <eli(at)netmask(dot)it>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Li Japin <japinli(at)hotmail(dot)com>
Subject: Re: WIP: System Versioned Temporal Table
Date: 2021-09-19 18:32:20
Message-ID: CANbhV-ERW0G1B=ALwa+Uh0XX975NnHUwFzDgtukeszowm3KRmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 19 Sept 2021 at 01:16, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
>>
>> 1. Much of what I have read about temporal tables seemed to imply or almost assume that system temporal tables would be implemented as two actual separate tables. Indeed, SQLServer appears to do it that way [1] with syntax like
>>
>> WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.WebsiteUserInfoHistory));
>>
>>
>> Q 1.1. Was that implementation considered and if so, what made this implementation more appealing?
>>
>
> I've been digging some more on this point, and I've reached the conclusion that a separate history table is the better implementation. It would make the act of removing system versioning into little more than a DROP TABLE, plus adjusting the base table to reflect that it is no longer system versioned.

Thanks for giving this a lot of thought. When you asked the question
the first time you hadn't discussed how that might work, but now we
have something to discuss.

> 10. Queries that omit the FOR SYSTEM_TIME clause, as well as ones that use FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, would simply use the base table directly with no quals to add.
> 11. Queries that use FOR SYSTEM_TIME and not FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, then the query would do a union of the base table and the history table with quals applied to both.

> 14. DROP SYSTEM VERSIONING from a table would be quite straightforward - the history table would be dropped along with the triggers that reference it, setting relissystemversioned = 'f' on the base table.
>
> I think this would have some key advantages:
>
> 1. MVCC bloat is no worse than it was before.

The number of row versions stored in the database is the same for
both, just it would be split across two tables in this form.

> 2. No changes whatsoever to referential integrity.

The changes were fairly minor, but I see your thinking about indexes
as a simplification.

> 3. DROP SYSTEM VERSIONING becomes an O(1) operation.

It isn't top of mind to make this work well. The whole purpose of the
history is to keep it, not to be able to drop it quickly.

> Thoughts?

There are 3 implementation routes that I see, so let me explain so
that others can join the discussion.

1. Putting all data in one table. This makes DROP SYSTEM VERSIONING
effectively impossible. It requires access to the table to be
rewritten to add in historical quals for non-historical access and it
requires some push-ups around indexes. (The current patch adds the
historic quals by kludging the parser, which is wrong place, since it
doesn't work for joins etc.. However, given that issue, the rest seems
to follow on naturally).

2. Putting data in a side table. This makes DROP SYSTEM VERSIONING
fairly trivial, but it complicates many DDL commands (please make a
list?) and requires the optimizer to know about this and cater to it,
possibly complicating plans. Neither issue is insurmountable, but it
becomes more intrusive.

The current patch could go in either of the first 2 directions with
further work.

3. Let the Table Access Method handle it. I call this out separately
since it avoids making changes to the rest of Postgres, which might be
a good thing, with the right TAM implementation.

My preferred approach would be to do this "for free" in the table
access method, but we're a long way from this in terms of actual
implementation. When Corey suggested earlier that we just put the
syntax in there, this was the direction I was thinking.

After waiting a day since I wrote the above, I think we should go with
(2) as Corey suggests, at least for now, and we can always add (3)
later.

--
Simon Riggs http://www.EnterpriseDB.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2021-09-19 19:12:37 Re: WIP: System Versioned Temporal Table
Previous Message Justin Pryzby 2021-09-19 16:32:01 Re: Release 14 Schedule