Re: WIP: System Versioned Temporal Table

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: vik(dot)fearing(at)2ndquadrant(dot)com
Cc: surafel3000(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: System Versioned Temporal Table
Date: 2020-01-07 10:32:29
Message-ID: 20200107.193229.846812995558264917.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

Isn't this patch somehow broken?

At Mon, 28 Oct 2019 16:36:09 +0100, Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com> wrote in
> On 28/10/2019 13:48, Surafel Temesgen wrote:
> >
> >
> > On Fri, Oct 25, 2019 at 10:45 PM Vik Fearing
> > <vik(dot)fearing(at)2ndquadrant(dot)com <mailto:vik(dot)fearing(at)2ndquadrant(dot)com>> wrote:
> >
> > >
> > >     I don't understand what you mean by this.
> > >
> > >
> > >
> > > The primary purpose of adding row end time to primary key is to
> > allow
> > > duplicate value to be inserted into a table with keeping
> > constraint in
> > > current data but it can be duplicated in history data. Adding row
> > > start time column to primary key will eliminate this uniqueness for
> > > current data which is not correct  
> >
> >
> > How?  The primary/unique keys must always be unique at every point
> > in time.
> >
> >
> > From user prospect it is acceptable to delete and reinsert a record
> > with the same key value multiple time which means there will be
> > multiple record with the same key value in a history data but there is
> > only one values in current data as a table without system versioning
> > do .I add row end time column to primary key to allow user supplied
> > primary key values to be duplicated in history data which is acceptable
> >
>
> Yes, I understand that.  I'm saying you should also add the row start
> time column.

I think that the start and end timestamps represent the period where
that version of the row was active. So UPDATE should modify the start
timestamp of the new version to the same value with the end timestamp
of the old version to the updated time. Thus, I don't think adding
start timestamp to PK doesn't work as expected. That hinders us from
rejecting rows with the same user-defined unique key because start
timestams is different each time of inserts. I think what Surafel is
doing in the current patch is correct. Only end_timestamp = +infinity
rejects another non-historical (= live) version with the same
user-defined unique key.

I'm not sure why the patch starts from "0002, but anyway it applied
on e369f37086. Then I ran make distclean, ./configure then make all,
make install, initdb and started server after all of them.

First, I tried to create a temporal table.

When I used timestamptz as the type of versioning columns, ALTER,
CREATE commands ended with server crash.

"CREATE TABLE t (a int, s timestamptz GENERATED ALWAYS AS ROW START, e timestamptz GENERATED ALWAYS AS ROW END);"
(CREATE TABLE t (a int);)
"ALTER TABLE t ADD COLUMN s timestamptz GENERATED ALWAYS AS ROW START"
"ALTER TABLE t ADD COLUMN s timestamptz GENERATED ALWAYS AS ROW START, ADD COLUMN e timestamptz GENERATED ALWAYS AS ROW END"

If I added the start/end timestamp columns to an existing table, it
returns uncertain error.

ALTER TABLE t ADD COLUMN s timestamp(6) GENERATED ALWAYS AS ROW START;
ERROR: column "s" contains null values
ALTER TABLE t ADD COLUMN s timestamp(6) GENERATED ALWAYS AS ROW START, ADD COLUMN e timestamp(6) GENERATED ALWAYS AS ROW END;
ERROR: column "s" contains null values

When I defined only start column, SELECT on the table crashed.

"CREATE TABLE t (s timestamp(6) GENERATED ALWAYS AS ROW START);"
"SELECT * from t;"
(crashed)

The following command ended with ERROR which I cannot understand the
cause, but I expected the command to be accepted.

ALTER TABLE t ADD COLUMN start timestamp(6) GENERATED ALWAYS AS ROW START, ADD COLUMN end timestamp(6) GENERATED ALWAYS AS ROW END;
ERROR: syntax error at or near "end"

I didin't examined further but the syntax part doesn't seem designed
well, and the body part seems vulnerable to unexpected input.

I ran a few queries:

SELECT * shows the timestamp columns, don't we need to hide the period
timestamp columns from this query?

I think UPDATE needs to update the start timestamp, but it doesn't. As
the result the timestamps doesn't represent the correct lifetime of
the row version and we wouldn't be able to pick up correct versions of
a row that exprerienced updates. (I didn't confirmed that because I
couldn't do "FOR SYSTEM_TIME AS OF" query due to syntax error..)

(Sorry in advance for possible pointless comments due to my lack of
access to the SQL11 standard.) If we have the period-timestamp
columns before the last two columns, INSERT in a common way on the
table fails, which doesn't seem to me to be expected behavior:

CREATE TABLE t (s timestamp(6) GENERATED ALWAYS AS ROW START, e timestamp(6) GENERATED ALWAYS AS ROW END, a int) WITH SYSTEM VERSIONING;
INSERT INTO t (SELECT a FROM generate_series(0, 99) a);
ERROR: column "s" is of type timestamp without time zone but expression is of type integer

Some queries using SYSTEM_TIME which I think should be accepted ends
with error. Is the grammar part missing something?

SELECT * FROM t FOR SYSTEM_TIME AS OF '2020-01-07 09:57:55';
ERROR: syntax error at or near "system_time"
LINE 1: SELECT * FROM t FOR SYSTEM_TIME AS OF '2020-01-07 09:57:55';

SELECT * FROM t FOR SYSTEM_TIME BETWEEN '2020-01-07 0:00:00' AND '2020-01-08 0:00:00';
ERROR: syntax error at or near "system_time"
LINE 1: SELECT * FROM t FOR SYSTEM_TIME BETWEEN '2020-01-07 0:00:00'...

Other random comments (sorry for it not being comprehensive):

The patch at worst loops over all columns at every parse time. It is
quite ineffecient if there are many columns. We can store the column
indexes in relcache.

If I'm not missing anything, alter table doesn't properly modify
existing data in the target table. AddSystemVersioning should fill in
start/end_timestamp with proper values and DropSystemVersioning shuold
remove rows no longer useful.

+makeAndExpr(Node *lexpr, Node *rexpr, int location)

I believe that planner flattenes nested AND/ORs in
eval_const_expressions(). Shouldn't we leave the work to the planner?

+makeConstraint(ConstrType type)

We didn't use such a function to make that kind of nodes. Maybe we
should use makeNode directly, or we need to change similar coding into
that using the function. Addition to that, setting .location to 1 is
wrong. "Unknown" location is -1.

Separately from that, temporal clauses is not restriction of a
table. So it seems wrong to me to use constraint mechamism for this
purpose.

+makeSystemColumnDef(char *name)

"system column (or attribute)" is a column specially treated outside
of tuple descriptor. The temporal-period columns are not system
columns in that sense.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2020-01-07 11:00:30 Re: Errors when update a view with conditional-INSTEAD rules
Previous Message Fabien COELHO 2020-01-07 09:32:41 Re: pgbench - use pg logging capabilities