Re: SQL:2011 application time

From: Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: SQL:2011 application time
Date: 2024-03-16 21:37:10
Message-ID: CA+renyXHFFyq1n2+4cwcFoLXqzUhb6u6WCL6wjk8ZjP9WxTExw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

Here is a new patch series addressing the last few feedback emails
from Peter & Jian He. It mostly focuses on the FKs patch, trying to
get it really ready to commit, but it also finishes restoring all the
functionality to the PERIODs patch (that I removed temporarily when we
changed PERIODs to GENERATED columns). I still want to restore a few
more tests there, but all the functionality is back (e.g. PERIODs with
foreign keys and FOR PORTION OF), so it proves the GENERATED idea
works in principle. Specific feedback below:

On Mon, Mar 11, 2024 at 12:46 AM Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
> I had committed v27-0001-Rename-conwithoutoverlaps-to-conperiod.patch a
> little while ago.

Thanks! It looks like you also fixed the pg_catalog docs which I missed.

> Attached is a small patch that changes the PERIOD keyword to unreserved
> for this patch. You had said earlier that this didn't work for you.
> The attached patch works for me when applied on top of 0003.

Applied and included here.

> You wrote:
>
> > I'm not sure how else to do it. The issue is that `range_agg` returns
> > a multirange, so the result
> > type doesn't match the inputs. But other types will likely have the
> > same problem: to combine boxes
> > you may need a multibox. The combine mdranges you may need a
> > multimdrange.
>
> Can we just hardcode the use of range_agg for this release? Might be
> easier. I don't see all this generality being useful in the near future.

Okay, I've hard-coded range_agg in the main patch and separated the
support for multirange/etc in the next two patches. But there isn't
much code there (mostly tests and docs). Since we can't hard-code the
*operators*, most of the infrastructure is already there not to
hard-code the aggregate function. Supporting multiranges is already a
nice improvement. E.g. it should cut down on disk usage when a record
gets updated frequently. Supporting arbitrary types also seems very
powerful, and we already do that for PKs.

> > Btw that part changed a bit since v24 because as jian he pointed out,
> > our type system doesn't
> > support anyrange inputs and an anyrange[] output. So I changed the
> > support funcs to use SETOF.
>
> I didn't see any SETOF stuff in the patch, or I didn't know where to look.
>
> I'm not sure I follow all the details here. So more explanations of any
> kind could be helpful.

This is talking about the FOR PORTION OF patch, not the FKs patch. It
is the function that gives the "leftovers" after a temporal
UPDATE/DELETE. There is explanation in the preliminary patch (adding
the support function) and the actual FOR PORTION OF patch, but if you
think they need more let me know.

But I'd love to talk more about this here: The reason for using a
SETOF function is because you can't return an anyarray from a function
that takes anyrange or anymultirange. Or rather if you do, the array
elements match the rangetype's bounds' type, not the rangetype itself:
`T[] f(rangetype<T>)`, not `rangetype<T>[] f(rangetype<T>)`, and we
need the latter. So to get a list of rangetype objects we do a SETOF
function that is `anyrange f(anyrange)`. Personally I think an
improvement would be to add a broken-out patch to add pseudotypes
called anyrangearray and anymultirangearray, but using SETOF works
now, and I don't know if anyone is interested in such a patch. But
it's not the first time I've hit this shortcoming in the pg type
system, so I think it's worthwhile. And since FOR PORTION OF isn't
getting into v17, there is time to do it. What do you think? If it's
an acceptable idea I will get started. It should be a separate
commitfest entry I think.

> * v27-0003-Refactor-FK-operator-lookup.patch
>
> I suggest to skip this refactoring patch. I don't think the way this is
> sliced up is all that great, and it doesn't actually help with the
> subsequent patches.

Okay.

> - src/backend/catalog/pg_constraint.c
>
> FindFKPeriodOpersAndProcs() could use a bit more top-level
> documentation. Where does the input opclass come from? What are the
> three output values? What is the business with "symmetric types"?

Added and tried to clarify about the types.

> - src/backend/commands/indexcmds.c
>
> GetOperatorFromWellKnownStrategy() is apparently changed to accept
> InvalidOid for rhstype, but the meaning of this is not explained in
> the function header. It's also not clear to me why an existing caller
> is changed. This should be explained more thoroughly.

It's not so much changing a param as removing one and adding another.
The old param was unneeded because it's just the opclass's opcintype,
and we're already passing the opclass. Then the new param lets you
optionally ask for an operator that is not `opcintype op opcintype`
but `opcintype op rhstype`. We need this because FKs compare fkattr <@
range_agg(pkattr)`, and range_agg returns a multirange, not a range.
Even if we hard-code range_agg, the easiest way to get the operator is
to use this function, passing ANYMULTIRANGEOID (but better is to pass
whatever the referencedagg support func returns, as the now-separate
multirange/custom type patch does).

> - src/backend/commands/tablecmds.c
>
> is_temporal and similar should be renamed to with_period or similar
> throughout this patch.

Done.

> In transformFkeyGetPrimaryKey():
>
> * Now build the list of PK attributes from the indkey definition (we
> - * assume a primary key cannot have expressional elements)
> + * assume a primary key cannot have expressional elements, unless it
> + * has a PERIOD)
>
> I think the original statement is still true even with PERIOD. The
> expressional elements refer to expression indexes. I don't think we can
> have a PERIOD marker on an expression?

You're right: I wrote this back before PERIODs became GENERATED
columns. Updated now.

> - src/backend/utils/adt/ri_triggers.c
>
> Please remove the separate trigger functions for the period case. They
> are the same as the non-period ones, so we don't need separate ones.
> The difference is handled lower in the call stack, which I think is a
> good setup. Removing the separate functions also removes a lot of extra
> code in other parts of the patch.

Done. The later patch for FKs with CASCADE/SET NULL/SET DEFAULT still
has separate functions (since they call actually-different
implementations), but I will see if I can unify things a bit more
there.

> - src/include/catalog/pg_constraint.h
>
> Should also update catalogs.sgml accordingly.

Looks like you did this already in 030e10ff1a.

> - src/test/regress/expected/without_overlaps.out
> - src/test/regress/sql/without_overlaps.sql
>
> A few general comments on the tests:
>
> - In the INSERT commands, specify the column names explicitly. This
> makes the tests easier to read (especially since the column order
> between the PK and the FK table is sometimes different).

Okay.

> - Let's try to make it so that the inserted literals match the values
> shown in the various error messages, so it's easier to match them up.
> So, change the int4range literals to half-open notation. And also maybe
> change the date output format to ISO.

Done. Also changed the tsrange cols to daterange and made them
YYYY-MM-DD. This is much easier to read IMO.

Note there were already a few tsrange columns in the PK tests, so I
changed those separately in the very first patch here.

> - In various comments, instead of test FK "child", maybe use
> "referencing table"? Instead of "parent", use "referenced table" (or
> primary key table). When I read child and parent I was looking for
> inheritance.

Done.

> - Consider truncating the test tables before each major block of tests
> and refilling them with fresh data. So it's easier to eyeball the
> tests. Otherwise, there is too much dependency on what earlier tests
> left behind.

Done. This will also let me reuse ids in the FOR PORTION OF
partitioned table tests, but that's not done yet.

> A specific question:
>
> In this test, a PERIOD marker on the referenced site is automatically
> inferred from the primary key:
>
> +-- with inferred PK on the referenced table:
> +CREATE TABLE temporal_fk_rng2rng (
> + id int4range,
> + valid_at tsrange,
> + parent_id int4range,
> + CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT
> OVERLAPS),
> + CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD
> valid_at)
> + REFERENCES temporal_rng
> +);
>
> In your patch, this succeeds. According to the SQL standard, it should
> not. In subclause 11.8, syntax rule 4b:
>
> """
> Otherwise, the table descriptor of the referenced table shall include a
> unique constraint UC that specifies PRIMARY KEY. The table constraint
> descriptor of UC shall not include an application time period name.
> """
>
> So this case is apparently explicitly ruled out.
>
> (It might be ok to make an extension here, but then we should be
> explicit about it.)

Okay, I agree it doesn't match the standard. IMO our behavior is
better, but the patches here should let you go either way. The main FK
patch keeps the old behavior, but there is a follow-up patch doing
what the standard says. There are some interesting implications, which
you can see by looking at the test changes in that patch. Basically
you can never give an inferred REFERENCES against a temporal table.
Either your FK has a PERIOD element, and it fails because we exclude
the PK's WITHOUT OVERLAPS in the inferred attributes, or your FK does
not have a PERIOD element, and it fails because you want a PK side
that is genuinely unique, but the PK index has a temporal definition
of "unique" (and is not B-tree but GiST).

I don't see any drawbacks from supporting inferred REFERENCES with
temporal tables, so my vote is to break from the standard here, and
*not* apply that follow-up patch. Should I add some docs about that?
Also skipping the patch will cause some annoying merge conflicts, so
let me know if that's what you choose and I'll handle them right away.

Btw I tried checking what other vendors do here, but no one supports
temporal FKs yet! MS SQL Server doesn't support application time at
all. Oracle and MariaDB don't support temporal PKs or FKs. And IBM DB2
only supports temporal PKs. Actually DB2's docs in 2019 were
*claiming* they supported temporal FKs, but it didn't work for me or
at least one other person posting in their forums. And the latest docs
no longer mention it.[1] I wrote about trying to make it work in my
survey of other vendors.[2] The old docs are now a 404,[3] as is the
forums post.[4] My DB2 test code is below in case anyone else wants to
try.[5] So there is no precedent here for us to follow.

Incidentally, here are two non-standard things I would like to add "some day":

1. FKs from non-temporal tables to temporal tables. Right now temporal
tables are "contagious", which can be annoying. Maybe a non-temporal
record is valid as long as a referenced temporal row exists at *any
time*. You can't do that today. You can't even add an additional
UNIQUE constraint, because there are surely duplicates that invalidate
it. This kind of FK would be satisfied if *at least one* reference
exists.

2. FKs from a single-timestamp table to a temporal table. Maybe the
referring table is an "event" with no duration, but it is valid as
long as the referenced table contains it. A workaround is to have a
range that is `[t,t]`, but that's annoying.

Anyway that's not important for these patches. As far as I can tell,
whatever we choose re inferred PERIOD in REFERENCES keeps our options
open for those ideas.

One more thought: if we wanted to be cheekily compatible with the
standard, we could infer *range types* that are WITHOUT OVERLAPs but
not true PERIOD objects. "The table constraint descriptor of UC shall
not include an application time period name." If it's a rangetype
column, then it doesn't include a period name. :-P. So then we would
skip the follow-up patch here but I could work it into the final patch
for PERIOD support. This is probably not the wisest choice, although I
guess it does let us defer deciding what to do.

On Mon, Mar 11, 2024 at 7:45 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> typo "referenced_agg", in the gist-extensibility.html page is "referencedagg"
> <literal>WITHOUT PORTION</literal> should be <literal>WITHOUT OVERLAPS</literal>

Good catch! Fixed.

> + While the non-<literal>PERIOD</literal> columns are treated normally
> + (and there must be at least one of them),
> + the <literal>PERIOD</literal> column is not compared for equality.
> the above sentence didn't say what is "normally"?
> maybe we can do the following:
> + While the non-<literal>PERIOD</literal> columns are treated
> + normally for equality
> + (and there must be at least one of them),
> + the <literal>PERIOD</literal> column is not compared for equality.

Reworked the language here.

> my_range_agg_transfn error message is inconsistent?
> `elog(ERROR, "range_agg_transfn called in non-aggregate context");`
> `elog(ERROR, "range_agg must be called with a range");`
> maybe just `my_range_agg_transfn`, instead of mention
> {range_agg_transfn|range_agg}
> similarly my_range_agg_finalfn error is also inconsistent.

This matches what other aggs do (e.g. array_agg, json_agg, etc.) as
well as the actual core range_agg code. And I think it is an
appropriate difference. You only hit the first error if you are
invoking the transfn directly, so that's what we should say. OTOH you
hit the second error by calling the aggregate function, but with the
wrong type. So the error message should mention the aggregate
function.

> my_range_agg_finalfn need `type_is_multirange(mltrngtypoid)`?

This isn't part of the core range_agg_finalfn, so I'd rather not
include it here. And I don't think it is needed. You would only get a
non-multirange if the transfn does something wrong, and even if it
does, the error will be caught and reported in
multirange_get_typcache.

On Mon, Mar 11, 2024 at 7:47 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> maybe just change the tsrange type to daterange, then the dot out file
> will be far less verbose.

Agreed, done.

> minor issues while reviewing v27, 0001 to 0004.
> transformFkeyGetPrimaryKey comments need to update,
> since bool pk_period also returned.

pk_period is no longer returned in this latest patch.

> +/*
> + * FindFKComparisonOperators -
> + *
> + * Gets the operators for pfeqopOut, ppeqopOut, and ffeqopOut.
> + * Sets old_check_ok if we can avoid re-validating the constraint.
> + * Sets old_pfeqop_item to the old pfeqop values.
> + */
> +static void
> +FindFKComparisonOperators(Constraint *fkconstraint,
> + AlteredTableInfo *tab,
> + int i,
> + int16 *fkattnum,
> + bool *old_check_ok,
> + ListCell **old_pfeqop_item,
> + Oid pktype, Oid fktype, Oid opclass,
> + Oid *pfeqopOut, Oid *ppeqopOut, Oid *ffeqopOut)
>
> I think the above comments is
> `Sets the operators for pfeqopOut, ppeqopOut, and ffeqopOut.`.

This whole function is removed.

> + if (is_temporal)
> + {
> + if (!fkconstraint->fk_with_period)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> + errmsg("foreign key uses PERIOD on the referenced table but not the
> referencing table")));
> + }
> can be
> if (is_temporal && !fkconstraint->fk_with_period)
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> errmsg("foreign key uses PERIOD on the referenced table but not the
> referencing table")));

The patch about inferred REFERENCES moves things around a bit, so this
no longer applies.

> + if (is_temporal)
> + {
> + if (!fkconstraint->pk_with_period)
> + /* Since we got pk_attrs, one should be a period. */
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> + errmsg("foreign key uses PERIOD on the referencing table but not the
> referenced table")));
> + }
> can be
> if (is_temporal && !fkconstraint->pk_with_period)
> /* Since we got pk_attrs, one should be a period. */
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> errmsg("foreign key uses PERIOD on the referencing table but not the
> referenced table")));

Likewise.

> refactor decompile_column_index_array seems unnecessary.
> Peter already mentioned it at [1], I have tried to fix it at [2].

No, that conversation is about handling WITHOUT OVERLAPS, not PERIOD.
Because the syntax is `valid_at WITHOUT OVERLAPS` but `PERIOD
valid_at` (post vs pre), we must handle PERIOD inside the function.

> I don't understand the "expression elements" in the comments, most of
> the tests case is like

Covered above in Peter's feedback.

> `
> PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
> `
> + *pk_period = (indexStruct->indisexclusion);
> can be
> `+ *pk_period = indexStruct->indisexclusion;`

No longer included here.

On Wed, Mar 13, 2024 at 5:00 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> @@ -118,12 +120,17 @@ typedef struct RI_ConstraintInfo
> int16 confdelsetcols[RI_MAX_NUMKEYS]; /* attnums of cols to set on
> * delete */
> char confmatchtype; /* foreign key's match type */
> + bool temporal; /* if the foreign key is temporal */
> int nkeys; /* number of key columns */
> int16 pk_attnums[RI_MAX_NUMKEYS]; /* attnums of referenced cols */
> int16 fk_attnums[RI_MAX_NUMKEYS]; /* attnums of referencing cols */
> Oid pf_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = FK) */
> Oid pp_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = PK) */
> Oid ff_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (FK = FK) */
> + Oid period_contained_by_oper; /* operator for PERIOD SQL */
> + Oid agged_period_contained_by_oper; /* operator for PERIOD SQL */
> + Oid period_referenced_agg_proc; /* proc for PERIOD SQL */
> + Oid period_referenced_agg_rettype; /* rettype for previous */
>
> the comment seems not clear to me. Here is my understanding about it:
> period_contained_by_oper is the operator where a single period/range
> contained by a single period/range.
> agged_period_contained_by_oper is the operator oid where a period
> contained by a bound of periods
> period_referenced_agg_proc is the oprcode of the agged_period_contained_by_oper.
> period_referenced_agg_rettype is the function
> period_referenced_agg_proc returning data type.

Expanded these comments a bit.

Thanks to you both for such detailed, careful feedback!

Rebased to 605062227f.

If anything else comes up re FKs I'll tackle that first, but otherwise
I think I will work on some of the outstanding issues in the FOR
PORTION OF patch (e.g. trigger transition table names). I may
experiment with handling the leftover inserts as a separate executor
node. If anyone has advice there I'm happy to hear it!

Yours,
Paul

[1] https://www.ibm.com/docs/en/db2/11.5?topic=statements-alter-table
[2] https://illuminatedcomputing.com/posts/2019/08/sql2011-survey/
[3] https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/intro/src/tpc/db2z_integrity.html
[4] https://www.ibm.com/developerworks/community/forums/html/topic?id=440e07ad-23ee-4b0a-ae23-8c747abca819
[5] Here is DB2 test code showing temporal FKs don't work. (Note they
disobey the standard re declaring `PERIOD p (s, e)` not `PERIOD FOR p
(s, e)`, and it must be named `business_time`.)

```
create table t (id integer not null, ds date not null, de date not
null, name varchar(4000), period business_time (ds, de));
alter table t add constraint tpk primary key (id, business_time
without overlaps)
insert into t values (1, '2000-01-01', '2001-01-01', 'foo');
create table fk (id integer, ds date not null, de date not null,
period business_time (ds, de));

-- all this fails:
alter table fk add constraint fkfk foreign key (id, period
business_time) references t (id, period business_time);
alter table fk add constraint fkfk foreign key (id, business_time)
references t (id, business_time);
alter table fk add constraint fkfk foreign key (id, period
business_time) references t;
alter table fk add constraint fkfk foreign key (id, business_time) references t;
alter table fk add constraint fkfk foreign key (id, period for
business_time) references t;
alter table fk add constraint fkfk foreign key (id, period for
business_time) references t (id, period for business_time);
alter table fk add constraint fkfk foreign key (id, business_time
without overlaps) references t;
alter table fk add constraint fkfk foreign key (id, business_time
without overlaps) references t (id, business_time without overlaps);
alter table fk add constraint fkfk foreign key (id) references t;
alter table fk add constraint fkfk foreign key (id) references t (id);
```

Attachment Content-Type Size
v28-0004-Add-GiST-referencedagg-support-func.patch application/octet-stream 11.6 KB
v28-0005-Support-multiranges-or-any-type-in-temporal-FKs.patch application/octet-stream 51.2 KB
v28-0002-Add-temporal-FOREIGN-KEYs.patch application/octet-stream 122.2 KB
v28-0001-Use-daterange-and-YMD-in-tests-instead-of-tsrang.patch application/octet-stream 14.6 KB
v28-0003-Don-t-infer-PERIOD-on-PK-side-of-temporal-FK.patch application/octet-stream 21.2 KB
v28-0006-Add-support-funcs-for-FOR-PORTION-OF.patch application/octet-stream 43.3 KB
v28-0007-Add-UPDATE-DELETE-FOR-PORTION-OF.patch application/octet-stream 161.0 KB
v28-0008-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch application/octet-stream 202.1 KB
v28-0009-Add-PERIODs.patch application/octet-stream 278.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2024-03-16 21:59:45 Re: pg_upgrade failing for 200+ million Large Objects
Previous Message Thomas Munro 2024-03-16 20:54:52 Re: Regression tests fail with musl libc because libpq.so can't be loaded