Re: SQL:2011 application time

From: Paul Jungwirth <pj(at)illuminatedcomputing(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: SQL:2011 application time
Date: 2024-04-03 05:30:20
Message-ID: 47550967-260b-4180-9791-b224859fe63e@illuminatedcomputing.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/24/24 00:38, Peter Eisentraut wrote:> I have committed the patches
> v33-0001-Add-temporal-FOREIGN-KEYs.patch and v33-0002-Support-multiranges-in-temporal-FKs.patch
> (together).

Hi Hackers,

I found some problems with temporal primary keys and the idea of uniqueness, especially around the
indisunique column. Here are some small fixes and a proposal for a larger fix, which I think we need
but I'd like some feedback on.

The first patch fixes problems with ON CONFLICT DO NOTHING/UPDATE.

DO NOTHING fails because it doesn't expect a non-btree unique index. It's fine to make it accept a
temporal PRIMARY KEY/UNIQUE index though (i.e. an index with both indisunique and indisexclusion).
This is no different than an exclusion constraint. So I skip BuildSpeculativeIndexInfo for WITHOUT
OVERLAPS indexes. (Incidentally, AFAICT ii_UniqueOps is never used, only ii_UniqueProcs. Right?)

We should still forbid temporally-unique indexes for ON CONFLICT DO UPDATE, since there may be more
than one row that conflicts. Ideally we would find and update all the conflicting rows, but we don't
now, and we don't want to update just one:

postgres=# create table t (id int4range, valid_at daterange, name text, constraint tpk primary
key (id, valid_at without overlaps));
CREATE TABLE
postgres=# insert into t values ('[1,2)', '[2000-01-01,2001-01-01)', 'a'), ('[1,2)',
'[2001-01-01,2002-01-01)', 'b');
INSERT 0 2
postgres=# insert into t values ('[1,2)', '[2000-01-01,2002-01-01)', 'c') on conflict (id,
valid_at) do update set name = excluded.name;
INSERT 0 1
postgres=# select * from t;
id | valid_at | name
-------+-------------------------+------
[1,2) | [2001-01-01,2002-01-01) | b
[1,2) | [2000-01-01,2001-01-01) | c
(2 rows)

So I also added code to prevent that. This is just preserving the old behavior for exclusion
constraints, which was bypassed because of indisunique. All this is in the first patch.

That got me thinking about indisunique and where else it could cause problems. Perhaps there are
other places that assume only b-trees are unique. I couldn't find anywhere that just gives an error
like ON CONFLICT, but I can imagine more subtle problems.

A temporal PRIMARY KEY or UNIQUE constraint is unique in at least three ways: It is *metaphorically*
unique: the conceit is that the scalar part is unique at every moment in time. You may have id 5 in
your table more than once, as long as the records' application times don't overlap.

And it is *officially* unique: the standard calls these constraints unique. I think it is correct
for us to report them as unique in pg_index.

But is it *literally* unique? Well two identical keys, e.g. (5, '[Jan24,Mar24)') and (5,
'[Jan24,Mar24)'), do have overlapping ranges, so the second is excluded. Normally a temporal unique
index is *more* restrictive than a standard one, since it forbids other values too (e.g. (5,
'[Jan24,Feb24)')). But sadly there is one exception: the ranges in these keys do not overlap: (5,
'empty'), (5, 'empty'). With ranges/multiranges, `'empty' && x` is false for all x. You can add that
key as many times as you like, despite a PK/UQ constraint:

postgres=# insert into t values
('[1,2)', 'empty', 'foo'),
('[1,2)', 'empty', 'bar');
INSERT 0 2
postgres=# select * from t;
id | valid_at | name
-------+----------+------
[1,2) | empty | foo
[1,2) | empty | bar
(2 rows)

Cases like this shouldn't actually happen for temporal tables, since empty is not a meaningful
value. An UPDATE/DELETE FOR PORTION OF would never cause an empty. But we should still make sure
they don't cause problems.

One place we should avoid temporally-unique indexes is REPLICA IDENTITY. Fortunately we already do
that, but patch 2 adds a test to keep it that way.

Uniqueness is an important property to the planner, too.

We consider indisunique often for estimates, where it needn't be 100% true. Even if there are
nullable columns or a non-indimmediate index, it still gives useful stats. Duplicates from 'empty'
shouldn't cause any new problems there.

In proof code we must be more careful. Patch 3 updates relation_has_unique_index_ext and
rel_supports_distinctness to disqualify WITHOUT OVERLAPS indexes. Maybe that's more cautious than
needed, but better safe than sorry. This patch has no new test though. I had trouble writing SQL
that was wrong before its change. I'd be happy for help here!

Another problem is GROUP BY and functional dependencies. This is wrong:

postgres=# create table a (id int4range, valid_at daterange, name text, constraint apk primary
key (id, valid_at without overlaps));
CREATE TABLE
postgres=# insert into a values ('[1,2)', 'empty', 'foo'), ('[1,2)', 'empty', 'bar');
INSERT 0 2
postgres=# select * from a group by id, valid_at;
id | valid_at | name
-------+----------+------
[1,2) | empty | foo
(1 row)

One fix is to return false from check_functional_grouping for WITHOUT OVERLAPS primary keys. But I
think there is a better fix that is less ad hoc.

We should give temporal primary keys an internal CHECK constraint saying `NOT isempty(valid_at)`.
The problem is analogous to NULLs in parts of a primary key. NULLs prevent two identical keys from
ever comparing as equal. And just as a regular primary key cannot contain NULLs, so a temporal
primary key should not contain empties.

The standard effectively prevents this with PERIODs, because a PERIOD adds a constraint saying start
< end. But our ranges enforce only start <= end. If you say `int4range(4,4)` you get `empty`. If we
constrain primary keys as I'm suggesting, then they are literally unique, and indisunique seems safer.

Should we add the same CHECK constraint to temporal UNIQUE indexes? I'm inclined toward no, just as
we don't forbid NULLs in parts of a UNIQUE key. We should try to pick what gives users more options,
when possible. Even if it is questionably meaningful, I can see use cases for allowing empty ranges
in a temporal table. For example it lets you "disable" a row, preserving its values but marking it
as never true.

Also it gives you a way to make a non-temporal foreign key reference to a temporal table. Normally
temporal tables are "contagious", which is annoying. But if the referencing table had 'empty' for
its temporal part, then references should succeed. For example this is true: 'empty'::daterange <@
'[2000-01-01,2001-01-01)'. (Technically this would require a small change to our FK SQL, because we
do `pkperiod && fkperiod` as an optimization (to use the index more fully), and we would need to
skip that when fkperiod is empty.)

Finally, if we have a not-empty constraint on our primary keys, then the GROUP BY problem above goes
away. And we can still use temporal primary keys in proofs (but maybe not other temporally-unique
indexes). We can allow them in relation_has_unique_index_ext/rel_supports_distinctness.

The drawback to putting a CHECK constraint on just PKs and not UNIQUEs is that indisunique may not
be literally unique for them, if they have empty ranges. But even for traditional UNIQUE
constraints, indisunique can be misleading: If they have nullable parts, identical keys are still
"unique", so the code is already careful about them. Do note though the problems come from 'empty'
values, not nullable values, so there might still be some planner rules we need to correct.

Another drawback is that by using isempty we're limiting temporal PKs to just ranges and
multiranges, whereas currently any type with appropriate operators is allowed. But since we decided
to limit FKs already, I think this is okay. We can open it back up again later if we like (e.g. by
adding a support function for the isempty concept).

I'll start working on a patch for this too, but I'd be happy for early feedback/objections/etc.

I guess an alternative would be to add a new operator, say &&&, that is the same as overlaps, except
'empty' overlaps everything instead of nothing. In a way that seems more consistent with <@. (How
can a range contain something if it doesn't overlap it?) I don't love that a key like (5, 'empty')
would conflict with every other 5, but you as I said it's not a meaningful value in a temporal table
anyway. Or you could have 'empty' overlap nothing except itself. Maybe I prefer this solution to an
internal CHECK constraint, but it feels like it has more unknown unknowns. Thoughts?

Also I suspect there are still places where indisunique causes problems. I'll keep looking for them,
but if others have thoughts please let me know.

Patches here are generated against c627d944e6.

Yours,

--
Paul ~{:-)
pj(at)illuminatedcomputing(dot)com

Attachment Content-Type Size
v1-0001-Fix-ON-CONFLICT-DO-NOTHING-UPDATE-for-temporal-in.patch text/x-patch 21.3 KB
v1-0002-Add-test-for-REPLICA-IDENTITY-with-a-temporal-key.patch text/x-patch 1.6 KB
v1-0003-Don-t-treat-WITHOUT-OVERLAPS-indexes-as-unique-in.patch text/x-patch 3.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-04-03 05:43:10 Re: Synchronizing slots from primary to standby
Previous Message Michael Zhilin 2024-04-03 05:08:26 Re: BUG: deadlock between autovacuum worker and client backend during removal of orphan temp tables with sequences