Re: Deferred constraint trigger semantics

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Bryn Llewellyn <bryn(at)yugabyte(dot)com>
Cc: pgsql-general list <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Deferred constraint trigger semantics
Date: 2022-05-13 06:28:46
Message-ID: 4e2de8ec77447a8347ea1923171afb6bcf77397b.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, 2022-05-12 at 22:06 -0700, Bryn Llewellyn wrote:
>
> > In the case of constraint triggers, yes. But there is no race condition for primary key,
> > unique and foreign key constraints, because they also "see" uncommitted data.
>
> I can't follow you here, sorry. I tried this:
>
> create table t(
>   k serial primary key,
>   v int not null,
>   constraint t_v_unq unique(v) initially deferred);
>
> -- RED
> start transaction isolation level read committed;
> insert into t(v) values (1), (2);
> select k, v from t order by k;
>
> -- BLUE
> start transaction isolation level read committed;
> insert into t(v) values (1), (3);
> select k, v from t order by k;
>
> -- RED
> commit;
> select k, v from t order by k;
>
> -- BLUE
> select k, v from t order by k;
> commit;
>
> select k, v from t order by k;
>
> The first "select" from the "BLUE" session at the very end produces this:
>
>  k | v 
> ---+---
>  1 | 1
>  2 | 2
>  3 | 1
>  4 | 3
>
> This doesn't surprise me. It's ultimately illegal. But not yet. (Before "RED" committed, "BLUE"
> didn't see the rows with "k = 1" and "k = 2". So it isn't seeing any other sessions uncommitted
> data—but only it's own uncommitted data.)

Be "seeing" I didn't mean "show to the user".
I mean that the code that implements PostgreSQL constraints takes uncommitted data into account.

The documentation describes that for the case of uniqueness in some detail:
https://www.postgresql.org/docs/current/index-unique-checks.html

> Then, when "BLUE" commits, it (of course) gets this:
>
> ERROR:  duplicate key value violates unique constraint "t_v_unq"
> DETAIL:  Key (v)=(1) already exists.
>
> Then it sees (of course, again) only the rows with "k = 1" and "k = 2"—the same as what "RED" saw.
>
> It seems to be impossible to do a test in slow motion where "RED" and "BLUE" each issues "commit"
> at the exact same moment. So thinking about this scenario doesn't tell me if:
>
> (a) Each session runs its constraint check and the rest of what "commit" entails in a genuinely serialized fashion.
>
> OR
>
> (b) Each session first runs its constraint check (and some other stuff) non-serializedly—and only
> then runs the small part of the total "commit" action (the WAL part) serializedly.
> (This would result in bad data in the database at rest—just as my contrived misuse of
> "set constraints all immediate" left things in my "one or two admins" scenario.)

I'd say that (b) is a more accurate description.

> The (a) scheme sounds correct. And the (b) scheme sounds wrong. Why would PG prefer to implement (b) rather than (a)?
>
> I'm clearly missing something.

Because (a) would result in terrible performance if there are many concurrent transactions.

I don't see why (b) is wrong - as your example shows, the behavior is correct.

Perhaps you have to understand what a PostgreSQL "snapshot" is and that the exact moment at
which a row was created is not important - it is the transaction numbers in "xmin" and "xmax"
that count.

> >
> Where, in the PG doc, can I read the account of the proper mental model for the application programmer?
> It seems to be impossible to conduct an experiment that would disprove the hypothesis that one,
> or the other, of these mental models is correct.

I'd say that the proper mental model is that you don't need to care.
The ACID properties are guarantees that the database makes, and these guarantees are usually
negative: "no sequence of actions can result in the violation of a unique constraint" or
"you don't get to see uncommitted data".

The exact sequence of what happens during COMMIT is interesting, but irrelevant to the
programmer. All that counts is "a deferred constraint is checked between the time that
COMMIT starts processing and the time that it returns".

If you want to know more, you have to start reading the code. It is open source and
well documented.

> > > >
> Your post's testcase used the condition "at least one guard on duty" [...]
>
> My testcase used a stricter rule: the table of staff must have exactly one or two rows where
> the job is "Admin". So, here, concurrent sessions can break the rule (when the txn starts
> with one "Admin") by updating different rows to make them "Admin" or by inserting different
> new "Admin" rows. I've convinced myself by experiment that an ordinary trigger can enforce
> this rule when contending sessions use "serializable" isolation. Am I right that you'd say
> that no pessimistic locking scheme can enforce the rule at lower isolation levels except the
> brute-force "lock table"?

If the transaction can insert two rows, I'd agree.
If the transaction only inserts a single row, than it could SELECT ... FOR NO KEY UPDATE the
one existing row, thus serializing the concurrent transactions.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bryn Llewellyn 2022-05-13 06:37:39 Re: Restricting user to see schema structure
Previous Message Neeraj M R 2022-05-13 05:51:21 Re: Restricting user to see schema structure