Re: Reduce lock level for ALTER TABLE ... ADD CHECK .. NOT VALID

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Reduce lock level for ALTER TABLE ... ADD CHECK .. NOT VALID
Date: 2021-09-04 19:28:20
Message-ID: 2791312.1630783700@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> writes:
> 897795240cfaaed724af2f53ed2c50c9862f951f forgot to reduce the lock
> level for CHECK constraints when allowing them to be NOT VALID.
> This is simple and safe, since check constraints are not used in
> planning until validated.

Unfortunately, just asserting that it's safe doesn't make it so.

We have two things that we need to worry about when considering
reducing ALTER TABLE lock levels:

1. Is it semantically okay (which is what you claim above)?

2. Will onlooker processes see sufficiently-consistent catalog data
if they look at the table during the change?

Trying to reduce the lock level for ADD CHECK fails the second
test, because it has to alter two different catalogs. It has
to increment pg_class.relchecks, and it has to make an entry in
pg_constraint. This patch makes it possible for onlookers to
see a value of pg_class.relchecks that is inconsistent with what
they find in pg_constraint, and then they will blow up.

To demonstrate this, I applied the patch and then did this
in session 1:

regression=# create table mytable (f1 int check(f1 > 0), f2 int);
CREATE TABLE

I then started a second session, attached to it with gdb, and
set a breakpoint at CheckConstraintFetch. Letting that session
continue, I told it

regression=# select * from mytable;

which of course reached the breakpoint at CheckConstraintFetch.
(At this point, session 2 has read the pg_class entry for mytable,
seen relchecks == 1, and now it wants to read pg_constraint.)

I then told session 1:

regression=# alter table mytable add check (f2 > 0);
ALTER TABLE

which it happily did thanks to the now-inadequate lock level.
I then released session 2 to continue, and behold it complains:

WARNING: unexpected pg_constraint record found for relation "mytable"
LINE 1: select * from mytable;
^

(Pre-v14 branches would have made that an ERROR not a WARNING.)
That happens because the systable_beginscan() in CheckConstraintFetch
will get a new snapshot, so now it sees the new entry in pg_constraint,
making the count of entries inconsistent with what it found in pg_class.

It's possible that this could be made safe if we replaced the exact
"relchecks" count with a boolean "relhaschecks", analogous to the
way indexes are handled. It's not clear to me though that the effort,
and ensuing compatibility costs for applications that look at pg_class,
would be repaid by having a bit more concurrency here. One could
also worry about whether we really want to give up this consistency
cross-check between pg_class and pg_constraint.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jaime Casanova 2021-09-04 19:55:56 Re: SQL:2011 application time
Previous Message Tom Lane 2021-09-04 18:53:02 Re: PG Docs - CREATE SUBSCRIPTION option list order