Re: ALTER TABLE with ADD COLUMN and ADD PRIMARY KEY USING INDEX throws spurious "column contains null values"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Zhang, Jie" <zhangjie2(at)cn(dot)fujitsu(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ALTER TABLE with ADD COLUMN and ADD PRIMARY KEY USING INDEX throws spurious "column contains null values"
Date: 2019-04-18 03:54:49
Message-ID: 12166.1555559689@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Zhang, Jie" <zhangjie2(at)cn(dot)fujitsu(dot)com> writes:
> Here's a patch to fix this bug.

I took a look at this patch, but I really dislike it: it adds a mighty
ad-hoc parameter to a whole bunch of functions that shouldn't really
have anything to do with the problem. Moreover, I have little confidence
that it's really solving the problem and not introducing any new problems
(such as failure to apply the not-null check when we need to).

I think the real problem is exactly that we've got index_check_primary_key
doing its own mini ALTER TABLE in ignorance of what might be happening
in the outer ALTER TABLE. That's just ripe for order-of-operations
problems, seeing that ALTER TABLE has a lot of very careful decisions
about which steps have to happen before which other ones. Moreover,
as this old comment notes, it's a horridly inefficient approach if
the table is large:

/*
* XXX: possible future improvement: when being called from ALTER TABLE,
* it would be more efficient to merge this with the outer ALTER TABLE, so
* as to avoid two scans. But that seems to complicate DefineIndex's API
* unduly.
*/

So I thought a bit about how to fix that, and realized that we could
easily adjust the parser to emit AT_SetNotNull subcommands as part of the
outer ALTER TABLE that has the ADD PRIMARY KEY subcommand. Then,
index_check_primary_key doesn't need to do anything at all about adding
NOT NULL, although it seems like a good safety feature for it to check
that somebody else already added that.

So, attached is a WIP patch that fixes it that way. Some notes
for review:

* I initially thought that index_check_primary_key could be simplified
to the point where it *only* throws an error for missing NOT NULL.
This soon proved to be wrong, because the comments for the function
are lies, or at least obsolete: there are multiple scenarios in which
a CREATE TABLE with a PRIMARY KEY option does need this function to
perform ALTER TABLE SET NOT NULL. Fortunately, that's not so expensive
in that case, since the table must be empty. So as coded, it throws
an error if is_alter_table, and otherwise does what it did before.

* We need to fix the order of operations in ALTER TABLE phase 2 so that
any AT_SetNotNull subcommands happen after the AT_PASS_ADD_COL pass
(else the column might not be there yet) and before AT_PASS_ADD_INDEX
(else index_check_primary_key will complain). I did this by putting
AT_SetNotNull into the AT_PASS_COL_ATTRS pass and moving that to after
AT_PASS_ADD_COL; it had been before AT_PASS_ADD_COL, but that seems at
best random and at worst broken. (AT_SetIdentity is the only existing
subcommand using AT_PASS_COL_ATTRS, and it sure seems like it'd make more
sense to run it after AT_PASS_ADD_COL, so that it can work on a column
being added in the same ALTER. Am I missing something?)

* Some existing regression tests for "ALTER TABLE ONLY partitioned_table
ADD PRIMARY KEY" failed. That apparently is supposed to work if all
partitions already have a suitable unique index and NOT NULL constraint,
but it was failing because ATPrepSetNotNull wasn't expecting to be used
this way. I thought that function was a pretty terrible kluge anyway,
so what I did was to refactor things so that in this scenario we just
apply checks to see if all the partitions already have suitable NOT NULL.
Note that this represents looser semantics than what was there before,
because previously you couldn't say "ALTER TABLE ONLY partitioned_table
SET NOT NULL" if there were any partitions; now you can, if the partitions
all have suitable NOT NULL already. We probably ought to change the error
message to reflect that, but I didn't yet.

* A couple of existing test cases change error messages, like so:

-ERROR: column "test1" named in key does not exist
+ERROR: column "test1" of relation "atacc1" does not exist

This is because the added AT_SetNotNull subcommand runs before
AT_AddIndex, so it's the one that notices that there's not really
any such column, and historically it's spelled its error message
differently. This change seems all to the good to me, so I didn't
try to avoid it.

* I haven't yet added any test case(s) reflecting the bug fix nor
the looser semantics for adding NOT NULL to partitioned tables.
It does pass check-world as presented.

* I'm not sure whether we want to try to back-patch this, or how
far it should go. The misbehavior has been there a long time
(at least back to 8.4, I didn't check further); so the lack of
previous reports means people seldom try to do it. That may
indicate that it's not worth taking any risks of new bugs to
squash this one. (Also, I suspect that it might take a lot of
work to port this to before v10, because there are comments
suggesting that this code worked a good bit differently before.)
I do think we should shove this into v12 though.

Comments?

regards, tom lane

Attachment Content-Type Size
fix-alter-table-add-primary-key-vs-not-null-1.patch text/x-diff 16.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2019-04-18 04:14:13 Re: Cleanup/remove/update references to OID column
Previous Message Amit Langote 2019-04-18 02:42:49 Re: pgsql: Fix plan created for inherited UPDATE/DELETE with all tables exc