Re: Rearranging ALTER TABLE to avoid multi-operations bugs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Rearranging ALTER TABLE to avoid multi-operations bugs
Date: 2020-01-21 00:46:52
Message-ID: 6758.1579567612@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> The squirrely-ness around identity is that while this now works:

> regression=# CREATE TABLE itest8 (f1 int);
> CREATE TABLE
> regression=# ALTER TABLE itest8
> regression-# ADD COLUMN f2 int NOT NULL,
> regression-# ALTER COLUMN f2 ADD GENERATED ALWAYS AS IDENTITY;
> ALTER TABLE

> it doesn't work if there's rows in the table:

> regression=# CREATE TABLE itest8 (f1 int);
> CREATE TABLE
> regression=# insert into itest8 default values;
> INSERT 0 1
> regression=# ALTER TABLE itest8
> ADD COLUMN f2 int NOT NULL,
> ALTER COLUMN f2 ADD GENERATED ALWAYS AS IDENTITY;
> ERROR: column "f2" contains null values

> The same would be true if you tried to do the ALTER as two separate
> operations (because the ADD ... NOT NULL, without a default, will
> naturally fail on a nonempty table). So I don't feel *too* awful
> about that. But it'd be better if this worked.

After further poking at that, I've concluded that maybe this is not
a bug but operating as designed. Adding the GENERATED property in a
separate step is arguably equivalent to setting a plain default in
a separate step, and look at how we handle that:

regression=# create table t1(x int);
CREATE TABLE
regression=# insert into t1 values(1);
INSERT 0 1
regression=# alter table t1 add column y int default 11,
alter column y set default 12;
ALTER TABLE
regression=# table t1;
x | y
---+----
1 | 11
(1 row)

This is documented, rather opaquely perhaps, for the SET DEFAULT
case:

SET/DROP DEFAULT

These forms set or remove the default value for a column. Default
values only apply in subsequent INSERT or UPDATE commands; they do not
cause rows already in the table to change.

So the design principle here seems to be that we fill the column
using whatever is specified *in the ADD COLUMN subcommand*, and
any screwing-about in other subcommands just affects what the
behavior will be in subsequent INSERT commands. That's a little
weird but it has potential use-cases. If we attempt to apply the
"new" default immediately then this syntax devolves to having the
same effects as a simple ADD-COLUMN-with-default. There's not
a lot of reason to write the longer form if that's what you wanted.

So I'm now inclined to think that the code is all right. We could
improve the documentation, perhaps, with an explicit example.
Also, the man page's entry for SET GENERATED says nothing of this,
but it likely ought to say the same thing as SET DEFAULT.

Also, we don't really have any test cases proving it works that way.
Simple tests, such as the one above, are not too trustworthy because
the attmissingval optimization tends to hide what's really happening.
(I found this out the hard way while messing with a patch to change
the behavior --- which I now think we shouldn't do, anyhow.)

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-01-21 00:51:16 Re: libxml2 is dropping xml2-config
Previous Message Thomas Munro 2020-01-21 00:45:12 Reducing WaitEventSet syscall churn