Re: cataloguing NOT NULL constraints

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: cataloguing NOT NULL constraints
Date: 2023-04-09 16:23:50
Message-ID: 3801207.1681057430@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> I think
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2023-04-07%2021%3A16%3A04
> might point out a problem with the pg_dump or pg_upgrade backward compat
> paths:

Yeah, this patch has broken every single upgrade-from-back-branch test.

I think there's a second problem, though: even without considering
back branches, this has changed pg_dump output in a way that
I fear is unacceptable. Consider for instance this table definition
(from rules.sql):

create table rule_and_refint_t1 (
id1a integer,
id1b integer,
primary key (id1a, id1b)
);

This used to be dumped as

CREATE TABLE public.rule_and_refint_t1 (
id1a integer NOT NULL,
id1b integer NOT NULL
);
...
... load data ...
...
ALTER TABLE ONLY public.rule_and_refint_t1
ADD CONSTRAINT rule_and_refint_t1_pkey PRIMARY KEY (id1a, id1b);

In the new dispensation, pg_dump omits the NOT NULL clauses.
Great, you say, that makes the output more like what the user wrote.
I'm not so sure. This means that the ALTER TABLE will be compelled
to perform a full-table scan to verify that there are no nulls in the
already-loaded data before it can add the missing NOT NULL constraint.
The old dump output was carefully designed to avoid the need for that
scan. Admittedly, we have to do a scan anyway to build the index,
so this is strictly less than a 2X penalty on the ALTER, but is
that acceptable? It might be all right in the context of regular
dump/restore, where we're surely doing a lot of per-row work anyway
to load the data and make the index. In the context of pg_upgrade,
though, it seems absolutely disastrous: there will now be a per-row
cost where there was none before, and that is surely a deal-breaker.

BTW, I note from testing that the NOT NULL clauses *are* still
emitted in at least some cases when doing --binary-upgrade from an old
version. (This may be directly related to the buildfarm failures,
not sure.) That's no solution though, because now what you get in
pg_constraint will differ depending on which way you upgraded,
which seems unacceptable too.

I'm inclined to think that this idea of suppressing the implied
NOT NULL from PRIMARY KEY is a nonstarter and we should just
go ahead and make such a constraint. Another idea could be for
pg_dump to emit the NOT NULL, load data, do the ALTER ADD PRIMARY
KEY, and then ALTER DROP NOT NULL.

In any case, I wonder whether that's the sort of redesign we should
be doing post-feature-freeze. It might be best to revert and try
again in v17.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2023-04-09 16:35:29 Re: Direct I/O
Previous Message Andrew Dunstan 2023-04-09 13:14:28 Re: Direct I/O