Re: Document atthasmissing default optimization avoids verification table scan

From: James Coleman <jtc331(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Document atthasmissing default optimization avoids verification table scan
Date: 2022-01-20 01:14:10
Message-ID: CAAaqYe9dNfYUObu4aZ=zoDsWmvtcJDFngYZ52NndysT=fM8uvA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 19, 2022 at 7:51 PM David G. Johnston
<david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>
> On Wed, Jan 19, 2022 at 5:08 PM Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
>>
>> On 9/24/21, 7:30 AM, "James Coleman" <jtc331(at)gmail(dot)com> wrote:
>> > When PG11 added the ability for ALTER TABLE ADD COLUMN to set a constant
>> > default value without rewriting the table, the doc changes did not note
>> > how the new feature interacts with ADD COLUMN DEFAULT NOT NULL.
>> > Previously such a new column required a verification table scan to
>> > ensure no values were null. That scan happens under an exclusive lock on
>> > the table, so it can have a meaningful impact on database "accessible
>> > uptime".
>>
>> I'm likely misunderstanding, but are you saying that adding a new
>> column with a default value and a NOT NULL constraint used to require
>> a verification scan?
>
>
> As a side-effect of rewriting every live record in the table and indexes to brand new files, yes. I doubt an actual independent scan was performed since the only way for the newly written tuples to not have the default value inserted would be a severe server bug.

I've confirmed it wasn't a separate scan, but the rewrite does evaluate
all constraints (it has no optimization for skipping constraints that
are presumably already satisfied by virtue of the new default).

>>
>> + Additionally, adding a column with a constant default value avoids
>> + a table scan to verify no <literal>NULL</literal> values are present.
>>
>> Should this clarify that it's referring to NOT NULL constraints?
>>
>
> This doesn't seem like relevant material to comment on. It's an implementation detail that is sufficiently covered by "making the ALTER TABLE very fast even on large tables".
>
> Also, the idea of performing that scan seems ludicrous. I just added the column and told it to populate with default values - why do you need to check that your server didn't miss any?

I'm open to wordsmithing here, of course, but I strongly disagree
that this is irrelevant information. There are plenty of optimizations
Postgres could theoretically implement but doesn't, so judging what
actually happens by what seems obvious ("told it to populate with
default values - why do you need to check") is not a valid approach.

This patch actually came out of our specifically needing to verify
that this is true before an operation, precisely because the docs
aren't clear and because we can't risk a large table scan under an
exclusive lock. We're clearly not the only ones with that question; it
came up in a comment on this blog post announcing the newly committed
feature [1].

I realize that most users aren't as worried about this kind of DDL
detail as we are (we require absolutely zero slow DDL while holding an
exclusive lock), but it is relevant to high-uptime systems.

Thanks,
James Coleman

1: https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-fast-alter-table-add-column-with-a-non-null-default/
