Re: Document atthasmissing default optimization avoids verification table scan

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, James Coleman <jtc331(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Document atthasmissing default optimization avoids verification table scan
Date: 2022-01-21 22:29:01
Message-ID: CAKFQuwZ7OzjvtEx_iBm1xhuzi5CzDOH9L=EoiZusbqtV7xcbKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 21, 2022 at 2:50 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
> > On Fri, Jan 21, 2022 at 2:08 PM Andrew Dunstan <andrew(at)dunslane(dot)net>
> wrote:
> >> I know what it's replacing refers to release 11, but let's stop doing
> >> that. How about something like this?
> >>
> >> Adding a new column can sometimes require rewriting the table,
> >> making it a very slow operation. However in many cases this rewrite
> >> and related verification scans can be optimized away by using an
> >> appropriate default value. See the notes in <command>ALTER
> >> TABLE</command> for details.
>
> > I think it is a virtue, and am supported in that feeling by the existing
> > wording, to be explicit about the release before which these
> optimizations
> > can not happen. The docs generally use this to good effect without
> > overdoing it. This is a prime example.
>
> The fact of the matter is that optimizations of this sort have existed
> for years. (For example, I think we've optimized away the rewrite
> when the new column is DEFAULT NULL since the very beginning.) So it
> does not help to write the text as if there were no such optimizations
> before version N and they were all there in N.
>

Fair point, and indeed the v10 docs do mention the NULL (or no default)
optimization.

> I agree that Andrew's text could stand a pass of "omit needless words".
> But I also think that we could be a bit more explicit about what "slow"
> means. Maybe like
>
> Adding a new column can require rewriting the whole table,
> making it slow for large tables. However the rewrite can be optimized
> away in some cases, depending on what default value is given to the
> column. See <command>ALTER TABLE</command> for details.
>
>
Comma needed after however.
You've removed the "constraint verification scan" portion of this. Maybe:
"""
...
column. The same applies for the NOT NULL constraint verification scan.
See <command>ALTER TABLE</command> for details.
"""

Re-reading this, the recommendation:

- However, if the default value is volatile (e.g.,
- <function>clock_timestamp()</function>)
- each row will need to be updated with the value calculated at the time
- <command>ALTER TABLE</command> is executed. To avoid a potentially
- lengthy update operation, particularly if you intend to fill the
column
- with mostly nondefault values anyway, it may be preferable to add the
- column with no default, insert the correct values using
- <command>UPDATE</command>, and then add any desired default as
described
- below.

has now been completely removed from the documentation. I suggest having
this remain as the Tip and turning the optimization stuff into a Note.

> (the ALTER TABLE reference should be a link, too)
>

Yeah, the page does have a link already (fairly close by...) but with these
changes putting one here seems to make sense.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2022-01-21 22:35:16 Re: fairywren is generating bogus BASE_BACKUP commands
Previous Message Robert Haas 2022-01-21 22:26:32 Re: fairywren is generating bogus BASE_BACKUP commands