Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, "Anton A(dot) Melnikov" <aamelnikov(at)inbox(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
Date: 2023-07-24 20:17:56
Message-ID: CA+TgmoZsOzpiH9Zc4LOnuON1GXM+x830g-jCbyhp-ZXeUo_0kg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 21, 2023 at 8:52 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Idea for future research: Perhaps pg_backup_stop()'s label-file
> output should include the control file image (suitably encoded)? Then
> the recovery-from-label code could completely ignore the existing
> control file, and overwrite it using that copy. It's already
> partially ignoring it, by using the label file's checkpoint LSN
> instead of the control file's. Perhaps the captured copy could
> include the correct LSN already, simplifying that code, and the low
> level backup procedure would not need any additional steps or caveats.
> No more atomicity problem for low-level-backups... but probably not
> something we would back-patch, for such a rare failure mode.

I don't really know what the solution is, but this is a general
problem with the low-level backup API, and I think it sucks pretty
hard. Here, we're talking about the control file, but the same problem
exists with the data files. We try to work around that but it's all
hacks. Unless your backup tool has special magic powers of some kind,
you can't take a backup using either pg_basebackup or the low-level
API and then check that individual blocks have valid checksums, or
that they have sensible, interpretable contents, because they might
not. (Yeah, I know we have code to verify checksums during a base
backup, but as discussed elsewhere, it doesn't work.) It's also why we
have to force full-page writes on during a backup. But the whole thing
is nasty because you can't really verify anything about the backup you
just took. It may be full of gibberish blocks but don't worry because,
if all goes well, recovery will fix it. But you won't really know
whether recovery actually does fix it. You just kind of have to cross
your fingers and hope.
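
To illustrate the point above, here is a minimal sketch of why a block copied during a concurrent write can fail verification. It is a toy model, not PostgreSQL code: the page layout, the use of CRC32 (PostgreSQL's data page checksum is a different algorithm), the 512-byte interleaving granularity, and the helper names make_page, page_is_valid and torn_read are all assumptions made for the example.

```python
import zlib

BLOCK_SIZE = 8192   # PostgreSQL's default page size
SECTOR_SIZE = 512   # assumed granularity at which a copy can interleave with a write


def make_page(fill: int) -> bytes:
    """Build a toy page: a 4-byte CRC32 of the payload, then the payload."""
    payload = bytes([fill]) * (BLOCK_SIZE - 4)
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def page_is_valid(page: bytes) -> bool:
    """Recompute the CRC32 over the payload and compare with the stored value."""
    stored = int.from_bytes(page[:4], "big")
    return stored == zlib.crc32(page[4:])


def torn_read(old: bytes, new: bytes, sectors_from_new: int) -> bytes:
    """Simulate a copy that saw the first N sectors after the write, the rest before."""
    cut = sectors_from_new * SECTOR_SIZE
    return new[:cut] + old[cut:]


old_page = make_page(0x11)   # page contents before the write
new_page = make_page(0x22)   # page contents after the write
torn_page = torn_read(old_page, new_page, sectors_from_new=3)

for name, page in [("old", old_page), ("new", new_page), ("torn", torn_page)]:
    print(name, "valid" if page_is_valid(page) else "INVALID")
```

Both the old and the new version of the page verify, but the mixed copy does not, even though nothing is wrong on the server. A backup tool that sees such a block cannot tell it apart from real corruption without replaying WAL, which is the verification gap described above.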

It's unclear to me how we could do better, especially when using the
low-level API. BASE_BACKUP could read via shared_buffers instead of
the FS, and I think that might be a good idea if we can defend
adequately against cache poisoning, but with the low-level API someone
may just be calling a FS-level snapshot primitive. Unless we're
prepared to pause all writes while that happens, I don't know how to
do better.

--
Robert Haas
EDB: http://www.enterprisedb.com
