Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

From: David Steele <david(at)pgmasters(dot)net>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, "Anton A(dot) Melnikov" <aamelnikov(at)inbox(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
Date: 2023-07-27 08:18:47
Message-ID: 7edf2ce2-e744-f4d2-21d4-3a0df3b836b0@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Thomas,

On 7/26/23 06:06, Thomas Munro wrote:
> While chatting to Robert and Andres about all this, a new idea came
> up. Or, rather, one of the first ideas that was initially rejected,
> now resurrected to try out a suggestion of Andres’s on how to
> de-pessimise it. Unfortunately, it also suffers from Windows-specific
> problems that I originally mentioned at the top of this thread but
> had since repressed. Arrrghgh.
>
> First, the good news:
>
> We could write out a whole new control file, and durable_rename() it
> into place. We don’t want to do that in general, because we don’t
> want to slow down UpdateMinRecoveryPoint(). The new concept is to do
> that only if a backup is in progress. That requires a bit of
> interlocking with backup start/stop (ie when runningBackups is
> changing in shmem, we don’t want to overlap with UpdateControlFile()'s
> decision on how to do it). Here is a patch to try that out. No more
> weasel wording needed for the docs; basebackup and low-level file
> system backup should always see an atomic control file (and
> occasionally also copy a harmless pg_control.tmp file). Then we only
> need the gross retry-until-stable hack for front-end programs.

I like the approach in these patches better than the last patch set. My
only concern would be possible performance regression on standbys (when
doing backup from standby) since pg_control can be written very
frequently to update min recovery point.

I've made a first pass through the patches and they look generally
reasonable (and back patch-able).

One thing:

+ sendFileWithContent(sink, XLOG_CONTROL_FILE,
+ (char *) control_file, sizeof(*control_file),
+ &manifest);

I wonder if we should pad pg_control out to 8k so it remains the same
size as now? Postgres doesn't care, but might look odd to users, and is
arguably a change in behavior that should not be back patched.

> And the bad news:

Provided we can reasonably address the Windows issues this seems to be
the way to go.

Regards,
-David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pierre Ducroquet 2023-07-27 08:51:11 Improvements in pg_dump/pg_restore toc format and performances
Previous Message Andrey Lepikhov 2023-07-27 07:58:16 Re: [PoC] Reducing planning time when tables have many partitions