Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: "Anton A(dot) Melnikov" <aamelnikov(at)inbox(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
Date: 2023-02-22 00:10:35
Message-ID: CA+hUKGJqWyqH08WDe=XvW_Aka3Vf7379vC1MfGLeGfizfKR2MA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 17, 2023 at 4:21 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> While contemplating what else a mandatory file lock might break, I
> remembered that basebackup.c also reads the control file. Hrmph. Not
> addressed yet; I guess it might need to acquire/release around
> sendFile(sink, XLOG_CONTROL_FILE, ...)?

If we go this way, I suppose, in theory at least, someone with
external pg_backup_start()-based tools might also want to hold the
lock while copying pg_control. Otherwise they might fail to open it
on Windows (where that patch uses a mandatory lock) or copy garbage on
Linux (as they can today, I assume), with non-zero probability -- at
least when copying files from a hot standby. Or backup tools might
want to get the file contents through some entirely different
mechanism that does the right interlocking (whatever that might be,
maybe inside the server). Perhaps this is not so much the localised
systems programming curiosity I thought it was, and has implications
that'd need to be part of the documented low-level backup steps. It
makes me like the idea a bit less. It'd be good to hear from backup
gurus what they think about that.

One cute hack I thought of to make the file lock effectively advisory
on Windows is to lock a byte range *past the end* of the file (the
documentation says you can do that). That shouldn't stop programs
that want to read the file without locking and don't know/care about
our scheme (that is, pre-existing backup tools that haven't considered
this problem and remain oblivious or accept the (low) risk of torn
reads), but will block other participants in our scheme.

If we went back to the keep-rereading-until-it-stops-changing model,
then an external backup tool would need to be prepared to do that too,
in theory at least. Maybe some already do something like that?

Or maybe the problem is/was too theoretical before...

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-02-22 00:37:16 Re: Fix the description of GUC "max_locks_per_transaction" and "max_pred_locks_per_transaction" in guc_table.c
Previous Message Tom Lane 2023-02-21 23:50:06 Re: Possible false valgrind error reports