Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, "Anton A(dot) Melnikov" <aamelnikov(at)inbox(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
Date: 2023-07-26 04:06:31
Message-ID: CA+hUKGLnhvNW9+V7+o_8Cnu+_fp1kZ7UU9PDnwyqFKQi4agAnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While chatting to Robert and Andres about all this, a new idea came
up. Or, rather, one of the first ideas that was initially rejected,
now resurrected to try out a suggestion of Andres’s on how to
de-pessimise it. Unfortunately, it also suffers from Windows-specific
problems that I originally mentioned at the top of this thread but
had since repressed. Arrrghgh.

First, the good news:

We could write out a whole new control file, and durable_rename() it
into place. We don’t want to do that in general, because we don’t
want to slow down UpdateMinRecoveryPoint(). The new concept is to do
that only if a backup is in progress. That requires a bit of
interlocking with backup start/stop (ie when runningBackups is
changing in shmem, we don’t want to overlap with UpdateControlFile()'s
decision on how to do it). Here is a patch to try that out. No more
weasel wording needed for the docs; basebackup and low-level file
system backup should always see an atomic control file (and
occasionally also copy a harmless pg_control.tmp file). Then we only
need the gross retry-until-stable hack for front-end programs.

And the bad news:

In my catalogue-of-Windows-weirdness project[1], I learned in v3-0003 that:

+ fd = open(path, O_CREAT | O_EXCL | O_RDWR | PG_BINARY, 0777);
+ PG_EXPECT_SYS(fd >= 0, "touch name1.txt");
+ PG_REQUIRE_SYS(fd < 0 || close(fd) == 0);
+
+ fd = open(path2, O_RDWR | PG_BINARY, 0777);
+ PG_EXPECT_SYS(fd >= 0, "open name2.txt");
+ make_path(path2, "name2.txt");
+#ifdef WIN32
+
+ /*
+ * Windows can't rename over an open non-unlinked file, even with
+ * have_posix_unlink_semantics.
+ */
+ pgwin32_dirmod_loops = 2; /* minimize looping to fail fast in testing */
+ PG_EXPECT_SYS(rename(path, path2) == -1,
+ "Windows: can't rename name1.txt -> name2.txt while name2.txt is open");
+ PG_EXPECT_EQ(errno, EACCES);
+ PG_EXPECT_SYS(unlink(path) == 0, "unlink name1.txt");
+#else
+ PG_EXPECT_SYS(rename(path, path2) == 0,
+ "POSIX: can rename name1.txt -> name2.txt while name2.txt is open");
+#endif
+ PG_EXPECT_SYS(close(fd) == 0);

Luckily the code in dirmod.c:pgrename() should retry lots of times if
a concurrent transient opener/reader comes along, so I think that
should be OK in practice (but if backups_r_us.exe holds the file open
for 10 seconds while we're trying to rename it, I assume we'll PANIC);
call that problem #1. What is slightly more disturbing is the clue in
the "Cygwin cleanup" thread[2] that rename() can fail to be 100%
atomic, so that a concurrent call to open() can fail with ENOENT (cf.
the POSIX requirement "... a link named new shall remain visible to
other processes throughout the renaming operation and refer either to
the file referred to by new or old ..."). Call that problem #2, a
problem that already causes us rare breakage (for example: could not
open file "pg_logical/snapshots/0-14FE6B0.snap").

I know that problem #1 can be fixed by applying v3-0004 from [1] but
that leads to impossible decisions like revoking support for non-NTFS
filesystems as discussed in that thread, and we certainly couldn't
back-patch that anyway. I assume problem #2 can too.

That makes me want to *also* acquire ControlFileLock, for base
backup's read of pg_control. Even though it seems redundant with the
rename() trick (the rename() trick should be enough for low-level
*and* basebackup on ext4), it would at least avoid the above
Windowsian pathologies during base backups.

I'm sorry for the patch/idea-churn in this thread. It's like
Whac-a-Mole. Blasted non-POSIX-compliant moles. New patches
attached. Are they getting better?

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2Be13wK0PBX5Z63CCwWm7MfRQuwBRabM_3aKWSko2AUww%40mail.gmail.com

Attachment Content-Type Size
v6-0001-Update-control-file-atomically-during-backups.patch application/x-patch 8.9 KB
v6-0002-Try-to-tolerate-torn-reads-of-control-file-in-fro.patch application/x-patch 2.5 KB
v6-0003-Acquire-ControlFileLock-in-base-backups.patch application/x-patch 4.7 KB
v6-0004-Acquire-ControlFileLock-in-SQL-functions.patch application/x-patch 2.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-07-26 04:07:39 Re: logical decoding and replication of sequences, take 2
Previous Message Michael Paquier 2023-07-26 03:52:20 Re: Support worker_spi to execute the function dynamically.