Race between KeepFileRestoredFromArchive() and restartpoint

From: Noah Misch <noah(at)leadboat(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Race between KeepFileRestoredFromArchive() and restartpoint
Date: 2021-02-02 15:14:16
Message-ID: 20210202151416.GB3304930@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

A race with KeepFileRestoredFromArchive() can cause a restartpoint to fail, as
seen once on the buildfarm[1]. The attached patch adds a test case; it
applies atop the "stop events" patch[2]. We have two systems for adding
long-term pg_wal directory entries. KeepFileRestoredFromArchive() adds them
during archive recovery, while InstallXLogFileSegment() does so at all times.
Unfortunately, InstallXLogFileSegment() happens throughout archive recovery,
via the checkpointer recycling segments and calling PreallocXlogFiles().
Multiple processes can run InstallXLogFileSegment(), which uses
ControlFileLock to represent the authority to modify the directory listing of
pg_wal. KeepFileRestoredFromArchive() just assumes it controls pg_wal.

Recycling and preallocation are wasteful during archive recovery, because
KeepFileRestoredFromArchive() unlinks every entry in its path. I propose to
fix the race by adding an XLogCtl flag indicating which regime currently owns
the right to add long-term pg_wal directory entries. In the archive recovery
regime, the checkpointer will not preallocate and will unlink old segments
instead of recycling them (like wal_recycle=off). XLogFileInit() will fail.

Notable alternatives:

- Release ControlFileLock at the end of XLogFileInit(), not at the end of
InstallXLogFileSegment(). Add ControlFileLock acquisition to
KeepFileRestoredFromArchive(). This provides adequate mutual exclusion, but
XLogFileInit() could return a file descriptor for an unlinked file. That's
fine for PreallocXlogFiles(), but it feels wrong.

- During restartpoints, never preallocate or recycle segments. (Just delete
obsolete WAL.) By denying those benefits, this presumably makes streaming
recovery less efficient.

- Make KeepFileRestoredFromArchive() call XLogFileInit() to open a segment,
then copy bytes. This is simple, but it multiplies I/O. That might be
negligible on account of caching, or it might not be. A variant, incurring
extra fsyncs, would be to use durable_rename() to replace the segment we get
from XLogFileInit().

- Make KeepFileRestoredFromArchive() rename without first unlinking. This
avoids checkpoint failure, but a race could trigger noise from the LOG
message in InstallXLogFileSegment -> durable_rename_excl.

Does anyone prefer some alternative? It's probably not worth back-patching
anything for a restartpoint failure this rare, because most restartpoint
outcomes are not user-visible.

Thanks,
nm

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2020-10-05%2023%3A02%3A17
[2] https://postgr.es/m/CAPpHfdtSEOHX8dSk9Qp%2BZ%2B%2Bi4BGQoffKip6JDWngEA%2Bg7Z-XmQ%40mail.gmail.com

Attachment Content-Type Size
repro-KeepFileRestoredFromArchive-v0.patch text/plain 10.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-02-02 15:17:34 Re: Add primary keys to system catalogs
Previous Message Julien Rouhaud 2021-02-02 14:54:30 Re: Add primary keys to system catalogs