Re: Race between KeepFileRestoredFromArchive() and restartpoint

From: David Steele <david(at)pgmasters(dot)net>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Don Seiler <don(at)seiler(dot)us>
Subject: Re: Race between KeepFileRestoredFromArchive() and restartpoint
Date: 2022-08-02 15:01:19
Message-ID: da613671-86b1-8731-8cc0-d5bc1590d6f8@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 8/2/22 10:37, Noah Misch wrote:
> On Tue, Aug 02, 2022 at 10:14:22AM -0400, David Steele wrote:
>> On 7/31/22 02:17, Noah Misch wrote:
>>> On Tue, Jul 26, 2022 at 07:21:29AM -0400, David Steele wrote:
>>>> On 6/19/21 16:39, Noah Misch wrote:
>>>>> On Tue, Feb 02, 2021 at 07:14:16AM -0800, Noah Misch wrote:
>>>>>> Recycling and preallocation are wasteful during archive recovery, because
>>>>>> KeepFileRestoredFromArchive() unlinks every entry in its path. I propose to
>>>>>> fix the race by adding an XLogCtl flag indicating which regime currently owns
>>>>>> the right to add long-term pg_wal directory entries. In the archive recovery
>>>>>> regime, the checkpointer will not preallocate and will unlink old segments
>>>>>> instead of recycling them (like wal_recycle=off). XLogFileInit() will fail.
>>>>>
>>>>> Here's the implementation. Patches 1-4 suffice to stop the user-visible
>>>>> ERROR. Patch 5 avoids a spurious LOG-level message and wasted filesystem
>>>>> writes, and it provides some future-proofing.
>>>>>
>>>>> I was tempted to (but did not) just remove preallocation. Creating one file
>>>>> per checkpoint seems tiny relative to the max_wal_size=1GB default, so I
>>>>> expect it's hard to isolate any benefit. Under the old checkpoint_segments=3
>>>>> default, a preallocated segment covered a respectable third of the next
>>>>> checkpoint. Before commit 63653f7 (2002), preallocation created more files.
>>>>
>>>> This also seems like it would fix the link issues we are seeing in [1].
>>>>
>>>> I wonder if that would make it worth a back patch?
>>>
>>> Perhaps. It's sad to have multiple people deep-diving into something fixed on
>>> HEAD. On the other hand, I'm not eager to spend risk-of-backpatch points on
>>> this. One alternative would be adding an errhint like "This is known to
>>> happen occasionally during archive recovery, where it is harmless." That has
>>> an unpolished look, but it's low-risk and may avoid deep-dive efforts.
>>
>> I think in this case a HINT might be sufficient to at least keep people from
>> wasting time tracking down a problem that has already been fixed.
>>
>> However, there is another issue [1] that might argue for a back patch if
>> this patch (as I believe) would fix the issue.
>
>> [1] https://www.postgresql.org/message-id/CAHJZqBDxWfcd53jm0bFttuqpK3jV2YKWx%3D4W7KxNB4zzt%2B%2BqFg%40mail.gmail.com
>
> That makes sense. Each iteration of the restartpoint recycle loop has a 1/N
> chance of failing. Recovery adds >N files between restartpoints. Hence, the
> WAL directory grows without bound. Is that roughly the theory in mind?

Yes, though you have formulated it better than I had in my mind.

Let's see if Don can confirm that he is seeing the "could not link file"
messages.

Regards,
-David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2022-08-02 15:18:44 Re: Slow standby snapshot
Previous Message Andres Freund 2022-08-02 15:00:22 Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size