Re: pg_rewind WAL segments deletion pitfall

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, cyberdemn(at)gmail(dot)com
Cc: michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org, bungina(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_rewind WAL segments deletion pitfall
Date: 2023-08-29 13:15:51
Message-ID: bb706f8393c5207812268a5a2e3f98d7@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On 2023-08-24 09:45, Kyotaro Horiguchi wrote:
> At Wed, 23 Aug 2023 13:44:52 +0200, Alexander Kukushkin
> <cyberdemn(at)gmail(dot)com> wrote in
>> On Tue, 22 Aug 2023 at 07:32, Michael Paquier <michael(at)paquier(dot)xyz>
>> wrote:
>> > I don't like much this patch. While it takes correctly advantage of
>> > the backward record read logic from SimpleXLogPageRead() able to
>> > handle correctly timeline jumps, it creates a hidden dependency in the
>> > code between the hash table from filemap.c and the page callback.
>> > Wouldn't it be simpler to build a list of the segment names using the
>> > information from WALOpenSegment and build this list in
>> > findLastCheckpoint()?
>>
>> I think the first version of the patch more or less did that. Not
>> necessarily a list, but a hash table of WAL file names that we want to
>> keep. But Kyotaro Horiguchi didn't like it and suggested creating
>> entries
>> in the filemap.c hash table instead.
>> But, I agree, doing it directly from the findLastCheckpoint() makes
>> the
>> code easier to understand.
> ...
>> > + /*
>> > + * Some entries (WAL segments) already have an action assigned
>> > + * (see SimpleXLogPageRead()).
>> > + */
>> > + if (entry->action == FILE_ACTION_UNDECIDED)
>> > + entry->action = decide_file_action(entry);
>> >
>> > This change makes me a bit uneasy, per se my previous comment with the
>> > additional code dependencies.
>> >
>>
>> We can revert to the original approach (see
>> v1-0001-pg_rewind-wal-deletion.patch from the very first email) if you
>> like.
>
> On the other hand, that approach brings in another source that
> suggests the way that file should be handled. I still think that
> entry->action should be the only source.

+1.
Imaging a case when we come to need decide how to treat files based on
yet another factor, I feel that a single source of truth is better than
creating a list or hash for each factor.

> However, it seems I'm in the
> minority here. So I'm not tied to that approach.
>
>> > I think that this scenario deserves a test case. If one wants to
>> > emulate a delay in WAL archiving, it is possible to set
>> > archive_command to a command that we know will fail, for instance.
>> >
>>
>> Yes, I totally agree, it is on our radar, but meanwhile please see the
>> new
>> version, just to check if I correctly understood your idea.

Thanks for the patch.
I tested v4 patch using the script attached below thread and it has
successfully finished.

https://www.postgresql.org/message-id/2e75ae22dce9a227c3d47fa6d0ed094a%40oss.nttdata.com

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2023-08-29 20:24:33 BUG #18075: configuration variable idle_session_timeout not working as expected
Previous Message PG Bug reporting form 2023-08-29 06:42:16 BUG #18074: After enabling JIT, the query runtime increased by over a thousand times.

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2023-08-29 13:21:37 Re: Eliminate redundant tuple visibility check in vacuum
Previous Message Daniel Gustafsson 2023-08-29 12:39:42 Re: Standardize spelling of "power of two"