Re: pg_rewind WAL segments deletion pitfall

From: Polina Bungina <bungina(at)gmail(dot)com>
To: horikyota(dot)ntt(at)gmail(dot)com
Cc: cyberdemn(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_rewind WAL segments deletion pitfall
Date: 2022-09-01 11:33:09
Message-ID: CAAtGL4A1UN7gxWjsmm-PxT0qmfQc8WQRT+Ag-YBUO-98pN9x4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Hello Kayotaro,

Here is the new version of the patch that includes the changes you
suggested. It is smaller now but I doubt if it is as easy to understand as
it used to be.

The need of manipulations with the target’s pg_wal/archive_status directory
is a question to discuss…

At first glance it seems to be useless for .ready files: checkpointer
process will anyway recreate them if archiving is enabled on the rewound
old primary and we will finally have them in the archive. As for the .done
files, it seems reasonable to follow the pg_basebackup logic and keep .done
files together with the corresponding segments (those between the last
common checkpoint and the point of divergence) to protect them from being
archived once again.

But on the other hand it seems to be not that straightforward: imaging we
have WAL segment X on the target along with X.done file and we decide to
preserve them both (or we download it from archive and force .done file
creation), while archive_mode was set to ‘always’ and the source (promoted
replica) also still has WAL segment X and X.ready file. After pg_rewind we
will end up with both X.ready and X.done, which seems to be not a good
situation (but most likely not critical either).

Regards,

Polina Bungina

On Wed, Aug 31, 2022 at 7:30 AM Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
wrote:

> At Tue, 30 Aug 2022 11:01:58 +0200, Alexander Kukushkin <
> cyberdemn(at)gmail(dot)com> wrote in
> > On Tue, 30 Aug 2022 at 10:27, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com
> >
> > wrote:
> >
> > >
> > > Hmm. Doesn't it work to ignoring tli then? All segments that their
> > > segment number is equal to or larger than the checkpoint locaiton are
> > > preserved regardless of TLI?
> > >
> >
> > If we ignore TLI there is a chance that we may retain some unnecessary
> (or
> > just wrong) files.
>
> Right. I mean I don't think thats a problem and we can rely on
> postgres itself for later cleanup. Theoretically some out-of-range tli
> or segno files are left alone but they surely will be gone soon after
> the server starts.
>
> > > > Also, we need to take into account the divergency LSN. Files after
> it are
> > > > not required.
> > >
> > > They are removed at the later checkpoints. But also we can remove
> > > segments that are out of the range between the last common checkpoint
> > > and divergence point ignoring TLI.
> >
> >
> > Everything that is newer last_common_checkpoint_seg could be removed (but
> > it already happens automatically, because these files are missing on the
> > new primary).
> > WAL files that are older than last_common_checkpoint_seg could be either
> > removed or at least not copied from the new primary.
> ..
> > The current implementation relies on tracking WAL files being open while
> > searching for the last common checkpoint. It automatically starts from
> the
> > divergence_seg, automatically finishes at last_common_checkpoint_seg, and
> > last but not least, automatically handles timeline changes. I don't think
> > that manually written code that decides what to do from the WAL file name
> > (and also takes into account TLI) could be much simpler than the current
> > approach.
>
> Yeah, I know. My expectation is taking the simplest way for the same
> effect. My concern was the additional hash. On second thought, I
> concluded that we should that on the existing filehash.
>
> We can just add a FILE_ACTION_NONE entry to the file hash from
> SimpleXLogPageRead. Since this happens before decide_file_action()
> call, decide_file_action() should ignore the entries with
> FILE_ACTION_NONE. Also we need to call filehash_init() earlier.
>
> > Actually, since we start doing some additional "manipulations" with files
> > in pg_wal, we probably should do a symmetric action with files inside
> > pg_wal/archive_status
>
> In that sense, pg_rewind rather should place missing
> archive_status/*.done for segments including restored ones seen while
> finding checkpoint. This is analogous of the behavior with
> pg_basebackup and pg_receivewal. Also we should add FILE_ACTION_NONE
> entries for .done files for segments read while finding checkpoint.
>
> What do you think about that?
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>

Attachment Content-Type Size
v2-0001-pg_rewind-wal-deletion.patch application/octet-stream 5.3 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Polina Bungina 2022-09-01 11:58:04 Re: pg_rewind WAL segments deletion pitfall
Previous Message Noah Misch 2022-09-01 04:39:13 Re: fetching bytea (blob) data of 850 MB from psql client failed

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2022-09-01 11:42:15 Re: Avoid overhead open-close indexes (catalog updates)
Previous Message houzj.fnst@fujitsu.com 2022-09-01 11:23:22 RE: Perform streaming logical transactions by background workers and parallel apply