Re: pg_rewind WAL segments deletion pitfall

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: cyberdemn(at)gmail(dot)com
Cc: bungina(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_rewind WAL segments deletion pitfall
Date: 2022-08-31 05:30:31
Message-ID: 20220831.143031.834123808244621694.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

At Tue, 30 Aug 2022 11:01:58 +0200, Alexander Kukushkin <cyberdemn(at)gmail(dot)com> wrote in
> On Tue, 30 Aug 2022 at 10:27, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
> wrote:
>
> >
> > Hmm. Doesn't it work to ignoring tli then? All segments that their
> > segment number is equal to or larger than the checkpoint locaiton are
> > preserved regardless of TLI?
> >
>
> If we ignore TLI there is a chance that we may retain some unnecessary (or
> just wrong) files.

Right. I mean I don't think thats a problem and we can rely on
postgres itself for later cleanup. Theoretically some out-of-range tli
or segno files are left alone but they surely will be gone soon after
the server starts.

> > > Also, we need to take into account the divergency LSN. Files after it are
> > > not required.
> >
> > They are removed at the later checkpoints. But also we can remove
> > segments that are out of the range between the last common checkpoint
> > and divergence point ignoring TLI.
>
>
> Everything that is newer last_common_checkpoint_seg could be removed (but
> it already happens automatically, because these files are missing on the
> new primary).
> WAL files that are older than last_common_checkpoint_seg could be either
> removed or at least not copied from the new primary.
..
> The current implementation relies on tracking WAL files being open while
> searching for the last common checkpoint. It automatically starts from the
> divergence_seg, automatically finishes at last_common_checkpoint_seg, and
> last but not least, automatically handles timeline changes. I don't think
> that manually written code that decides what to do from the WAL file name
> (and also takes into account TLI) could be much simpler than the current
> approach.

Yeah, I know. My expectation is taking the simplest way for the same
effect. My concern was the additional hash. On second thought, I
concluded that we should that on the existing filehash.

We can just add a FILE_ACTION_NONE entry to the file hash from
SimpleXLogPageRead. Since this happens before decide_file_action()
call, decide_file_action() should ignore the entries with
FILE_ACTION_NONE. Also we need to call filehash_init() earlier.

> Actually, since we start doing some additional "manipulations" with files
> in pg_wal, we probably should do a symmetric action with files inside
> pg_wal/archive_status

In that sense, pg_rewind rather should place missing
archive_status/*.done for segments including restored ones seen while
finding checkpoint. This is analogous of the behavior with
pg_basebackup and pg_receivewal. Also we should add FILE_ACTION_NONE
entries for .done files for segments read while finding checkpoint.

What do you think about that?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-08-31 05:36:38 Re: pg_rewind WAL segments deletion pitfall
Previous Message Tom Lane 2022-08-30 19:01:31 Re: BUG #17602: Query backend process killed because it uses up all memory.

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-08-31 05:36:38 Re: pg_rewind WAL segments deletion pitfall
Previous Message Peter Geoghegan 2022-08-31 05:12:57 Re: New strategies for freezing, advancing relfrozenxid early