Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "tomas(at)vondra(dot)me" <tomas(at)vondra(dot)me>
Subject: Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Date: 2025-06-20 00:24:16
Message-ID: CAPpHfdvjGWo--xqqjJbyb_amdkhqamnzrwCZWe_hBD-rSTFbBg@mail.gmail.com
Lists: pgsql-hackers

Dear Kuroda-san,

On Thu, Jun 19, 2025 at 2:05 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > > Regarding assertion failure, I've found that assert in
> > > PhysicalConfirmReceivedLocation() conflicts with restart_lsn
> > > previously set by ReplicationSlotReserveWal(). As I can see,
> > > ReplicationSlotReserveWal() just picks fresh XLogCtl->RedoRecPtr lsn.
> > > So, it doesn't seems there is a guarantee that restart_lsn never goes
> > > backward. The commit in ReplicationSlotReserveWal() even states there
> > > is a "chance that we have to retry".
> > >
> >
> > I don't see how this theory can lead to a restart_lsn of a slot going
> > backwards. The retry mentioned there is just a retry to reserve the
> > slot's position again if the required WAL is already removed. Such a
> > retry can only get the position later than the previous restart_lsn.
>
> We analyzed the assertion failure happened at pg_basebackup/020_pg_receivewal,
> and confirmed that restart_lsn can go backward. This meant that Assert() added
> by the ca307d5 can cause crash.
>
> Background
> ===========
> When pg_receivewal starts the replication and it uses the replication slot, it
> sets as the beginning of the segment where restart_lsn exists, as the startpoint.
> E.g., if the restart_lsn of the slot is 0/B000D0, pg_receivewal requests WALs
> from 0/B00000.
> More detail of this behavior, see f61e1dd2 and d9bae531.
>
> What happened here
> ==================
> Based on above theory, walsender sent from the beginning of segment (0/B00000).
> When walreceiver receives, it tried to send reply. At that time the flushed
> location of WAL would be 0/B00000. walsender sets the received lsn as restart_lsn
> in PhysicalConfirmReceivedLocation(). Here the restart_lsn went backward (0/B000D0->0/B00000).
>
> The assertion failure could happen if CHECKPOINT happened at that time.
> Attribute last_saved_restart_lsn of the slot was 0/B000D0, but the data.restart_lsn
> was 0/B00000. It could not satisfy the assertion added in InvalidatePossiblyObsoleteSlot().

Thank you for your detailed explanation!
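
To make the arithmetic in the quoted example concrete, here is a minimal sketch of how a start point at the beginning of a segment follows from restart_lsn. It is only an illustration, not the pg_receivewal code. The helper names (parse_lsn, format_lsn, segment_start) are hypothetical, and the 1 MiB segment size is an assumption chosen because it matches the LSNs quoted above (0/B000D0 -> 0/B00000).

```python
# Illustrative model only; helper names are hypothetical, not PostgreSQL APIs.

# Assumption: 1 MiB WAL segments, which matches the quoted 0/B000D0 -> 0/B00000.
WAL_SEGMENT_SIZE = 1024 * 1024


def parse_lsn(text):
    """Convert the textual 'hi/lo' hexadecimal LSN form into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn):
    """Convert a 64-bit integer LSN back into the textual 'hi/lo' form."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def segment_start(lsn, seg_size=WAL_SEGMENT_SIZE):
    """Round an LSN down to the first byte of the segment that contains it."""
    return lsn - (lsn % seg_size)


restart_lsn = parse_lsn("0/B000D0")
start_point = segment_start(restart_lsn)
print(format_lsn(start_point))  # prints 0/B00000
```

The requested start point is never later than restart_lsn, so any reply that reports the start point as flushed can move restart_lsn backward within the segment.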

> Note
> ====
> 1.
> In this case, starting from the beginning of the segment is not a problem, because
> the checkpoint process only removes WAL files from segments that precede the
> restart_lsn's wal segment. The current segment (0/B00000) will not be removed,
> so there is no risk of data loss or inconsistency.
>
> 2.
> A similar pattern applies to pg_basebackup. Both use logic that adjusts the
> requested streaming position to the start of the segment, and it replies the
> received LSN as flushed.
>
> 3.
> I considered the theory above, but I could not reproduce 040_standby_failover_slots_sync
> because it is a timing issue. Have someone else reproduced?
>
> We are still investigating failure caused at 040_standby_failover_slots_sync.

I didn't manage to reproduce this. But as far as I can see from the
logs [1] on mamba, the START_REPLICATION command was issued just before
the assert trap. Could it be something similar to what I described in [2]?
Namely:
1. ReplicationSlotReserveWal() sets restart_lsn for the slot.
2. Concurrent checkpoint flushes that restart_lsn to the disk.
3. PhysicalConfirmReceivedLocation() sets restart_lsn of the slot to
the beginning of the segment.
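
The three steps above can be sketched as a toy model. This is not the server code: SlotModel and its methods are hypothetical stand-ins for the named functions, the LSN values are taken from the example quoted earlier, the 1 MiB segment size is an assumption that matches those values, and the checked condition (last saved restart_lsn must not exceed the in-memory one) is my reading of the assertion rather than a copy of it.

```python
from dataclasses import dataclass

# Assumption: 1 MiB WAL segments, matching the quoted 0/B000D0 -> 0/B00000.
WAL_SEGMENT_SIZE = 1024 * 1024


@dataclass
class SlotModel:
    """Hypothetical stand-in for the two slot fields discussed in this thread."""
    restart_lsn: int = 0             # in-memory value (data.restart_lsn)
    last_saved_restart_lsn: int = 0  # value last flushed to disk

    def reserve_wal(self, redo_rec_ptr):
        # Step 1: models ReplicationSlotReserveWal() picking a fresh redo pointer.
        self.restart_lsn = redo_rec_ptr

    def checkpoint_save(self):
        # Step 2: models a concurrent checkpoint flushing the slot to disk.
        self.last_saved_restart_lsn = self.restart_lsn

    def confirm_received(self, flushed_lsn):
        # Step 3: models PhysicalConfirmReceivedLocation() storing the reply LSN.
        self.restart_lsn = flushed_lsn

    def invariant_holds(self):
        # Assumed form of the asserted condition: the saved value never
        # exceeds the in-memory value.
        return self.last_saved_restart_lsn <= self.restart_lsn


def replay_sequence():
    slot = SlotModel()
    slot.reserve_wal(0xB000D0)
    slot.checkpoint_save()
    # The client streams from the segment start and reports it as flushed.
    segment_begin = slot.restart_lsn - (slot.restart_lsn % WAL_SEGMENT_SIZE)
    slot.confirm_received(segment_begin)
    return slot


slot = replay_sequence()
print(hex(slot.last_saved_restart_lsn), hex(slot.restart_lsn), slot.invariant_holds())
```

In this model the condition fails only because the checkpoint lands between steps 1 and 3. Without step 2, last_saved_restart_lsn stays at its initial value and the backward move goes unnoticed, which would be consistent with the failure being timing dependent.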

[1] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=mamba&dt=2025-06-17%2005%3A10%3A36&stg=recovery-check
[2] https://www.postgresql.org/message-id/CAPpHfdv3UEUBjsLhB_CwJT0xX9LmN6U%2B__myYopq4KcgvCSbTg%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase
