RE: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "tomas(at)vondra(dot)me" <tomas(at)vondra(dot)me>
Date: 2025-06-19 11:05:49
Message-ID: OSCPR01MB1496646E4CF72B9231DB711AFF57DA@OSCPR01MB14966.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Amit, Alexander,

> > Regarding assertion failure, I've found that assert in
> > PhysicalConfirmReceivedLocation() conflicts with restart_lsn
> > previously set by ReplicationSlotReserveWal(). As I can see,
> > ReplicationSlotReserveWal() just picks fresh XLogCtl->RedoRecPtr lsn.
> > So, it doesn't seems there is a guarantee that restart_lsn never goes
> > backward. The commit in ReplicationSlotReserveWal() even states there
> > is a "chance that we have to retry".
> >
>
> I don't see how this theory can lead to a restart_lsn of a slot going
> backwards. The retry mentioned there is just a retry to reserve the
> slot's position again if the required WAL is already removed. Such a
> retry can only get the position later than the previous restart_lsn.

We analyzed the assertion failure that occurred in pg_basebackup/020_pg_receivewal [1]
and confirmed that restart_lsn can go backward. This means that the Assert() added
by ca307d5 can cause a crash.

Background
===========
When pg_receivewal starts replication using a replication slot, it uses the
beginning of the segment that contains restart_lsn as the start point.
E.g., if the restart_lsn of the slot is 0/B000D0, pg_receivewal requests WAL
from 0/B00000.
For more details on this behavior, see f61e1dd2 and d9bae531.
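
The start-point adjustment above amounts to rounding the LSN down to a segment
boundary. Below is a minimal sketch of that arithmetic only, not the actual
pg_receivewal code. The helper names are hypothetical, and a 1 MB WAL segment
size is assumed, because that is a size for which 0/B000D0 rounds down to
0/B00000 (with 16 MB segments the same LSN would round down to 0/0).

```python
# Assumption: 1 MB WAL segments, chosen so the example LSNs line up.
WAL_SEGMENT_SIZE = 1024 * 1024


def parse_lsn(text: str) -> int:
    """Convert an 'X/Y' LSN string into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Convert a 64-bit integer back into 'X/Y' notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def segment_start(lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    """Round an LSN down to the first byte of its WAL segment."""
    return lsn - (lsn % seg_size)


restart_lsn = parse_lsn("0/B000D0")
print(format_lsn(segment_start(restart_lsn)))  # prints 0/B00000
```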

What happened here
==================
As described above, the walsender started sending from the beginning of the
segment (0/B00000). When the receiver (pg_receivewal) received the data, it tried
to send a reply. At that time the flushed WAL location would be 0/B00000. The
walsender set the reported LSN as restart_lsn in PhysicalConfirmReceivedLocation(),
so restart_lsn went backward (0/B000D0 -> 0/B00000).

The assertion failure could happen if a CHECKPOINT ran at that time. The slot's
last_saved_restart_lsn was 0/B000D0, but data.restart_lsn was 0/B00000, which
could not satisfy the assertion added in InvalidatePossiblyObsoleteSlot().
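
The sequence above can be replayed with a toy model. This is only a sketch, not
the PostgreSQL code: ToySlot and both helper functions are hypothetical, the
confirm step simply stores whatever LSN the client reports, and the checked
condition (restart_lsn >= last_saved_restart_lsn) is my assumption about the
shape of the Assert().

```python
from dataclasses import dataclass


@dataclass
class ToySlot:
    """Hypothetical stand-in for the two slot fields discussed above."""
    restart_lsn: int             # in-memory value (data.restart_lsn)
    last_saved_restart_lsn: int  # value as of the last save to disk


def confirm_received(slot: ToySlot, flushed_lsn: int) -> None:
    # Simplification: accept the flushed LSN reported by the client as-is,
    # without checking whether it is older than the current restart_lsn.
    slot.restart_lsn = flushed_lsn


def checkpoint_check_holds(slot: ToySlot) -> bool:
    # Assumed shape of the assertion: restart_lsn never precedes the
    # value that was last saved.
    return slot.restart_lsn >= slot.last_saved_restart_lsn


slot = ToySlot(restart_lsn=0xB000D0, last_saved_restart_lsn=0xB000D0)
print(checkpoint_check_holds(slot))  # True: nothing has moved yet

# The client streams from the segment start and reports it as flushed.
confirm_received(slot, 0xB00000)
print(checkpoint_check_holds(slot))  # False: restart_lsn went backward
```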

Note
====
1.
In this case, starting from the beginning of the segment is not a problem, because
the checkpoint process only removes WAL files for segments that precede the
WAL segment containing restart_lsn. The current segment (0/B00000) will not be
removed, so there is no risk of data loss or inconsistency.

2.
A similar pattern applies to pg_basebackup. Both tools adjust the requested
streaming position to the start of the segment and report the received LSN as
flushed in their replies.

3.
I considered the theory above, but I could not reproduce the
040_standby_failover_slots_sync failure because it is a timing issue. Has anyone
else reproduced it?
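
Regarding note 1, the reason the backward move is harmless for WAL retention can
be sketched as follows. This is an illustration only: the function names are
hypothetical, segments are represented by plain segment numbers, 1 MB segments
are assumed, and the removal rule is reduced to "remove segments that precede
the one containing restart_lsn".

```python
# Assumption: 1 MB WAL segments, matching the example LSNs.
WAL_SEGMENT_SIZE = 1024 * 1024


def segment_number(lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    """Number of the WAL segment that contains the given LSN."""
    return lsn // seg_size


def removable_segments(existing, restart_lsn, seg_size=WAL_SEGMENT_SIZE):
    """Segments strictly older than the one containing restart_lsn."""
    cutoff = segment_number(restart_lsn, seg_size)
    return [seg for seg in existing if seg < cutoff]


existing = [9, 10, 11, 12]

# Both LSNs fall into the same segment, so the cutoff does not change.
before = removable_segments(existing, 0xB000D0)
after = removable_segments(existing, 0xB00000)
print(before, after)  # prints [9, 10] [9, 10]
```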

We are still investigating the failure in 040_standby_failover_slots_sync.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=scorpion&dt=2025-06-17%2000%3A40%3A46&stg=pg_basebackup-check

Best regards,
Hayato Kuroda
FUJITSU LIMITED
