From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "tomas(at)vondra(dot)me" <tomas(at)vondra(dot)me> |
Subject: | RE: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly |
Date: | 2025-06-19 11:05:49 |
Message-ID: | OSCPR01MB1496646E4CF72B9231DB711AFF57DA@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
Dear Amit, Alexander,
> > Regarding assertion failure, I've found that assert in
> > PhysicalConfirmReceivedLocation() conflicts with restart_lsn
> > previously set by ReplicationSlotReserveWal(). As I can see,
> > ReplicationSlotReserveWal() just picks fresh XLogCtl->RedoRecPtr lsn.
> > So, it doesn't seems there is a guarantee that restart_lsn never goes
> > backward. The commit in ReplicationSlotReserveWal() even states there
> > is a "chance that we have to retry".
> >
>
> I don't see how this theory can lead to a restart_lsn of a slot going
> backwards. The retry mentioned there is just a retry to reserve the
> slot's position again if the required WAL is already removed. Such a
> retry can only get the position later than the previous restart_lsn.
We analyzed the assertion failure that happened in pg_basebackup/020_pg_receivewal
and confirmed that restart_lsn can go backward. This means that the Assert() added
by ca307d5 can cause a crash.
Background
===========
When pg_receivewal starts replication using a replication slot, it uses the
beginning of the segment that contains restart_lsn as the start point.
E.g., if the restart_lsn of the slot is 0/B000D0, pg_receivewal requests WAL
from 0/B00000.
For more details on this behavior, see f61e1dd2 and d9bae531.
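The rounding described above can be sketched as follows. This is a minimal illustration, not the pg_receivewal source: `lsn_t` and `segment_start` are hypothetical names, and the 1 MB segment size is an assumption (it is the only PostgreSQL segment size for which 0/B000D0 rounds down to 0/B00000).

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 64-bit WAL position; hypothetical name. */
typedef uint64_t lsn_t;

/*
 * Round an LSN down to the first byte of the WAL segment that contains it.
 * seg_size must be a non-zero power of two.
 */
lsn_t
segment_start(lsn_t lsn, uint64_t seg_size)
{
    assert(seg_size != 0 && (seg_size & (seg_size - 1)) == 0);

    /* Subtract the offset of the LSN within its segment. */
    return lsn - (lsn & (seg_size - 1));
}
```

With an assumed segment size of 1 MB (0x100000 bytes), `segment_start(0xB000D0, 0x100000)` yields 0xB00000, which matches the 0/B000D0 -> 0/B00000 example.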
What happened here
==================
Based on the behavior above, walsender started sending from the beginning of the segment (0/B00000).
When the receiver got the data, it tried to send a reply. At that time, the flushed
location of WAL would be 0/B00000. walsender sets the received LSN as restart_lsn
in PhysicalConfirmReceivedLocation(). Here restart_lsn went backward (0/B000D0->0/B00000).
The assertion failure could happen if a CHECKPOINT ran at that time.
The slot's last_saved_restart_lsn was 0/B000D0, but data.restart_lsn
was 0/B00000, which does not satisfy the assertion added in InvalidatePossiblyObsoleteSlot().
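The sequence above can be replayed on a toy model. This is only a sketch under assumptions: `toy_slot` and both helpers are hypothetical, and the invariant `restart_lsn >= last_saved_restart_lsn` is my reading of what the assertion expects, not a copy of the real check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t lsn_t;

/* Toy model of the two slot fields named above; not the real slot struct. */
typedef struct
{
    lsn_t restart_lsn;            /* in-memory value (data.restart_lsn) */
    lsn_t last_saved_restart_lsn; /* value as of the last save to disk */
} toy_slot;

/* Hypothetical helper: adopt the flush position reported by the receiver. */
void
toy_confirm_received(toy_slot *slot, lsn_t flushed)
{
    assert(slot != NULL);
    slot->restart_lsn = flushed;
}

/* Assumed invariant: the in-memory value never trails the saved one. */
bool
toy_invariant_holds(const toy_slot *slot)
{
    assert(slot != NULL);
    return slot->restart_lsn >= slot->last_saved_restart_lsn;
}
```

Starting from a slot with both fields at 0xB000D0, calling `toy_confirm_received(&slot, 0xB00000)` makes `toy_invariant_holds(&slot)` return false, which is the state a concurrent CHECKPOINT would observe.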
Note
====
1.
In this case, starting from the beginning of the segment is not a problem, because
the checkpoint process only removes WAL files for segments that precede the
segment containing restart_lsn. The current segment (starting at 0/B00000) will not
be removed, so there is no risk of data loss or inconsistency.
2.
A similar pattern applies to pg_basebackup. Both tools adjust the requested
streaming position to the start of the segment and report the received LSN
as flushed.
3.
I considered the theory above, but I could not reproduce the 040_standby_failover_slots_sync
failure because it is a timing issue. Has anyone else reproduced it?
We are still investigating the failure in 040_standby_failover_slots_sync.
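To illustrate note 1, here is a small sketch of why moving restart_lsn back to the start of its own segment does not change which files may be removed. The function names are hypothetical, the removal rule is simplified to "strictly older segment number", and the 1 MB segment size is again an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t lsn_t;

/* Number of the WAL segment that contains the given LSN. */
uint64_t
segment_number(lsn_t lsn, uint64_t seg_size)
{
    assert(seg_size != 0);
    return lsn / seg_size;
}

/*
 * Simplified rule: a segment may be removed only if it precedes the
 * segment that contains restart_lsn.
 */
bool
segment_removable(uint64_t segno, lsn_t restart_lsn, uint64_t seg_size)
{
    return segno < segment_number(restart_lsn, seg_size);
}
```

Both 0xB000D0 and 0xB00000 fall into segment 11 for a 1 MB segment size, so under this rule segment 11 is kept in either case and only segments 10 and older are candidates for removal.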
Best regards,
Hayato Kuroda
FUJITSU LIMITED