From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, tomas(at)vondra(dot)me
Subject: Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Date: 2025-06-10 20:14:16
Message-ID: CAPpHfdt77k3BqD=anh4b7UiRevfPOu2Hcf3rvSZXntxH+4G=Ug@mail.gmail.com
Lists: pgsql-hackers
On Mon, Jun 9, 2025 at 7:09 PM Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> wrote:
> > I think we can use this approach for HEAD and probably keep the
> > previous idea for back branches. Keeping some value in shared memory
> > per slot sounds risky to me in terms of introducing new bugs.
>
> Not sure what kind of problems might occur. I propose to allocate in
> shmem an array of last_saved_restart_lsn values, as below, which is not
> part of the public API. It would be allocated and deallocated in shmem
> the same way as ReplicationSlotCtlData. I can prepare a patch, if needed.
>
> typedef struct ReplicationSlotCtlDataExt
> {
>     XLogRecPtr  last_saved_restart_lsn[1];
> } ReplicationSlotCtlDataExt;
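
For concreteness, here is a hypothetical sketch of how such an array could be sized and initialized in shared memory, assuming the ReplicationSlotCtlDataExt definition quoted above. The function names and the shmem label are illustrative, not from any posted patch:

    #include "postgres.h"
    #include "replication/slot.h"   /* max_replication_slots */
    #include "storage/shmem.h"      /* ShmemInitStruct, mul_size */

    /* Space for one last_saved_restart_lsn entry per possible slot. */
    static Size
    ReplicationSlotsExtShmemSize(void)
    {
        return offsetof(ReplicationSlotCtlDataExt, last_saved_restart_lsn) +
            mul_size(max_replication_slots, sizeof(XLogRecPtr));
    }

    /* Allocate (or attach to) the array, zeroing it on first creation. */
    static void
    ReplicationSlotsExtShmemInit(void)
    {
        ReplicationSlotCtlDataExt *ext;
        bool        found;

        ext = (ReplicationSlotCtlDataExt *)
            ShmemInitStruct("Replication Slot Ctl Ext",
                            ReplicationSlotsExtShmemSize(),
                            &found);
        if (!found)
            MemSet(ext, 0, ReplicationSlotsExtShmemSize());
    }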
This could work, but I don't think it is a solution for HEAD anyway. On
HEAD, it would be better to keep everything inside the ReplicationSlot
struct. At the same time, I don't like the idea of having different
shared memory structs between branches if we can avoid that.
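
For comparison, an abridged sketch of what keeping the value inside ReplicationSlot could look like on HEAD. The field name, placement, and comments here are illustrative; the attached v3-0001 patch is the authoritative change:

    typedef struct ReplicationSlot
    {
        /* ... existing fields elided ... */

        /* in-memory copy of the persistent data; may be ahead of disk */
        ReplicationSlotPersistentData data;

        /*
         * Latest restart_lsn known to be flushed to disk for this slot.
         * WAL removal must not go past the minimum of this value across
         * all in-use slots.
         */
        XLogRecPtr  last_saved_restart_lsn;
    } ReplicationSlot;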
> > Yeah, but with physical slots, it is possible that the slot's xmin
> > value points to some value, say 700 (after restart), while vacuum
> > has already removed tuples from transactions with IDs greater than
> > 700, as explained in email [1].
>
> I think we have no xmin problem for physical slots. The xmin values of
> physical slots are used to process hot standby feedback (HSF) messages.
> If I understood you correctly, you are describing the problem that hot
> standby feedback solves: it prevents vacuuming of tuples on the primary
> that standby queries may still need, thereby avoiding recovery conflicts
> on the replica (a standby query may be reading tuples that vacuum has
> removed on the primary, with the removal then replicated to the standby).
> If the primary receives an HSF message after the slot has been saved, I
> believe it is acceptable for autovacuum to clean tuples according to the
> newer xmin. If the primary restarts, the older value will be loaded, but
> the replica has already confirmed the newer value. As for the replica, it
> is the replica's obligation to send an HSF xmin that will survive its
> immediate restart.
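
To make the argument concrete, here is an illustrative timeline (the xmin values are invented; this sketches the reasoning above and is not code from any patch):

    /*
     * t0: standby computes feedback xmin = 750 from state that survives
     *     its own immediate restart
     * t1: standby sends HSF(xmin = 750); the primary updates the slot's
     *     xmin in memory only, the on-disk copy still says 700
     * t2: vacuum on the primary removes dead tuples using horizon 750
     * t3: primary restarts; the slot reloads xmin = 700 from disk
     *
     * The reloaded 700 understates what vacuum has already done, but
     * this is harmless: the standby confirmed 750 and will never request
     * rows older than that, and the next HSF message brings the slot's
     * xmin back to 750.
     */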
+1
> >> Taking these thoughts into account, I can't see any problems with the
> >> alternative patch, where the oldest WAL LSN is calculated only during
> >> checkpoint.
> > The alternative will needlessly prevent removing WAL segments in some
> > cases when logical slots are in use.
>
> IMHO, I'm not sure it will significantly impact WAL removal. We remove
> WAL segments only at checkpoint. The alternative solution gets the
> oldest required WAL segment at the beginning of the checkpoint, then
> saves dirty slots to disk, and removes old WAL segments at the end of
> the checkpoint using the value obtained at the beginning. It may be
> less effective in terms of WAL segment removal if a logical slot is
> advanced during the checkpoint, but I do not think that is a
> significant issue. On the other hand, the alternative solution
> simplifies the logic of WAL removal, is backward compatible (it avoids
> adding new in-memory state), and reduces locking in
> ReplicationSlotsComputeRequiredLSN: there is no need to recalculate the
> oldest slots' restart_lsn every time a slot is advanced.
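
A rough pseudocode sketch of the checkpoint ordering described in the quote above; every function name here is hypothetical, not the actual checkpointer code:

    static void
    CheckpointRemoveOldWAL(void)
    {
        XLogRecPtr  oldest_lsn;

        /* Step 1: capture the oldest required restart_lsn up front. */
        oldest_lsn = OldestSavedSlotRestartLSN();

        /*
         * Step 2: flush dirty slots.  Their on-disk restart_lsn can only
         * move forward, so it never falls behind oldest_lsn.
         */
        SaveDirtyReplicationSlots();

        /*
         * Step 3: remove WAL segments older than the value captured in
         * step 1.  If a slot advanced during the checkpoint, we keep a
         * little extra WAL until the next checkpoint, but we never
         * remove WAL that an on-disk restart_lsn still needs.
         */
        RemoveWALSegmentsOlderThan(oldest_lsn);
    }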
So, my proposal is to commit the attached patchset to HEAD, and commit
[1] to the back branches. Any objections?
------
Regards,
Alexander Korotkov
Supabase
Attachment | Content-Type | Size
---|---|---
v3-0001-Keep-WAL-segments-by-slot-s-flushed-restart-LSN.patch | application/octet-stream | 7.0 KB
v3-0002-Add-TAP-tests-to-check-replication-slot-advance-d.patch | application/octet-stream | 13.7 KB