Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, tomas(at)vondra(dot)me
Subject: Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Date: 2025-06-12 11:24:36
Message-ID: CAA4eK1J4Vv6iQRqGzAZwM+iKKaNfgEV3cYs85PrbbghCAty-Og@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 11, 2025 at 1:44 AM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> On Mon, Jun 9, 2025 at 7:09 PM Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> wrote:
> > > I think we can use this approach for HEAD and probably keep the
> > > previous idea for backbranches. Keeping some value in shared_memory
> > > per slot sounds risky to me in terms of introducing new bugs.
> >
> > Not sure, what kind of problems may occur. I propose to allocate in shmem an
> > array of last_saved_restart_lsn like below which is not a part of the public
> > api (see below). It will be allocated and deallocated in shmem the same way as
> > ReplicationSlotCtlData. I can prepare a patch, if needed.
> >
> > typedef struct ReplicationSlotCtlDataExt {
> > XLogRecPtr last_saved_restart_lsn[1];
> > } ReplicationSlotCtlDataExt;
>
> This could work, but I think this is not a solution for HEAD anyway.
> In the HEAD, it would be better to keep everything inside the
> ReplicationSlot struct. In the same time, I don't like idea to have
> different shared memory structs between branches if we can avoid that.
>
> > > Yeah, but with physical slots, it is possible that the slot's xmin
> > > value is pointing to some value, say 700 (after restart), but vacuum
> > > would have removed tuples from transaction IDs greater than 700 as
> > > explained in email [1].
> >
> > I think, we have no xmin problem for physical slots. The xmin values of
> > physical slots are used to process HSF messages. If I correctly understood what
> > you mean, you are telling about the problem which is solved by hot standby
> > feedback messages. This message is used to disable tuples vacuuming on the
> > primary to avoid delete conflicts on the replica in queries (some queries may
> > select some tuples which were vacuumed on the primary and deletions are
> > replicated to the standby). If the primary receives a HSF message after slot
> > saving, I believe, it is allowable if autovacuum cleans tuples with xmin later
> > than the last saved value. If the primary restarts, the older value will be
> > loaded but the replica already confirmed the newer value. Concerning replica,
> > it is the obligation of the replica to send such HSF xmin that will survive
> > replica's immediate restart.
>
> +1
>

The point is about the general principle of slot's xmin values, which
is that the rows with xid greater than slot's xmin should be available
(or can't be removed by vacuum). But here, such a principle could be
violated after a restart. I don't have a test to show what harm it can
cause, but will try to think/investigate more on it.

> > >> Taking into account these thoughts, I can't see any problems with the alternative
> > >> patch where oldest wal lsn is calculated only in checkpoint.
> > >>
> > >The alternative will needlessly prevent removing WAL segments in some
> > >cases when logical slots are in use.
> >
> > IMHO, I'm not sure, it will significantly impact the wal removal. We remove WAL
> > segments only in checkpoint. The alternate solution gets the oldest WAL segment
> > at the beginning of checkpoint, then saves dirty slots to disk, and removes old
> > WAL segments at the end of checkpoint using the oldest WAL segment obtained at
> > the beginning of checkpoint. The alternate solution may not be so effective
> > in terms of WAL segments removal, if a logical slot is advanced during
> > checkpoint, but I do not think it is a significant issue. From the other hand,
> > the alternate solution simplifies the logic of WAL removal, backward compatible
> > (avoids addition new in-memory states), decreases the number of locks in
> > ReplicationSlotsComputeRequiredLSN - no need to recalculate oldest slots'
> > restart lsn every time when a slot is advanced.
>
> So, my proposal is to commit the attached patchset to the HEAD, and
> commit [1] to the back branches. Any objections?
>

No objections. I think we can keep discussing if slot's xmin
computation has any issues or not, but you can proceed with the LSN
stuff.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2025-06-12 12:12:00 Re: Non-reproducible AIO failure
Previous Message jian he 2025-06-12 11:03:13 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands