Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, tomas(at)vondra(dot)me
Subject: Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Date: 2025-05-26 08:57:03
Message-ID: CAPpHfdvUFEYoCKBDKKb1vD3SvKyB9swivp9dn1Gz7WP1r8qxgw@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 26, 2025 at 9:49 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Sat, May 24, 2025 at 6:59 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> >
> > Hi, Amit!
> >
> > Thank you for your attention to this patchset!
> >
> > On Sat, May 24, 2025 at 2:15 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > On Sat, May 24, 2025 at 4:08 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> > > >
> > > > I spent more time on this. The next revision is attached. It
> > > > contains revised comments and other cosmetic changes. I'm going to
> > > > backpatch 0001 to all supported branches,
> > > >
> > >
> > > Is my understanding correct that we need 0001 because
> > > PhysicalConfirmReceivedLocation() doesn't save the slot to disk after
> > > changing the slot's restart_lsn?
> >
> > Yes. Also, even if it did save the slot to disk, there is still a
> > race condition: a concurrent checkpoint could use the updated value
> > from shared memory to clean old WAL segments, and then a crash
> > happens before we manage to write the slot to disk.
> >
>
> How can that happen, if we first write the updated value to disk and
> then update the shared memory as we do in
> LogicalConfirmReceivedLocation?

I don't see this to be true for LogicalConfirmReceivedLocation() and
restart_lsn: we clearly don't update restart_lsn in shared memory after
the flush. It was previously proposed to resolve the restart_lsn
problem this way, but that approach breaks ABI and couldn't be
backpatched.
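
To make the race concrete, here is a simplified sketch of the
interleaving I mean (pseudo-C, not the exact code paths; simplified
from my reading of the tree):

    /* walsender, PhysicalConfirmReceivedLocation() */
    SpinLockAcquire(&slot->mutex);
    slot->data.restart_lsn = new_lsn;      /* shared memory advances */
    SpinLockRelease(&slot->mutex);
    ReplicationSlotsComputeRequiredLSN();  /* raises the global minimum */

    /* checkpointer, concurrently: sees the raised minimum and recycles
     * WAL segments that were only protected by the slot's old, on-disk
     * restart_lsn */

    /* crash before the slot is written to disk */

    /* after restart: the slot is restored with the old restart_lsn,
     * which now points into already-removed WAL */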

> BTW, won't there be a similar problem with physical slot's xmin
> computation as well? In PhysicalReplicationSlotNewXmin(), after
> updating the slot's xmin computation, we mark it as dirty and update
> shared memory values. Now, say after checkpointer writes these xmin
> values to disk, walsender receives another feedback message and
> updates the slot's xmin values. Now using these updated shared memory
> values, vacuum removes the rows, however, a restart will show the
> older xmin values in the slot, which mean vacuum would have removed
> the required rows before restart.

I don't yet see why this should be a problem. feedbackXmin provides a
barrier for vacuum, but unlike restart_lsn it doesn't reference removed
tuples as a resource the slot still needs. If vacuum removes some
tuples based on the last feedback message, those tuples are not needed
by the replica. If after a restart we end up with an outdated
feedbackXmin, that only makes vacuum more conservative for a while, but
I don't see how that would lead to a material problem. In contrast, you
can see in 046_logical_slot.pl how the lack of restart_lsn
synchronization leads to an error while attempting to decode the
changes, because the current code expects the WAL at restart_lsn to
exist.
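
For illustration (numbers made up): suppose the on-disk copy of the
slot has xmin = 100 while the latest feedback message advanced the
in-memory xmin to 120. Vacuum may then remove tuples that are only
dead to transactions below 120, which is fine because the replica has
already confirmed it no longer needs them. After a crash the slot
comes back with xmin = 100, so vacuum merely retains more tuples than
strictly necessary until the next feedback message arrives; nothing
the replica still needs is lost.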

> As per my understanding, neither the xmin nor the LSN problem exists
> for logical slots. I am pointing this out to indicate we may need to
> think of a different solution for physical slots, if these are
> problems only for physical slots.

There is indeed a problem for logical slots: you can apply the 0002
patch alone, and the test for logical slots will fail.

------
Regards,
Alexander Korotkov
Supabase
