Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, tomas(at)vondra(dot)me
Subject: Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Date: 2025-05-18 21:25:36
Message-ID: CAPpHfdvpcboCAxbj+xPmCWNRZxG=JwW3qRsz4No_OvP3j03ERg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Vitaly!

On Fri, May 2, 2025 at 8:47 PM Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> wrote:
> Thank you for the attention to the patch. I updated a patch with a better
> solution for the master branch which can be easily backported to the other
> branches as we agree on the final solution.
>
> Two tests are introduced which are based on Tomas Vondra's test for logical slots
> with injection points from the discussion (patches: [1], [2], [3]). Tests
> are implemented as module tests in src/test/modules/test_replslot_required_lsn
> directory. I slightly modified the original test for logical slots and created a
> new simpler test for physical slots without any additional injection points.

The patchset doesn't seem to build after 371f2db8b0, which adjusted
the signature of the INJECTION_POINT() macro. Could you, please,
update the patchset accordingly.

> I prepared a new solution (patch [4]) which is also based on Tomas Vondra's
> proposal. With a fresh eye, I realized that it can fix the issue as well. It is
> easy and less invasive to implement. The new solution differs from my original
> solution: it is backward compatible (doesn't require any changes in ReplicationSlot
> structure). My original solution can be backward compatible as well if to
> allocate flushed_restart_lsn in a separate array in shmem, not in the
> ReplicationSlot structure, but I believe the new solution is the better one. If
> you still think that my previous solution is the better (I don't think so), I
> will prepare a backward compatible patch with my previous solution.

I see in 0004 patch we're calling XLogGetReplicationSlotMinimumLSN()
before slots synchronization then use it for WAL truncation.
Generally looks good. But what about the "if
(InvalidateObsoleteReplicationSlots(...))" branch? It calls
XLogGetReplicationSlotMinimumLSN() again. Why would the value
obtained from the latter call reflect slots as they are synchronized
to the disk?

------
Regards,
Alexander Korotkov
Supabase

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-05-18 23:31:33 Violation of principle that plan trees are read-only
Previous Message Tom Lane 2025-05-18 18:42:03 Re: Support for Physical Column Reordering in PG