| From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Pradeep Kumar <spradeepkumar29(at)gmail(dot)com> |
| Cc: | Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | RE: Assertion failure in SnapBuildInitialSnapshot() |
| Date: | 2025-11-06 06:33:21 |
| Message-ID: | TY4PR01MB16907CA7F45898B73F084F80B94C2A@TY4PR01MB16907.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Thursday, October 30, 2025 7:01 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Oct 27, 2025 at 5:22 AM Pradeep Kumar
> <spradeepkumar29(at)gmail(dot)com> wrote:
> >
> > Hi All,
> > In this thread they proposed fix_concurrent_slot_xmin_update.patch will
> solve this assert failure. After applying this patch I execute
> pg_sync_replication_slots() (which calls SyncReplicationSlots →
> synchronize_slots() → synchronize_one_slot() →
> ReplicationSlotsComputeRequiredXmin(true)) can hit an assertion failure in
> ReplicationSlotsComputeRequiredXmin() because the
> ReplicationSlotControlLock is not held in that code path. By default
> sync_replication_slots is off, so the background slot-sync worker is not
> spawned; invoking the UDF directly exercises the path without the lock. I have
> a small patch that acquires ReplicationSlotControlLock in the manual sync
> path; that stops the assert.
> >
> >
> > The assert is raised inside ReplicationSlotsComputeRequiredXmin()
> because that function expects either that already_locked is false (and it will
> acquire what it needs), or that callers already hold both
> ReplicationSlotControlLock (exclusive) and ProcArrayLock (exclusive). In the
> manual-sync path called by the UDF, neither lock is held, so the assertion trips.
> >
> > Why this happens:
> > The background slot sync worker (spawned when sync_replication_slots =
> on) acquires the necessary locks before calling the routines that
> update/compute slot xmins, so the worker path is safe.The manual path
> through the SQL-callable UDF does not take the same locks before calling
> synchronize_slots()/synchronize_one_slot(). As a result the invariant
> assumed by ReplicationSlotsComputeRequiredXmin() can be violated, leading
> to the assert.
> >
> > Proposed fix:
> > In synchronize_slots() (the code path used by
> SyncReplicationSlots()/pg_sync_replication_slots()), acquire
> ReplicationSlotControlLock before any call that can end up calling
> ReplicationSlotsComputeRequiredXmin(true).
>
> It would be great if we have a test case for this issue possibly using injection
> points.
>
> Also, I think it's worth considering the idea Robert shared before[1]:
>
> ---
> But what about just surgically preventing that?
> ProcArraySetReplicationSlotXmin() could refuse to retreat the values,
> perhaps? If it computes an older value than what's there, it just does nothing?
> ---
>
> We did a similar fix for confirmed_flush LSN by commit ad5eaf390c582, and it
> sounds reasonable to me that ProcArraySetReplicationSlotXmin() refuses to
> retreat the values.
I reviewed the thread and think that we could not straightforwardly apply a
similar strategy to prevent the retreat of xmin/catalog_xmin here. This is
because we maintain a central value
(replication_slot_xmin/replication_slot_catalog_xmin) in
ProcArraySetReplicationSlotXmin, where the value is expected to decrease when
certain slots are dropped or invalidated. Therefore, I think we might need to
continue with the original proposal to invert the lock and also address the code
path for slotsync.
> [1]
> https://www.postgresql.org/message-id/CA%2BTgmoYLzJxCEa0aCan3KR7o
> _25G52cbqw-90Q0VGRmV3a8XGQ%40mail.gmail.com
Best Regards,
Hou zj
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Paul A Jungwirth | 2025-11-06 06:36:01 | Re: GiST README typos |
| Previous Message | Bryan Green | 2025-11-06 06:05:06 | Re: [Patch] Windows relation extension failure at 2GB and 4GB |