| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Pradeep Kumar <spradeepkumar29(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Assertion failure in SnapBuildInitialSnapshot() |
| Date: | 2025-11-24 18:48:19 |
| Message-ID: | CAD21AoDgVhbqBADVW1ArKS6gXpN3QPXhUmcXsHENVUJ13s1-wA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Mon, Nov 24, 2025 at 1:46 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Nov 21, 2025 at 9:17 AM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Thursday, November 13, 2025 12:56 PM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> >
> > I have been thinking about whether there is a way to avoid holding ReplicationSlotControlLock
> > exclusively in ReplicationSlotsComputeRequiredXmin() because that could cause
> > lock contention when many slots exist and advancements occur frequently.
> >
> > Given that the bug arises from a race condition between slot creation and
> > concurrent slot xmin computation, I think another option is to acquire the
> > ReplicationSlotControlLock exclusively only during slot creation to do the
> > initial update of the slot xmin. In ReplicationSlotsComputeRequiredXmin(), we
> > still hold the ReplicationSlotControlLock in shared mode until the global slot
> > xmin is updated in ProcArraySetReplicationSlotXmin(). This approach prevents
> > concurrent computations and updates of new xmin horizons by other backends
> > during the initial slot xmin update process, while it still permits concurrent
> > calls to ReplicationSlotsComputeRequiredXmin().
> >
>
> Yeah, this seems to work.
+1
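To make the proposed lock modes concrete, here is a minimal single-threaded
sketch of the scheme described above. It is not taken from the patch. Every
name in it (ToyLock, ToyState, toy_compute_required_xmin,
toy_create_slot_xmin, and so on) is hypothetical, the lock is a toy try-lock
that only tracks holders instead of blocking, and xids are compared with plain
"<" rather than a wraparound-aware comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_SLOTS   4
#define TOY_INVALID_XID 0u

typedef uint32_t ToyXid;

/* Toy stand-in for an LWLock: tracks holders only, never blocks. */
typedef struct ToyLock {
    int  shared_holders;
    bool exclusive_held;
} ToyLock;

/* Hypothetical model of the slot array plus the global slot xmin. */
typedef struct ToyState {
    ToyLock control_lock;               /* models ReplicationSlotControlLock */
    ToyXid  slot_xmin[TOY_MAX_SLOTS];   /* per-slot xmin, 0 means unset */
    ToyXid  global_xmin;                /* models the value published to the proc array */
} ToyState;

static bool toy_try_shared(ToyLock *l)
{
    if (l->exclusive_held)
        return false;
    l->shared_holders++;
    return true;
}

static bool toy_try_exclusive(ToyLock *l)
{
    if (l->exclusive_held || l->shared_holders > 0)
        return false;
    l->exclusive_held = true;
    return true;
}

static void toy_release_shared(ToyLock *l)
{
    assert(l->shared_holders > 0);
    l->shared_holders--;
}

static void toy_release_exclusive(ToyLock *l)
{
    assert(l->exclusive_held);
    l->exclusive_held = false;
}

/* Aggregate the minimum slot xmin and publish it. Caller holds the lock. */
static void toy_publish_min_xmin(ToyState *s)
{
    ToyXid agg = TOY_INVALID_XID;

    for (int i = 0; i < TOY_MAX_SLOTS; i++)
    {
        ToyXid x = s->slot_xmin[i];

        if (x != TOY_INVALID_XID && (agg == TOY_INVALID_XID || x < agg))
            agg = x;
    }
    s->global_xmin = agg;   /* stand-in for ProcArraySetReplicationSlotXmin() */
}

/* Compute path: shared mode, held until the global xmin has been published. */
static bool toy_compute_required_xmin(ToyState *s)
{
    if (!toy_try_shared(&s->control_lock))
        return false;       /* a real LWLock would wait here instead */
    toy_publish_min_xmin(s);
    toy_release_shared(&s->control_lock);
    return true;
}

/* Creation path: exclusive mode around the initial slot xmin update. */
static bool toy_create_slot_xmin(ToyState *s, int slot, ToyXid xmin)
{
    assert(slot >= 0 && slot < TOY_MAX_SLOTS);
    if (!toy_try_exclusive(&s->control_lock))
        return false;       /* a real LWLock would wait here instead */
    s->slot_xmin[slot] = xmin;
    toy_publish_min_xmin(s);
    toy_release_exclusive(&s->control_lock);
    return true;
}
```

In this model, any number of toy_compute_required_xmin() callers can hold the
lock in shared mode at once, while toy_create_slot_xmin() cannot proceed until
all of them have released it, and no computation can start while the initial
update is in progress.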
>
> > Here is an updated patch for this approach on HEAD.
> >
>
> Thanks for the patch.
>
> Sawada-San, are you planning to look into this? Otherwise, I can take
> care of it.
Yes, I'll review the patch and share some comments soon.

Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com