Re: Assertion failure in SnapBuildInitialSnapshot()

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Assertion failure in SnapBuildInitialSnapshot()
Date: 2023-02-08 04:47:58
Message-ID: CAA4eK1+JMM-hMNX3ysi-PTXRE62woZz5hD828gVWkZcKfK7x4A@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 8, 2023 at 1:35 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> On 2023-02-07 11:49:03 -0800, Andres Freund wrote:
> > On 2023-02-01 11:23:57 +0530, Amit Kapila wrote:
> > > On Tue, Jan 31, 2023 at 6:08 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > Attached updated patches.
> > > >
> > >
> > > Thanks. Andres, others, do you see a better way to fix this problem? I
> > > have reproduced it manually and the steps are shared at [1] and
> > > Sawada-San also reproduced it, see [2].
> > >
> > > [1] - https://www.postgresql.org/message-id/CAA4eK1KDFeh%3DZbvSWPx%3Dir2QOXBxJbH0K8YqifDtG3xJENLR%2Bw%40mail.gmail.com
> > > [2] - https://www.postgresql.org/message-id/CAD21AoDKJBB6p4X-%2B057Vz44Xyc-zDFbWJ%2Bg9FL6qAF5PC2iFg%40mail.gmail.com
> >
> > Hm. It's worrisome to now hold ProcArrayLock exclusively while iterating over
> > the slots. ReplicationSlotsComputeRequiredXmin() can be called at a
> > non-negligible frequency. Callers like CreateInitDecodingContext(), that pass
> > already_locked=true worry me a lot less, because obviously that's not a very
> > frequent operation.
>
> Separately from this change:
>
> I wonder if we ought to change the setup in CreateInitDecodingContext() to be a
> bit less intricate. One idea:
>
> Instead of having GetOldestSafeDecodingTransactionId() compute a value, that
> we then enter into a slot, that then computes the global horizon via
> ReplicationSlotsComputeRequiredXmin(), we could have a successor to
> GetOldestSafeDecodingTransactionId() change procArray->replication_slot_xmin
> (if needed).
>
> As long as CreateInitDecodingContext() prevents a concurrent
> ReplicationSlotsComputeRequiredXmin(), by holding ReplicationSlotControlLock
> exclusively, that should suffice to ensure that no "wrong" horizon was
> determined / no needed rows have been removed. And we'd not need a lock nested
> inside ProcArrayLock anymore.
>
>
> Not sure if it's sufficiently better to be worth bothering with though :(
>

I am also not sure, because it would only improve concurrency for
CreateInitDecodingContext(), which shouldn't be called at a high
frequency. Also, to some extent, the current coding (or the approach we
are discussing) is easier to follow, because we would always update
procArray->replication_slot_xmin after checking all the slots.
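
To illustrate the "always update after checking all the slots" point, here
is a minimal toy sketch in C. It is not the PostgreSQL source: every toy_*
name is hypothetical, locking is left out entirely, and plain integer
comparison stands in for the wraparound-aware XID comparison the real code
needs. It only shows that the published horizon is derived in one place,
from the full set of slots, rather than being adjusted directly by a caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins (assumptions for illustration, not PostgreSQL definitions). */
typedef uint32_t ToyXid;
#define TOY_INVALID_XID ((ToyXid) 0)
#define TOY_MAX_SLOTS 4

typedef struct ToySlot
{
    bool    in_use;
    ToyXid  xmin;       /* TOY_INVALID_XID means "no horizon required" */
} ToySlot;

static ToySlot toy_slots[TOY_MAX_SLOTS];

/* Models procArray->replication_slot_xmin in this sketch. */
static ToyXid toy_replication_slot_xmin = TOY_INVALID_XID;

/*
 * Scan every slot, take the oldest valid xmin, and publish it.
 * Simplification: "older" is plain "<" here, with no wraparound handling.
 */
static ToyXid
toy_compute_required_xmin(void)
{
    ToyXid  agg = TOY_INVALID_XID;

    for (int i = 0; i < TOY_MAX_SLOTS; i++)
    {
        if (!toy_slots[i].in_use || toy_slots[i].xmin == TOY_INVALID_XID)
            continue;
        if (agg == TOY_INVALID_XID || toy_slots[i].xmin < agg)
            agg = toy_slots[i].xmin;
    }

    toy_replication_slot_xmin = agg;
    return agg;
}

/*
 * Record a slot's xmin, then recompute the global value from all slots,
 * so the published horizon always reflects a full scan.
 */
static void
toy_set_slot_xmin(int idx, ToyXid xmin)
{
    assert(idx >= 0 && idx < TOY_MAX_SLOTS);
    toy_slots[idx].in_use = true;
    toy_slots[idx].xmin = xmin;
    toy_compute_required_xmin();
}
```

In this model, raising one slot's xmin can move the published value
forward only as far as the oldest remaining slot allows, which is the
property that makes the "recompute from all slots" coding easy to reason
about.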

--
With Regards,
Amit Kapila.
