Re: Assertion failure in SnapBuildInitialSnapshot()

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, kuroda(dot)hayato(at)fujitsu(dot)com, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Assertion failure in SnapBuildInitialSnapshot()
Date: 2022-11-16 02:00:48
Message-ID: 20221116020048.pay3jquwrwg3qd2y@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-11-15 16:20:00 +0530, Amit Kapila wrote:
> On Tue, Nov 15, 2022 at 8:08 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > nor do we enforce in an obvious place that we
> > don't already hold a snapshot.
> >
>
> We have a check for (FirstXactSnapshot == NULL) in
> RestoreTransactionSnapshot->SetTransactionSnapshot. Won't that be
> sufficient?

I don't think that'd e.g. catch a catalog snapshot being held, yet that'd
still be bad. And I think checking in SetTransactionSnapshot() is too late,
we've already overwritten MyProc->xmin by that point.

On 2022-11-15 17:21:44 +0530, Amit Kapila wrote:
> > One thing I noticed just now is that we don't assert
> > builder->building_full_snapshot==true. I think we should? That didn't use to
> > be an option, but now it is... It doesn't look to me like that's the issue,
> > but ...
> >
>
> Agreed.
>
> The attached patch contains both changes. It seems to me this issue
> can happen, if somehow, either slot's effective_xmin increased after
> we assign its initial value in CreateInitDecodingContext() or somehow
> its value is InvalidTransactionId when we have invoked
> SnapBuildInitialSnapshot(). The other possibility is that the
> initial_xmin_horizon check in SnapBuildFindSnapshot() doesn't insulate
> us from assigning builder->xmin value older than initial_xmin_horizon.
> I am not able to see if any of this can be true.

Yea, I don't immediately see anything either. Given the discussion in
https://www.postgresql.org/message-id/Yz2hivgyjS1RfMKs%40depesz.com I am
starting to wonder if we've introduced a race in the slot machinery.

> diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
> index 5006a5c464..e85c75e0e6 100644
> --- a/src/backend/replication/logical/snapbuild.c
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -566,11 +566,13 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
> {
> Snapshot snap;
> TransactionId xid;
> + TransactionId safeXid;
> TransactionId *newxip;
> int newxcnt = 0;
>
> Assert(!FirstSnapshotSet);
> Assert(XactIsoLevel == XACT_REPEATABLE_READ);
> + Assert(builder->building_full_snapshot);
>
> if (builder->state != SNAPBUILD_CONSISTENT)
> elog(ERROR, "cannot build an initial slot snapshot before reaching a consistent state");
> @@ -589,17 +591,13 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
> * mechanism. Due to that we can do this without locks, we're only
> * changing our own value.
> */

Perhaps add something like "Creating a snapshot is expensive and an unenforced
xmin horizon would have bad consequences, therefore always double-check that
the horizon is enforced"?

> -#ifdef USE_ASSERT_CHECKING
> - {
> - TransactionId safeXid;
> -
> - LWLockAcquire(ProcArrayLock, LW_SHARED);
> - safeXid = GetOldestSafeDecodingTransactionId(false);
> - LWLockRelease(ProcArrayLock);
> + LWLockAcquire(ProcArrayLock, LW_SHARED);
> + safeXid = GetOldestSafeDecodingTransactionId(false);
> + LWLockRelease(ProcArrayLock);
>
> - Assert(TransactionIdPrecedesOrEquals(safeXid, snap->xmin));
> - }
> -#endif
> + if (TransactionIdFollows(safeXid, snap->xmin))
> + elog(ERROR, "cannot build an initial slot snapshot when oldest safe xid %u follows snapshot's xmin %u",
> + safeXid, snap->xmin);
>
> MyProc->xmin = snap->xmin;
>

s/when/as/

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Japin Li 2022-11-16 02:02:40 Re: closing file in adjust_data_dir
Previous Message Ian Lawrence Barwick 2022-11-16 01:52:35 Re: Documentation for building with meson