Re: [BUG] failed assertion in EnsurePortalSnapshotExists()

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [BUG] failed assertion in EnsurePortalSnapshotExists()
Date: 2021-09-29 10:59:41
Message-ID: CAEudQApS-u7PppZtp+3R-DWfT7HRHg=P4Lwy7f5E+p-_q-Jv=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Em qua., 29 de set. de 2021 às 06:55, Drouvot, Bertrand <bdrouvot(at)amazon(dot)com>
escreveu:

> Hi,
>
> On 9/28/21 6:50 PM, Tom Lane wrote:
> > "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com> writes:
> >> Does it make sense (as it is currently) to set the ActiveSnapshot to
> >> NULL and not ensuring the same is done for ActivePortal->portalSnapshot?
> > I think that patch is just a kluge :-(
> right, sounded too simple to be "true".
> >
> > After tracing through this I've figured out what is happening, and
> > why you need to put the exception block inside a loop to make it
> > happen. The first iteration goes fine, and we end up with an empty
> > ActiveSnapshot stack (and no portalSnapshots) because that's how
> > the COMMIT leaves it. In the second iteration, we try to
> > re-establish the portal snapshot, but at the point where
> > EnsurePortalSnapshotExists is called (from the first INSERT
> > command) we are already inside a subtransaction thanks to the
> > plpgsql exception block. This means that the stacked ActiveSnapshot
> > has as_level = 2, although the Portal owning it belongs to the
> > outer transaction. So at the next exception, AtSubAbort_Snapshot
> > zaps the stacked ActiveSnapshot, but the Portal stays alive and
> > now it has a dangling portalSnapshot pointer.
>
> Thanks for the explanation!
>
> > Basically there seem to be two ways to fix this, both requiring
> > API additions to snapmgr.c (hence, cc'ing Alvaro for opinion):
> >
> > 1. Provide a variant of PushActiveSnapshot that allows the caller
> > to specify the as_level to use, and then have
> > EnsurePortalSnapshotExists, as well as other places that create
> > Portal-associated snapshots, use this with as_level equal to the
> > Portal's createSubid. The main hazard I see here is that this could
> > theoretically allow the ActiveSnapshot stack to end up with
> > out-of-order as_level values, causing issues. For the moment we
> > could probably just add assertions to check that the passed as_level
> > is >= next level, or maybe even that this variant is only called with
> > empty ActiveSnapshot stack.
> I think we may get a non empty ActiveSnapshot stack with prepared
> statements, so tempted to do the assertion on the levels.
> >
> > 2. Provide a way for AtSubAbort_Portals to detect whether a
> > portalSnapshot pointer points to a snap that's going to be
> > deleted by AtSubAbort_Snapshot, and then just have it clear any
> > portalSnapshots that are about to become dangling. (This'd amount
> > to accepting the possibility that portalSnapshot is of a different
> > subxact level from the portal, and dealing with the situation.)
> >
> > I initially thought #2 would be the way to go, but it turns out
> > to be a bit messy since what we have is a Snapshot pointer not an
> > ActiveSnapshotElt pointer. We'd have to do something like search the
> > ActiveSnapshot stack looking for pointer equality to the caller's
> > Snapshot pointer, which seems fragile --- do we assume as_snap is
> > unique for any other purpose?
> >
> > That being the case, I'm now leaning to #1. Thoughts?
>
> I'm also inclined to #1.
>
I have a stupid question, why duplicate PushActiveSnapshot?
Wouldn't one function be better?

PushActiveSnapshot(Snapshot snap, int as_level);

Sample calls:
PushActiveSnapshot(GetTransactionSnapshot(),
GetCurrentTransactionNestLevel());
PushActiveSnapshot(queryDesc->snapshot, GetCurrentTransactionNestLevel());
PushActiveSnapshot(GetTransactionSnapshot(), portal->createSubid);

regards,
Ranier Vilela

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-09-29 11:07:37 Re: POC: Cleaning up orphaned files using undo logs
Previous Message Amit Kapila 2021-09-29 10:51:21 Re: Failed transaction statistics to measure the logical replication progress