Re: Bug: WAIT FOR LSN crashes with assertion failure inside PL/pgSQL DO blocks and procedures

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug: WAIT FOR LSN crashes with assertion failure inside PL/pgSQL DO blocks and procedures
Date: 2026-04-09 11:47:05
Message-ID: CAPpHfdvZaqKuB4Crn1ieOnw7=fVqkggKU85oyEs5TF5xaHtttw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 9, 2026 at 10:04 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> On Thu, Apr 9, 2026 at 2:38 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >
> > On Thu, Apr 9, 2026 at 2:00 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> > >
> > > Hi, Satya!
> > >
> > > On Thu, Apr 9, 2026 at 5:03 AM SATYANARAYANA NARLAPURAM
> > > <satyanarlapuram(at)gmail(dot)com> wrote:
> > > > An assertion failure (server crash in assert-enabled builds) occurs when WAIT FOR LSN ... INTO is used inside PL/pgSQL DO blocks or within void procedures.
> > > >
> > > > Repro:
> > > >
> > > > -- Run this on a standby
> > > >
> > > > CREATE PROCEDURE test_wait()
> > > > LANGUAGE plpgsql AS $$
> > > > DECLARE
> > > > result text;
> > > > BEGIN
> > > > WAIT FOR LSN '0/1234' INTO result;
> > > > RAISE NOTICE '%', result;
> > > > END;
> > > > $$;
> > > > CALL test_wait();
> > > >
> > > >
> > > > The WAIT FOR itself succeeds, but the very next PL/pgSQL statement that requires a snapshot crashes the backend with:
> > > >
> > > > TRAP: failed Assert("portal->portalSnapshot == NULL"),
> > > > File: "pquery.c", Line: 1776
> > > >
> > > > Attached patches for both the test case and a potential fix. Please review.
> > >
> > > Thank you for reporting. But I doubt the fix is correct. Even that
> > > this particular might work OK, I don't think it's safe to release
> > > snapshots belonging to functions/procedures: it might affect them. I
> > > tend to think we must forbid wrapping WAIT FOR LSN with
> > > functions/procedures. I'll explore more on this today.
> > >
>
> Opus, sorry for clicking send incidentally before typing anything...
>
> I had looked at these patches before and didn’t see anything
> particularly wrong except:
> 1. patch 1 unconditionally nulled ActivePortal->portalSnapshot
> whenever it was non-NULL after the pop;
> 2. patch 2 used RAISE NOTICE after WAIT FOR, which seems not excercise
> the bug straightforwardly.
>
> I didn't realized the safety implications of releasing a procedure/DO
> snapshot during PL execution. I’ve noticed several warnings in the
> tree advising against this.
>
> /*
> * Ensure there's an active snapshot whilst we execute whatever's
> * involved here. Note that this is *not* sufficient to make the
> * world safe for TOAST pointers to be included in the returned data:
> * the referenced data could have gone away while we didn't hold a
> * snapshot. Hence, it's incumbent on PLs that can do COMMIT/ROLLBACK
> * to not return TOAST pointers, unless those pointers were fetched
> * after the last COMMIT/ROLLBACK in the procedure.
> *
> * XXX that is a really nasty, hard-to-test requirement. Is there a
> * way to remove it?
> */
> EnsurePortalSnapshotExists();

Regarding functions, function may be part of bigger query running with
particular snapshot. Note that single query is always executed within
the single snapshot (barring EvalPlanQual()) even in read committed
mode. Releasing a snapshot in the middle of the query could cause, if
even we somehow re-acquire a new snapshot, could cause rest of query
to be executed inconsistently with its beginning.

It is probably different for procedures, which don't have to stick to
a single snapshot. But I can still imagine WAIT FOR LSN to be inside
a loop over some query results or something similar. We are now past
FF for PG19. I suggest we should now forbid WAIT FOR LSN both in
functions and procedures. For PG20 we can reconsider some cases when
running WAIT FOR LSN inside the stored procedure is safe.

------
Regards,
Alexander Korotkov
Supabase

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2026-04-09 11:55:01 Re: Eliminating SPI / SQL from some RI triggers - take 3
Previous Message Daniil Davydov 2026-04-09 11:46:09 Re: Fix bug with accessing to temporary tables of other sessions