| From: | Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com> |
|---|---|
| To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
| Cc: | Xuneng Zhou <xunengzhou(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: [PATCH] Fix WAIT FOR LSN cleanup on subtransaction abort |
| Date: | 2026-05-06 09:55:35 |
| Message-ID: | CAJTYsWUWPM2Fpv=vYXG=Yj085jaDd36hOu1=g6zdvOvVNgm7GQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On Wed, 6 May 2026 at 15:13, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
wrote:
> On Wed, May 6, 2026 at 11:46 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >
> > On Wed, May 6, 2026 at 3:18 PM Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com>
> wrote:
> > >
> > > Hi,
> > >
> > > I found a backend crash in WAIT FOR LSN when it is interrupted inside a
> > > savepoint and the session then waits again.
> > >
> > > I tried to find if it was already reported, but could not find it, so,
> posting it.
> > >
> > > While navigating I noticed WAIT FOR LSN cleanup is incomplete on
> > > subtransaction abort. An interrupt such as statement_timeout while
> > > waiting inside a savepoint leaves stale per-backend wait state,
> > > causing a later WAIT FOR LSN in the same backend to violate
> > > the wait-heap invariant and crash an assertion-enabled build.
> > >
> > > A small reproducer is:
> > >
> > > BEGIN;
> > > SAVEPOINT s;
> > > SET statement_timeout = '100ms';
> > > WAIT FOR LSN '<future-lsn>' WITH (MODE 'primary_flush');
> > > ROLLBACK TO s;
> > > SET statement_timeout = 0;
> > > WAIT FOR LSN '0/0' WITH (MODE 'primary_flush', TIMEOUT '10ms',
> NO_THROW);
> > > COMMIT;
> > >
> > > where <future-lsn> can be generated with:
> > >
> > > SELECT pg_current_wal_insert_lsn() + 10000000000;
> > >
> > > TRAP: failed Assert("!procInfo->inHeap"), File: "xlogwait.c"
> > >
> > > The attached patch mirrors the top-level abort cleanup by calling
> > > WaitLSNCleanup() from AbortSubTransaction(), after
> LWLockReleaseAll(). It
> > > also adds a TAP test to verify that WAIT FOR LSN can be reused in the
> same
> > > backend after a statement_timeout and ROLLBACK TO SAVEPOINT.
> > >
> > > Thoughts?
> >
> > Thanks for reporting this. I agree with your analysis.
>
> +1, thank you for reporting this.
>
> > We need to add
> > this clean-up into AbortSubTransaction. I've some comments on the
> > patch:
> >
> > 1) Update the comment of WaitLSNCleanup
> >
> > Now the comment of this function says, "Clean up LSN waiters for
> > exiting process." After this patch, this description would be too
> > narrow because after a ROLLBACK TO SAVEPOINT, the same backend
> > continues running and may issue another WAIT FOR LSN. How about
> > changing it to something like:
> >
> > /*
> > * Clean up any LSN wait state for the current process.
> > */
>
> I agree with this change. Under detailed consideration, "existing
> process" sounds confusing in this context even if it's called just
> from AbortTransaction().
>
> > 2) How about turning the SET statement_timeout = '100ms'; into more
> > deterministic machinery used in other sections of the test file to
> > ensure that the registration actually occurs by starting the waiter in
> > a background psql session and waits until itself reports: wait_event =
> > 'WaitForWalFlush' for that backend.
> >
> > 3) For the validation, consider replacing the fast success of '0/0'
> > with the timeout of the unreachable LSN used earlier. This would
> > reduce assumptions about the order between registration and the fast
> > check being fixed.
>
> I'm OK with these changes too, as they improve the test stability.
>
> I'm going to push this if no objections.
>
>
Looks good to me. Thanks for the update and review!
Regards,
Ayush
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Alexander Korotkov | 2026-05-06 10:31:28 | Re: race condition when writing pg_control |
| Previous Message | Andrey Borodin | 2026-05-06 09:51:00 | Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race |