Re: Speed up transaction completion faster after many relations are accessed in a transaction

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, "Imai, Yoshikazu" <imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Amit Langote <amitlangote09(at)gmail(dot)com>
Subject: Re: Speed up transaction completion faster after many relations are accessed in a transaction
Date: 2026-01-20 08:43:09
Message-ID: 7a342e6f-ddd7-4d14-a953-731255493164@iki.fi
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Returning to an old thread that I just remembered:

On 09/01/2024 08:24, vignesh C wrote:
> On Thu, 9 Nov 2023 at 21:48, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>
>> On 18/09/2023 07:08, David Rowley wrote:
>>> On Fri, 15 Sept 2023 at 22:37, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> A few quick comments:
>>
>> - It would be nice to add a test for the issue that you fixed in patch
>> v7, i.e. if you prepare a transaction while holding session-level locks.
>>
>> - GrantLockLocal() now calls MemoryContextAlloc(), which can fail if you
>> are out of memory. Is that handled gracefully or is the lock leaked?

The above are still valid comments..

> CFBot shows one of the test has aborted at [1] with:
> [20:54:28.535] Core was generated by `postgres: subscriber: logical
> replication apply worker for subscription 16397 '.
> [20:54:28.535] Program terminated with signal SIGABRT, Aborted.
> [20:54:28.535] #0 __GI_raise (sig=sig(at)entry=6) at
> ../sysdeps/unix/sysv/linux/raise.c:50
> [20:54:28.535] Download failed: Invalid argument. Continuing without
> source file ./signal/../sysdeps/unix/sysv/linux/raise.c.
> [20:54:28.627]
> [20:54:28.627] Thread 1 (Thread 0x7f0ea02d1a40 (LWP 50984)):
> [20:54:28.627] #0 __GI_raise (sig=sig(at)entry=6) at
> ../sysdeps/unix/sysv/linux/raise.c:50
> ...
> ...
> [20:54:28.627] #2 0x00005618e989d62f in ExceptionalCondition
> (conditionName=conditionName(at)entry=0x5618e9b40f70
> "dlist_is_empty(&(MyProc->myProcLocks[i]))",
> fileName=fileName(at)entry=0x5618e9b40ec0
> "../src/backend/storage/lmgr/proc.c", lineNumber=lineNumber(at)entry=856)
> at ../src/backend/utils/error/assert.c:66
> [20:54:28.627] No locals.
> [20:54:28.627] #3 0x00005618e95e6847 in ProcKill (code=<optimized
> out>, arg=<optimized out>) at ../src/backend/storage/lmgr/proc.c:856
> [20:54:28.627] i = <optimized out>
> [20:54:28.627] proc = <optimized out>
> [20:54:28.627] procgloballist = <optimized out>
> [20:54:28.627] __func__ = "ProcKill"
> [20:54:28.627] #4 0x00005618e959ebcc in shmem_exit
> (code=code(at)entry=1) at ../src/backend/storage/ipc/ipc.c:276
> [20:54:28.627] __func__ = "shmem_exit"
> [20:54:28.627] #5 0x00005618e959ecd0 in proc_exit_prepare
> (code=code(at)entry=1) at ../src/backend/storage/ipc/ipc.c:198
> [20:54:28.627] __func__ = "proc_exit_prepare"
> [20:54:28.627] #6 0x00005618e959ee8e in proc_exit (code=code(at)entry=1)
> at ../src/backend/storage/ipc/ipc.c:111
> [20:54:28.627] __func__ = "proc_exit"
> [20:54:28.627] #7 0x00005618e94aa54d in BackgroundWorkerMain () at
> ../src/backend/postmaster/bgworker.c:805
> [20:54:28.627] local_sigjmp_buf = {{__jmpbuf =
> {94665009627112, -3865857745677845768, 0, 0, 140732736634980, 1,
> 3865354362587970296, 7379258256398875384}, __mask_was_saved = 1,
> __saved_mask = {__val = {18446744066192964099, 94665025527920,
> 94665025527920, 94665025527920, 0, 94665025528120, 8192, 1,
> 94664997686410, 94665009627040, 94664997622076, 94665025527920, 1, 0,
> 0, 140732736634980}}}}
> [20:54:28.627] worker = 0x5618eb37c570
> [20:54:28.627] entrypt = <optimized out>
> [20:54:28.627] __func__ = "BackgroundWorkerMain"
> [20:54:28.627] #8 0x00005618e94b495c in do_start_bgworker
> (rw=rw(at)entry=0x5618eb3b73c8) at
> ../src/backend/postmaster/postmaster.c:5697
> [20:54:28.627] worker_pid = <optimized out>
> [20:54:28.627] __func__ = "do_start_bgworker"
> [20:54:28.627] #9 0x00005618e94b4c32 in maybe_start_bgworkers () at
> ../src/backend/postmaster/postmaster.c:5921
> [20:54:28.627] rw = 0x5618eb3b73c8
> [20:54:28.627] num_launched = 0
> [20:54:28.627] now = 0
> [20:54:28.627] iter = {cur = 0x5618eb3b79a8, next =
> 0x5618eb382a20, prev = 0x5618ea44a980 <BackgroundWorkerList>}
> [20:54:28.627] #10 0x00005618e94b574a in process_pm_pmsignal () at
> ../src/backend/postmaster/postmaster.c:5073
> [20:54:28.627] __func__ = "process_pm_pmsignal"
> [20:54:28.627] #11 0x00005618e94b5f4a in ServerLoop () at
> ../src/backend/postmaster/postmaster.c:1760
>
> [1] - https://cirrus-ci.com/task/5118173163290624?logs=cores#L51

I was able to reproduce and debug that. The patch made the assumption
that when a process is about to exit, in ShutdownPostgres(), it can only
have session-level locks left, after we have done
AbortOutOfAnyTransaction(). That assumption did not hold, because if
proc_exit() is called while we're waiting on a lock, the lock is not yet
registered with the resource owner (or if it's a session lock, it's not
yet added to the 'session_locks' list). Therefore, when
ShutdownPostgres() called AbortOutOfAnyTransaction() and
LockReleaseSession(USER_LOCKMETHOD), it would not clean up the lock
objects for the lock we were waiting on. That's pretty straightforward
to fix by also calling LockErrorCleanup() from ShutdownPostgres().
Alternatively, perhaps we should register the lock with the resource
owner earlier.

Here's a rebased version of the patch. I also split it into a few parts
for easier review:

v10-0001-Assert-that-all-fastpath-locks-are-released-befo.patch adds a
few assertions, which are not probably good to have even without the
rest of the patches.

v10-0002-use-linked-list-in-ResourceOwner-to-hold-LOCALLO.patch replaces
the "cache" in ResourceOwnerData with the linked list.

v10-0003-stop-using-releaseMask-in-PostPrepare_Locks.patch contains the
changes to PostPrepare_Locks(), so that it doesn't use the 'releaseMask'
field anymore.

v10-0004-Remove-LockReleaseAll.patch removes LockReleaseAll, and
replaces it with LockReleaseSession() calls.

v10-0005-Fix-assertion-failure-from-ProcKill-in-subscript.patch fixes
the assertion failure.

I realized that after these patches, we only use PGPROC->myProcLocks
lists in assertions and in PostPrepare_Locks(). PostPrepare_Locks()
would be pretty easy to refactor to not need them either. That would
shrink the PGPROC struct considerably.

- Heikki

Attachment Content-Type Size
v10-0001-Assert-that-all-fastpath-locks-are-released-befo.patch text/x-patch 2.3 KB
v10-0002-use-linked-list-in-ResourceOwner-to-hold-LOCALLO.patch text/x-patch 29.4 KB
v10-0003-stop-using-releaseMask-in-PostPrepare_Locks.patch text/x-patch 8.4 KB
v10-0004-Remove-LockReleaseAll.patch text/x-patch 22.9 KB
v10-0005-Fix-assertion-failure-from-ProcKill-in-subscript.patch text/x-patch 987 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2026-01-20 08:49:35 Re: commented out code
Previous Message Heikki Linnakangas 2026-01-20 08:37:00 Re: Remove more leftovers of AIX support