Re: Optimize LWLock scalability via ReadBiasedLWLock for heavily-shared locks

From:	"Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>
To:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc:	<tianyou(dot)li(at)intel(dot)com>
Subject:	Re: Optimize LWLock scalability via ReadBiasedLWLock for heavily-shared locks
Date:	2025-07-01 13:30:14
Message-ID:	ea71c35c-57c2-497e-814e-2e83c58bcaed@intel.com
Lists:	pgsql-hackers

On 6/26/2025 1:07 PM, Zhou, Zhiguo wrote:
> Hi Hackers,
>
> This patch addresses severe LWLock contention observed on high-core systems
> where hundreds of processors concurrently access frequently-shared locks.
> Specifically for ProcArrayLock (exhibiting 93.5% shared-mode acquires), we
> implement a new ReadBiasedLWLock mechanism to eliminate the atomic operation
> bottleneck.
>
> Key aspects:
> 1. Problem: Previous optimizations[1] left LWLockAttemptLock/Release
>    consuming ~25% total CPU cycles on 384-vCPU systems due to contention on
>    a single lock-state cache line. Shared lock attempts showed 37x higher
>    cumulative latency than exclusive mode for ProcArrayLock.
>
> 2. Solution: ReadBiasedLWLock partitions lock state across 16 cache lines
>    (READ_BIASED_LOCK_STATE_COUNT):
>    - Readers acquire/release only their designated LWLock (indexed by
>      pid % 16) using a single atomic operation
>    - Writers pay higher cost by acquiring all 16 sub-locks exclusively
>    - Maintains LWLock's "acquiring process must release" semantics
>
> 3. Performance: HammerDB/TPCC shows 35.3% NOPM improvement over baseline
>    - Lock acquisition CPU cycles reduced from 16.7% to 7.4%
>    - Lock release cycles reduced from 7.9% to 2.2%
>
> 4. Implementation:
>    - Core infrastructure for ReadBiasedLWLock
>    - ProcArrayLock converted as proof-of-concept
>    - Maintains full LWLock API compatibility
>
> Known considerations:
> - Increased writer acquisition cost (acceptable given rarity of exclusive
>   acquisitions for biased locks like ProcArrayLock)
> - Memory overhead: 16x size increase per converted lock
> - Currently validated for ProcArrayLock; other heavily-shared locks may be
>   candidates after further analysis
>
> This is a preliminary version for community feedback. We're actively:
> 1. Refining the implementation details
> 2. Expanding test coverage
> 3. Investigating additional lock candidates
> 4. Optimizing writer-fast-path opportunities
>
> Test results, profiling data, and design details can be shared upon request.
> We appreciate all comments and suggestions for improvement.
>
> [1] Optimize shared LWLock acquisition for high-core-count systems:
> https://www.postgresql.org/message-id/flat/73d53acf-4f66-41df-b438-5c2e6115d4de%40intel.com
>
> Regards,
>
> Zhiguo
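
The partitioning scheme in the quoted proposal can be sketched in
standalone C11. This is only an illustration, not code from the patch:
all Sketch*/sketch_* names, the 64-byte cache-line size, the writer flag
bit, and the non-blocking "try" style API are assumptions made here, and
the actual patch builds on PostgreSQL's LWLock machinery (including its
wait queues) rather than on bare atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_STATE_COUNT 16          /* mirrors READ_BIASED_LOCK_STATE_COUNT */
#define SKETCH_CACHE_LINE  64          /* assumed cache-line size in bytes */
#define SKETCH_EXCLUSIVE   (1u << 31)  /* hypothetical "writer holds slot" bit */

/* One lock word per cache line, so readers on different slots never
 * contend on the same line. */
typedef struct {
    _Alignas(SKETCH_CACHE_LINE) atomic_uint state;
} SketchSlot;

static_assert(sizeof(SketchSlot) == SKETCH_CACHE_LINE,
              "each slot should occupy exactly one cache line");

typedef struct {
    SketchSlot slots[SKETCH_STATE_COUNT];
} SketchReadBiasedLock;

static void sketch_init(SketchReadBiasedLock *lock)
{
    for (int i = 0; i < SKETCH_STATE_COUNT; i++)
        atomic_init(&lock->slots[i].state, 0u);
}

/* Readers are mapped to a slot by pid % 16, as in the proposal. */
static atomic_uint *sketch_slot(SketchReadBiasedLock *lock, int pid)
{
    return &lock->slots[(unsigned) pid % SKETCH_STATE_COUNT].state;
}

/* Reader path: on success, a single atomic add on the reader's own slot. */
static bool sketch_try_shared(SketchReadBiasedLock *lock, int pid)
{
    atomic_uint *state = sketch_slot(lock, pid);
    unsigned old = atomic_fetch_add_explicit(state, 1u, memory_order_acquire);

    if ((old & SKETCH_EXCLUSIVE) == 0)
        return true;
    /* A writer holds this slot: undo the increment and report failure. */
    atomic_fetch_sub_explicit(state, 1u, memory_order_relaxed);
    return false;
}

static void sketch_release_shared(SketchReadBiasedLock *lock, int pid)
{
    atomic_fetch_sub_explicit(sketch_slot(lock, pid), 1u, memory_order_release);
}

/* Writer path: take every slot exclusively, in ascending order.
 * If any slot is busy, roll back the slots already taken. */
static bool sketch_try_exclusive(SketchReadBiasedLock *lock)
{
    for (int i = 0; i < SKETCH_STATE_COUNT; i++) {
        unsigned expected = 0u;

        if (!atomic_compare_exchange_strong_explicit(
                &lock->slots[i].state, &expected, SKETCH_EXCLUSIVE,
                memory_order_acquire, memory_order_relaxed)) {
            while (--i >= 0)
                atomic_fetch_and_explicit(&lock->slots[i].state,
                                          ~SKETCH_EXCLUSIVE,
                                          memory_order_release);
            return false;
        }
    }
    return true;
}

/* Clear only the writer bit, so a reader's transient increment is not lost. */
static void sketch_release_exclusive(SketchReadBiasedLock *lock)
{
    for (int i = SKETCH_STATE_COUNT - 1; i >= 0; i--)
        atomic_fetch_and_explicit(&lock->slots[i].state, ~SKETCH_EXCLUSIVE,
                                  memory_order_release);
}
```

A caller would pair sketch_try_shared(&lock, pid) with
sketch_release_shared(&lock, pid) using the same pid, and retry or sleep
on failure. The sketch shows the trade-off named in the proposal: a
reader touches one cache line, while a writer touches all 16. Taking the
slots in a fixed order is what a blocking variant would need in order to
avoid deadlock between two writers.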

As a follow-up to my previous mail, we have split the proposed patch
into two for clearer review and testing: the first introduces the new
ReadBiasedLWLock mechanism, and the second converts ProcArrayLock from
an LWLock to the new ReadBiasedLWLock type. The patch set passes the
standard "make check" regression tests. We welcome any feedback or
discussion on the implementation approach. Thanks!

Regards,
Zhiguo

Attachment	Content-Type	Size
v1-0001-Introduce-ReadBiasedLWLock-for-high-concurrency-read.patch	text/plain	20.2 KB
v1-0002-Convert-ProcArrayLock-to-ReadBiasedLWLock-for-improv.patch	text/plain	35.1 KB

In response to

Optimize LWLock scalability via ReadBiasedLWLock for heavily-shared locks at 2025-06-26 05:07:49 from Zhou, Zhiguo