Re: Optimize LWLock scalability via ReadBiasedLWLock for heavily-shared locks

From: "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: <tianyou(dot)li(at)intel(dot)com>
Subject: Re: Optimize LWLock scalability via ReadBiasedLWLock for heavily-shared locks
Date: 2025-07-01 13:30:14
Message-ID: ea71c35c-57c2-497e-814e-2e83c58bcaed@intel.com
Lists: pgsql-hackers

On 6/26/2025 1:07 PM, Zhou, Zhiguo wrote:
> Hi Hackers,
>
> This patch addresses severe LWLock contention observed on high-core systems
> where hundreds of processors concurrently access frequently-shared locks.
> Specifically for ProcArrayLock (exhibiting 93.5% shared-mode acquires), we
> implement a new ReadBiasedLWLock mechanism to eliminate the atomic-operation
> bottleneck.
>
> Key aspects:
> 1. Problem: Previous optimizations [1] left LWLockAttemptLock/Release
>    consuming ~25% of total CPU cycles on 384-vCPU systems due to contention
>    on a single lock-state cache line. Shared lock attempts showed 37x higher
>    cumulative latency than exclusive mode for ProcArrayLock.
>
> 2. Solution: ReadBiasedLWLock partitions lock state across 16 cache lines
>    (READ_BIASED_LOCK_STATE_COUNT):
>    - Readers acquire/release only their designated LWLock (indexed by
>      pid % 16) using a single atomic operation
>    - Writers pay higher cost by acquiring all 16 sub-locks exclusively
>    - Maintains LWLock's "acquiring process must release" semantics
>
> 3. Performance: HammerDB/TPCC shows 35.3% NOPM improvement over baseline
>    - Lock acquisition CPU cycles reduced from 16.7% to 7.4%
>    - Lock release cycles reduced from 7.9% to 2.2%
>
> 4. Implementation:
>    - Core infrastructure for ReadBiasedLWLock
>    - ProcArrayLock converted as proof-of-concept
>    - Maintains full LWLock API compatibility
>
> Known considerations:
> - Increased writer acquisition cost (acceptable given rarity of exclusive
>   acquisitions for biased locks like ProcArrayLock)
> - Memory overhead: 16x size increase per converted lock
> - Currently validated for ProcArrayLock; other heavily-shared locks may be
>   candidates after further analysis
>
> This is a preliminary version for community feedback. We're actively:
> 1. Refining the implementation details
> 2. Expanding test coverage
> 3. Investigating additional lock candidates
> 4. Optimizing writer-fast-path opportunities
>
> Test results, profiling data, and design details can be shared upon
> request.
> We appreciate all comments and suggestions for improvement.
>
> [1] Optimize shared LWLock acquisition for high-core-count systems:
> https://www.postgresql.org/message-id/flat/73d53acf-4f66-41df-b438-5c2e6115d4de%40intel.com
>
> Regards,
>
> Zhiguo

As a follow-up to the previous mail, the proposed patch has been refined and
logically split into two patches for clearer review and testing: the first
introduces the new ReadBiasedLWLock mechanism, and the second converts
ProcArrayLock from a plain LWLock to the new ReadBiasedLWLock type. The
patchset passes the standard make check regression tests. We are now
submitting the updated patches for community review and welcome any feedback
or discussion on the implementation approach; a minimal sketch of the
read-biased acquisition paths is included below for reference. Thanks!
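
For reviewers who want a quick picture of the scheme from point 2 of the
quoted mail before opening the patches, here is a minimal sketch of the
read-biased acquisition paths. It is for illustration only and is not the
code in the attached patches: the function and field names and the reuse of
LWLockPadded for per-reader cache-line padding are assumptions made for this
sketch; only READ_BIASED_LOCK_STATE_COUNT and the pid % 16 indexing come from
the description above.

/*
 * Illustrative sketch only -- not the patch code.  Readers take a single
 * sub-lock chosen by their pid; writers sweep all sub-locks exclusively.
 */
#include "postgres.h"
#include "miscadmin.h"          /* MyProcPid */
#include "storage/lwlock.h"

#define READ_BIASED_LOCK_STATE_COUNT 16

typedef struct ReadBiasedLWLock
{
    /* one padded LWLock per cache line to avoid false sharing */
    LWLockPadded sublocks[READ_BIASED_LOCK_STATE_COUNT];
} ReadBiasedLWLock;

/* The sub-lock this backend uses for shared acquisitions. */
static inline LWLock *
ReadBiasedLWLockMine(ReadBiasedLWLock *lock)
{
    return &lock->sublocks[MyProcPid % READ_BIASED_LOCK_STATE_COUNT].lock;
}

/* Reader fast path: one atomic operation on one cache line. */
static void
ReadBiasedLWLockAcquireShared(ReadBiasedLWLock *lock)
{
    LWLockAcquire(ReadBiasedLWLockMine(lock), LW_SHARED);
}

static void
ReadBiasedLWLockReleaseShared(ReadBiasedLWLock *lock)
{
    LWLockRelease(ReadBiasedLWLockMine(lock));
}

/* Writer path: acquire every sub-lock exclusively, in a fixed order. */
static void
ReadBiasedLWLockAcquireExclusive(ReadBiasedLWLock *lock)
{
    for (int i = 0; i < READ_BIASED_LOCK_STATE_COUNT; i++)
        LWLockAcquire(&lock->sublocks[i].lock, LW_EXCLUSIVE);
}

static void
ReadBiasedLWLockReleaseExclusive(ReadBiasedLWLock *lock)
{
    for (int i = READ_BIASED_LOCK_STATE_COUNT - 1; i >= 0; i--)
        LWLockRelease(&lock->sublocks[i].lock);
}

Because exclusive acquirers must sweep all 16 sub-locks, the higher writer
cost and the 16x memory footprint noted under "Known considerations" follow
directly from this layout, which is why the conversion targets locks such as
ProcArrayLock where exclusive acquisitions are rare.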

Regards,
Zhiguo

Attachment Content-Type Size
v1-0001-Introduce-ReadBiasedLWLock-for-high-concurrency-read.patch text/plain 20.2 KB
v1-0002-Convert-ProcArrayLock-to-ReadBiasedLWLock-for-improv.patch text/plain 35.1 KB
