From: | "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com> |
---|---|
To: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Cc: | <tianyou(dot)li(at)intel(dot)com> |
Subject: | Re: Optimize LWLock scalability via ReadBiasedLWLock for heavily-shared locks |
Date: | 2025-07-01 13:30:14 |
Message-ID: | ea71c35c-57c2-497e-814e-2e83c58bcaed@intel.com |
Lists: | pgsql-hackers |
On 6/26/2025 1:07 PM, Zhou, Zhiguo wrote:
> Hi Hackers,
>
> This patch addresses severe LWLock contention observed on high-core systems
> where hundreds of processors concurrently access frequently-shared locks.
> Specifically for ProcArrayLock (exhibiting 93.5% shared-mode acquires), we
> implement a new ReadBiasedLWLock mechanism to eliminate the atomic operation
> bottleneck.
>
> Key aspects:
> 1. Problem: Previous optimizations[1] left LWLockAttemptLock/Release consuming
> ~25% total CPU cycles on 384-vCPU systems due to contention on a single
> lock-state cache line. Shared lock attempts showed 37x higher cumulative
> latency than exclusive mode for ProcArrayLock.
>
> 2. Solution: ReadBiasedLWLock partitions lock state across 16 cache lines
> (READ_BIASED_LOCK_STATE_COUNT):
> - Readers acquire/release only their designated LWLock (indexed by
> pid % 16) using a single atomic operation
> - Writers pay higher cost by acquiring all 16 sub-locks exclusively
> - Maintains LWLock's "acquiring process must release" semantics
>
> 3. Performance: HammerDB/TPCC shows 35.3% NOPM improvement over baseline
> - Lock acquisition CPU cycles reduced from 16.7% to 7.4%
> - Lock release cycles reduced from 7.9% to 2.2%
>
> 4. Implementation:
> - Core infrastructure for ReadBiasedLWLock
> - ProcArrayLock converted as proof-of-concept
> - Maintains full LWLock API compatibility
>
> Known considerations:
> - Increased writer acquisition cost (acceptable given rarity of exclusive
> acquisitions for biased locks like ProcArrayLock)
> - Memory overhead: 16x size increase per converted lock
> - Currently validated for ProcArrayLock; other heavily-shared locks may be
> candidates after further analysis
>
> This is a preliminary version for community feedback. We're actively:
> 1. Refining the implementation details
> 2. Expanding test coverage
> 3. Investigating additional lock candidates
> 4. Optimizing writer-fast-path opportunities
>
> Test results, profiling data, and design details can be shared upon request.
> We appreciate all comments and suggestions for improvement.
>
> [1] Optimize shared LWLock acquisition for high-core-count systems:
> https://www.postgresql.org/message-id/flat/73d53acf-4f66-41df-b438-5c2e6115d4de%40intel.com
>
> Regards,
>
> Zhiguo

As a follow-up to my previous mail, I have refined the proposed patch
and split it into two patches for easier review and testing. The first
introduces the new ReadBiasedLWLock mechanism; the second converts
ProcArrayLock from a plain LWLock to the new ReadBiasedLWLock type.
The patchset passes the standard "make check" regression tests. Feedback
and discussion on the implementation approach are very welcome. Thanks!
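
For readers who have not opened the attachments, the partitioning idea
described in the quoted mail can be illustrated with a small standalone
sketch. This is not code from the patches. It uses plain C11 atomics
instead of PostgreSQL's LWLock machinery, offers try-lock operations only
(no wait queues or sleeping), and every name in it (SketchSlot,
sketch_try_shared, the EXCLUSIVE bit, the assumed 64-byte cache line) is
hypothetical. Only the slot count of 16 and the pid % 16 indexing come
from the description above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define STATE_COUNT 16          /* mirrors READ_BIASED_LOCK_STATE_COUNT in the mail */
#define CACHE_LINE  64          /* assumption: 64-byte cache lines */
#define EXCLUSIVE   (1u << 31)  /* hypothetical flag bit: slot is held by a writer */

/* One slot per cache line, so readers on different slots never share a line. */
typedef struct {
    _Alignas(CACHE_LINE) atomic_uint state;  /* reader count, plus EXCLUSIVE bit */
} SketchSlot;

typedef struct {
    SketchSlot slots[STATE_COUNT];
} SketchReadBiasedLock;

static void sketch_init(SketchReadBiasedLock *lock)
{
    for (int i = 0; i < STATE_COUNT; i++)
        atomic_init(&lock->slots[i].state, 0);
}

/* Readers touch only the slot selected by pid % STATE_COUNT. */
static atomic_uint *sketch_slot_for(SketchReadBiasedLock *lock, unsigned pid)
{
    return &lock->slots[pid % STATE_COUNT].state;
}

/* Shared acquire: one atomic add on the uncontended path. */
static bool sketch_try_shared(SketchReadBiasedLock *lock, unsigned pid)
{
    atomic_uint *s = sketch_slot_for(lock, pid);
    unsigned old = atomic_fetch_add(s, 1);

    if (old & EXCLUSIVE) {
        atomic_fetch_sub(s, 1);  /* a writer holds the slot: undo and fail */
        return false;
    }
    return true;
}

static void sketch_release_shared(SketchReadBiasedLock *lock, unsigned pid)
{
    atomic_fetch_sub(sketch_slot_for(lock, pid), 1);
}

/* Exclusive acquire: the writer must claim every slot, in index order. */
static bool sketch_try_exclusive(SketchReadBiasedLock *lock)
{
    for (int i = 0; i < STATE_COUNT; i++) {
        unsigned expected = 0;

        if (!atomic_compare_exchange_strong(&lock->slots[i].state,
                                            &expected, EXCLUSIVE)) {
            /* Slot i is busy: release the slots claimed so far and fail. */
            while (--i >= 0)
                atomic_fetch_and(&lock->slots[i].state, ~EXCLUSIVE);
            return false;
        }
    }
    return true;
}

static void sketch_release_exclusive(SketchReadBiasedLock *lock)
{
    /* Clear only the flag bit, so a reader's transient increment survives. */
    for (int i = 0; i < STATE_COUNT; i++)
        atomic_fetch_and(&lock->slots[i].state, ~EXCLUSIVE);
}
```

The sketch makes the trade-offs listed under "Known considerations"
visible: a reader performs one atomic operation on one cache line, a
writer performs 16, and the lock occupies 16 cache lines instead of one.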
Regards,
Zhiguo
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Introduce-ReadBiasedLWLock-for-high-concurrency-read.patch | text/plain | 20.2 KB |
v1-0002-Convert-ProcArrayLock-to-ReadBiasedLWLock-for-improv.patch | text/plain | 35.1 KB |