From: Andres Freund <andres(at)anarazel(dot)de>
To: "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Optimize LWLock scalability via ReadBiasedLWLock for heavily-shared locks
Date: 2025-07-01 13:57:18
Message-ID: mht7filv3ozry2buhgwsjkxjex2gfqug557ylkhihwhhl3zxbp@wujbfur3g7yj
Lists: pgsql-hackers
Hi,
On 2025-06-26 13:07:49 +0800, Zhou, Zhiguo wrote:
> This patch addresses severe LWLock contention observed on high-core systems
> where hundreds of processors concurrently access frequently-shared locks.
> Specifically for ProcArrayLock (exhibiting 93.5% shared-mode acquires), we
> implement a new ReadBiasedLWLock mechanism to eliminate the atomic operation
> bottleneck.
>
> Key aspects:
> 1. Problem: Previous optimizations[1] left LWLockAttemptLock/Release
> consuming ~25% of total CPU cycles on 384-vCPU systems due to contention
> on a single lock-state cache line. Shared lock attempts showed 37x higher
> cumulative latency than exclusive mode for ProcArrayLock.
>
> 2. Solution: ReadBiasedLWLock partitions lock state across 16 cache lines
> (READ_BIASED_LOCK_STATE_COUNT):
> - Readers acquire/release only their designated LWLock (indexed by
> pid % 16) using a single atomic operation
> - Writers pay higher cost by acquiring all 16 sub-locks exclusively
> - Maintains LWLock's "acquiring process must release" semantics
>
> 3. Performance: HammerDB/TPCC shows 35.3% NOPM improvement over baseline
> - Lock acquisition CPU cycles reduced from 16.7% to 7.4%
> - Lock release cycles reduced from 7.9% to 2.2%
>
> 4. Implementation:
> - Core infrastructure for ReadBiasedLWLock
> - ProcArrayLock converted as proof-of-concept
> - Maintains full LWLock API compatibility
>
> Known considerations:
> - Increased writer acquisition cost (acceptable given rarity of exclusive
> acquisitions for biased locks like ProcArrayLock)
Unfortunately I have a very hard time believing that that's acceptable -
there are plenty of workloads (many write-intensive ones) where exclusive
locks on ProcArrayLock are the bottleneck.
Greetings,
Andres Freund