Optimize LWLock scalability via ReadBiasedLWLock for heavily-shared locks

From: "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Optimize LWLock scalability via ReadBiasedLWLock for heavily-shared locks
Date: 2025-06-26 05:07:49
Message-ID: e7d50174-fbf8-4a82-a4cd-1c4018595d1b@intel.com
Lists: pgsql-hackers

Hi Hackers,

This patch addresses severe LWLock contention observed on high-core systems
where hundreds of processors concurrently access frequently-shared locks.
Specifically for ProcArrayLock (exhibiting 93.5% shared-mode acquires), we
implement a new ReadBiasedLWLock mechanism to eliminate the atomic operation
bottleneck.

Key aspects:
1. Problem: Even after previous optimizations[1], LWLockAttemptLock/LWLockRelease
   consume ~25% of total CPU cycles on 384-vCPU systems due to contention on a
   single lock-state cache line. Shared lock attempts showed 37x higher
   cumulative latency than exclusive-mode attempts for ProcArrayLock.

2. Solution: ReadBiasedLWLock partitions lock state across 16 cache lines
   (READ_BIASED_LOCK_STATE_COUNT):
   - Readers acquire/release only their designated LWLock (indexed by
     pid % 16) using a single atomic operation
   - Writers pay higher cost by acquiring all 16 sub-locks exclusively
   - Maintains LWLock's "acquiring process must release" semantics

3. Performance: HammerDB/TPCC shows 35.3% NOPM improvement over baseline
   - Lock acquisition CPU cycles reduced from 16.7% to 7.4%
   - Lock release cycles reduced from 7.9% to 2.2%

4. Implementation:
   - Core infrastructure for ReadBiasedLWLock
   - ProcArrayLock converted as proof-of-concept
   - Maintains full LWLock API compatibility

Known considerations:
- Increased writer acquisition cost (acceptable given the rarity of exclusive
  acquisitions on read-biased locks such as ProcArrayLock)
- Memory overhead: 16x size increase per converted lock
- Currently validated for ProcArrayLock; other heavily-shared locks may be
  candidates after further analysis

This is a preliminary version for community feedback. We're actively:
1. Refining the implementation details
2. Expanding test coverage
3. Investigating additional lock candidates
4. Optimizing writer-fast-path opportunities

Test results, profiling data, and design details can be shared upon request.
We appreciate all comments and suggestions for improvement.

[1]Optimize shared LWLock acquisition for high-core-count systems:
https://www.postgresql.org/message-id/flat/73d53acf-4f66-41df-b438-5c2e6115d4de%40intel.com

Regards,

Zhiguo

Attachment Content-Type Size
v0-0001-Optimize-lock-acquisition-release-with-ReadBiased.patch text/plain 41.4 KB
