[GSoC 2026] - Reducing pg_stat_statements LWLock Contention - Introduction

From: Trương Hoàng Quân <truonghoangquan456(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Kirk Wolak <wolakk(at)gmail(dot)com>, nik(at)postgres(dot)ai, andreas(at)proxel(dot)se
Subject: [GSoC 2026] - Reducing pg_stat_statements LWLock Contention - Introduction
Date: 2026-05-09 03:27:32
Message-ID: CAGJgb9vmBZRWi-8+=62U-BoPvGrUS4uMk=gjGbAfp3Rro2CAJg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Hackers!

I will be working on reducing pg_stat_statements LWLock contention [1] for
GSoC 2026. Over the past several weeks I have been engaging with my
mentors, Kirk, Nik, Andreas and Andrei, and wanted to introduce myself,
share what I have learned so far, and get feedback from the broader
community.

*About me*

I am a sophomore CS student at UMass Amherst, interning at Deep Infra where
I profile and optimize inference engines (vLLM, SGLang, TensorRT-LLM) under
high concurrency. My systems background is mostly C++ through competitive
programming; this project is my path into production C and PostgreSQL
internals.

*The problem*

pg_stat_statements serializes every backend through a single LWLock
(pgss->lock). Structural operations (new entry insertion, deallocation,
reset) require an exclusive lock that blocks the entire cluster. The
deallocation path is the main offender: entry_dealloc() holds the exclusive
lock through an O(n log n) sort over all entries. Borodin's 2022 incident
report [2] is the worst documented case; it froze an entire production
database.

*What I have done*

I built PostgreSQL 18.3 and 19devel from source (--enable-debug
--enable-cassert --enable-injection-points) and benchmarked the contention.
At max=100 with 1000 distinct query forms, 90-100% of active backends block
on LWLock|pg_stat_statements during deallocation churn, with
entry_dealloc() firing ~30,000 times in 60 seconds. PG19 holds the
exclusive lock for less time per acquisition (~40% less overhead than PG18)
but the bottleneck pattern is identical. I also reproduced the same stall
from periodic pg_stat_statements_reset() calls.

Through back-and-forth with the mentors, my initial conditional-skip design
evolved into a pending-entry queue, specifically to avoid the silent data
loss that got Borodin's 2022 patch rejected.

*Proposed approach*

Two core deliverables: (1) a pending-entry queue using
LWLockConditionalAcquire(), with queued/dropped counters in
pg_stat_statements_info for observability; and (2) restructuring
entry_dealloc() to sort outside the exclusive lock, reducing exclusive hold
time from O(n log n) to O(n). Both ship independently. Two stretch goals
are scoped as post-core exploration: lock separation between structural
changes and counter updates, and an optimized reset path.

The full proposal is attached. I would especially welcome feedback on
queue-full behavior under sustained contention and on concurrent dealloc
edge cases during the snapshot-to-eviction window.

I know work like this rarely lands in a single summer. I plan to stay
engaged past August and leave any unfinished patches in shape for others to
carry forward.

-----
[1]
wiki.postgresql.org/wiki/GSoC_2026#Monitoring_Tools_Performance:_pg_stat_statements_and_LWLock_Contention
[2]
postgresql.org/message-id/1AEEB240-9B68-44D5-8A29-8F9FDB22C801@yandex-team.ru

Best regards,
Quan Hoang Truong

Attachment Content-Type Size
LWLock_Contention_Proposal.pdf application/pdf 13.3 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2026-05-09 05:47:22 Re: Fix bug of UPDATE/DELETE FOR PORTION OF with inheritance tables
Previous Message Chao Li 2026-05-09 03:08:28 Re: Bug in ALTER SUBSCRIPTION ... SERVER / ... CONNECTION with broken old server