| From: | Tatsuya Kawata <kawatatatsuya0913(at)gmail(dot)com> |
|---|---|
| To: | Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | [PATCH] pg_stat_lock: add blocker mode dimension |
| Date: | 2026-05-31 09:59:41 |
| Message-ID: | CAHza6qcu797ENAGFQftdTzYKMHdD7CktaVptc+Ex4xg_4F7HkA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
I have been experimenting with pg_stat_lock and would like to see
*what kind of operation* is causing waits, not just that waits
happened. This patch adds a "mode" dimension so the view can
distinguish, e.g., waits caused by VACUUM (ShareUpdateExclusiveLock)
from waits caused by DDL (AccessExclusiveLock), without parsing
log_lock_waits output.
The holder mode is already computed and emitted by log_lock_waits
via GetLockHoldersAndWaiters(), so aggregating it into pg_stat_lock
seems like a natural fit at low cost.
## Proposal
Add a mode column to pg_stat_lock:
locktype | mode | waits | wait_time |
fastpath_exceeded
---------+--------------------------+-------+-----------+------------------
relation | ShareUpdateExclusiveLock | 312 | 89200 | 0
relation | AccessExclusiveLock | 47 | 238400 | 0
For waits / wait_time, the mode records the *blocking* mode,
captured under the lock partition LWLock when the requester joins
the wait queue:
1. Among modes that conflict with the requester and are currently
held (lock->granted[m] > 0), pick the strongest one.
2. If no held mode conflicts (queue-priority-only wait), pick the
strongest mode in lock->waitMask that conflicts.
Rule (1) takes precedence so the recorded blocker reflects an actual
holder when one exists, falling back to a queued waiter only for
pure queue-priority waits.
For fastpath_exceeded, the mode records the *requested* mode (slot
exhaustion has no blocker); see open question 3 below.
## Use cases
Three operational questions that are awkward today and trivial
with this column:
1. "Is VACUUM impacting user workload?"
-> sum(wait_time) where mode = 'ShareUpdateExclusiveLock'.
2. "Did the last DDL migration cause user-visible waits?"
-> diff of waits/wait_time on mode = 'AccessExclusiveLock'
across the deployment window.
3. "Are SELECTs being blocked indirectly via queue priority?"
-> AccessShare requesters appearing with a stronger blocker mode
indicate they were queued behind a stronger waiter.
## Cost
- Shared memory: +~2.3 kB per cluster (one PgStatShared_Lock
expanded from [locktype] to [locktype][mode]).
- No new instrumentation in the fast path. The blocker-mode
snapshot loop runs only when the request would otherwise wait.
## Alternatives considered
A) requester_mode only. Same storage, simpler implementation,
but the information is largely inferable from locktype plus
context. No operational question seemed to need it.
B) Both requester_mode and blocker_mode. Most informative and
could be added, but the view output becomes much larger.
Users wanting that detail probably reach for pg_wait_sampling
or log_lock_waits already.
This patch proposes "blocker mode only" as a middle ground.
## Open questions
1. Multi-blocker representation.
When several conflicting modes are held simultaneously, the
proposal picks the strongest -- the one whose release the
waiter needs. Other conventions worth considering?
2. Per-row attribution for chained waits.
Each wait event produces one row increment based on the
proximate blocker. When waits are chained, multiple rows are
incremented for what is effectively the same contention:
- TX1 holds AccessShareLock (long SELECT).
- TX2 requests AccessExclusiveLock, queues behind TX1.
- TX3 requests AccessShareLock, queues behind TX2.
Result:
locktype | mode | waits | wait_time
relation | AccessShareLock | 1 | ~90s (TX2)
relation | AccessExclusiveLock | 1 | ~89s (TX3)
Both rows are individually accurate snapshots of "what each
waiter was queued behind", but a naive read of the second row
misses that TX1's long SELECT is the underlying cause. Note
that sum(wait_time) being per-waiter (not wall-clock contention
time) is already true of the current pg_stat_lock; this proposal
just makes it more visible.
Is documenting this contract sufficient, or should the patch
fold chained waits into a single attribution? I lean toward
documenting.
3. Mode column semantics across counters.
The mode column has two interpretations depending on which
counter is non-zero:
- waits / wait_time: the *blocker* mode.
- fastpath_exceeded: the *requested* mode (slot exhaustion
has no blocker).
Both are useful for diagnosis (the latter shows which workload
types exhaust per-backend slots). Carrying two semantics on
one column is admittedly awkward. Alternatives:
a) Document the dual semantic, keep one column.
b) Move fastpath_exceeded to a separate view.
c) NULL out mode for fastpath_exceeded rows, losing per-mode
breakdown.
I lean toward (a); the column name "mode" (rather than
"blocker_mode") was chosen with this in mind.
Draft patch v1 attached. Documentation updates are intentionally omitted
from v1.
Regards,
Tatsuya Kawata
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-pg_stat_lock-add-blocker-mode-dimension.patch | application/octet-stream | 16.5 KB |
| From | Date | Subject | |
|---|---|---|---|
| Previous Message | Joel Jacobson | 2026-05-31 09:50:29 | Re: Parallel INSERT SELECT take 2 |