Adding wait events statistics

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Adding wait events statistics
Date: 2025-06-30 13:36:12
Message-ID: aGKSzFlpQWSh/+2w@ip-10-97-1-34.eu-west-3.compute.internal
Lists: pgsql-hackers

Hi hackers,

Wait events are useful to know what backends are (or were) waiting for during
a performance issue: for this we can sample pg_stat_activity at regular intervals
and record historical data. That’s how they are commonly used.

It would also be useful to observe the behavior of the engine and of the backends
over time, to help answer questions like:

* Is the engine’s wait pattern the same over time?
* Is the wait pattern of application "A" the same over time?
* I observe a peak in wait event "W": is it because "W" is now waiting longer or
is it because it is hit more frequently?
* I observe a peak in some wait events (say, for example, MultiXact%): is it due
to a workload change?

For the above use cases, we need a way to track the wait events that occur between
samples: please find attached a proof of concept patch series doing so.

The patch series is made of:

0001 - It generates the WaitClassTable[], a lookup table that will be used by
the wait events statistics.

The array is indexed by classId (derived from the PG_WAIT_* constants), handles
gaps in the class ID numbering and provides metadata for wait events.
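
To make the idea concrete, here is a minimal standalone sketch of a lookup table
indexed by classId. It is not the code from 0001: the struct layout, the
sketch_* names, the event counts and the offsets are all hypothetical, and only
three classes are shown. The two masks and the PG_WAIT_* values are quoted from
memory of wait_event.c and wait_classes.h, so please double check them against
the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Masks as (I believe) defined in wait_event.c before 0001 moves them. */
#define WAIT_EVENT_CLASS_MASK 0xFF000000U
#define WAIT_EVENT_ID_MASK    0x0000FFFFU

/* A few class constants as in wait_classes.h; 0x02 is unused, i.e. a gap. */
#define PG_WAIT_LWLOCK    0x01000000U
#define PG_WAIT_LOCK      0x03000000U
#define PG_WAIT_BUFFERPIN 0x04000000U

/* Hypothetical entry layout; the struct generated by the patch may differ. */
typedef struct SketchWaitClassInfo
{
	const char *name;		/* NULL marks a gap in the class numbering */
	int			num_events;	/* illustrative values, not the real counts */
	int			offset;		/* first slot of the class in a flat array */
} SketchWaitClassInfo;

/* Designated initializers leave the unused class IDs (0x00, 0x02) zeroed. */
static const SketchWaitClassInfo sketch_wait_class_table[] = {
	[0x01] = {"LWLock", 3, 0},
	[0x03] = {"Lock", 2, 3},
	[0x04] = {"BufferPin", 1, 5},
};

#define SKETCH_NUM_CLASSES \
	(sizeof(sketch_wait_class_table) / sizeof(sketch_wait_class_table[0]))

/* Return the class entry for wait_event_info, or NULL for a gap/unknown class. */
const SketchWaitClassInfo *
sketch_lookup_class(uint32_t wait_event_info)
{
	uint32_t	classId = (wait_event_info & WAIT_EVENT_CLASS_MASK) >> 24;

	if (classId >= SKETCH_NUM_CLASSES)
		return NULL;
	if (sketch_wait_class_table[classId].name == NULL)
		return NULL;
	return &sketch_wait_class_table[classId];
}

/* Map wait_event_info to a slot in a flat counters array, or -1 if invalid. */
int
sketch_counter_index(uint32_t wait_event_info)
{
	const SketchWaitClassInfo *c = sketch_lookup_class(wait_event_info);
	uint32_t	eventId = wait_event_info & WAIT_EVENT_ID_MASK;

	if (c == NULL)
		return -1;
	assert(c->num_events > 0);	/* a non-gap entry must describe events */
	if (eventId >= (uint32_t) c->num_events)
		return -1;
	return c->offset + (int) eventId;
}
```

With such a table, going from a wait_event_info to its counter slot is two
array reads and no search, and a class ID that falls in a gap is rejected
instead of indexing garbage.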

This new array is generated in generate-wait_event_types.pl, so that:

* it now needs lwlocklist.h and wait_classes.h as input parameters
* WAIT_EVENT_CLASS_MASK and WAIT_EVENT_ID_MASK have been moved away from wait_event.c

In passing it adds several new macros that will be used by 0002.

0002 - It adds wait events statistics

It adds a new stat kind PGSTAT_KIND_WAIT_EVENT for the wait event statistics.

This new statistic kind is a fixed one because we know the maximum number of wait
events. Indeed:

* it does not take into account custom wait events as extensions have all they need
at their disposal to create custom stats on their own wait events should they
want to (limited by WAIT_EVENT_CUSTOM_HASH_MAX_SIZE though).

* it does not take into account LWLock > LWTRANCHE_FIRST_USER_DEFINED for the same
reasons as above. That said, LWLockNewTrancheId() enforces no upper limit.

* we don't want to allocate memory in some places where the counters are
incremented (see 4feba03d8b9). We could still use the same implementation as for
backend statistics (i.e., make use of flush_static_cb) if we really want/need to
switch to variable stats.

For the moment only the counters are added (an array of currently 285 counters).
We’ll discuss adding the timings once the counters are fully done.

I think we’d have more discussion/debate around the timings (should we add them
by default, add a new GUC, enable them at compilation time?…), that’s why only
the counters are in this patch series.

I think it makes sense as the counters have merit on their own. We currently have
273 wait events but the array is 285 long: the reason is that some wait event
classes have "holes".
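
As a rough illustration of why holes make the array longer than the number of
events, here is a small standalone sketch. It assumes the counters are indexed
directly by event ID within a class, which is my simplified reading and not
necessarily what 0002 does. The class below is hypothetical (IDs 1, 2 and 4
defined, 0 and 3 unused) and all sketch_* names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical class with event IDs 1, 2 and 4: IDs 0 and 3 are holes. */
#define SKETCH_MAX_EVENT_ID 4
#define SKETCH_NUM_SLOTS    (SKETCH_MAX_EVENT_ID + 1)

/* 1 if the slot maps to a real wait event, 0 if it is a hole. */
static const uint8_t sketch_defined[SKETCH_NUM_SLOTS] = {0, 1, 1, 0, 1};

/* Fixed-size storage: incrementing a counter never allocates memory. */
static uint64_t sketch_counts[SKETCH_NUM_SLOTS];

/* Number of real wait events, which is smaller than the number of slots. */
int
sketch_num_defined(void)
{
	int			n = 0;

	for (int i = 0; i < SKETCH_NUM_SLOTS; i++)
		n += sketch_defined[i];
	return n;
}

/* Increment the counter for event_id; return -1 for a hole or unknown ID. */
int
sketch_increment(uint32_t event_id)
{
	if (event_id >= SKETCH_NUM_SLOTS || !sketch_defined[event_id])
		return -1;
	sketch_counts[event_id]++;
	return 0;
}

/* Read back a counter; the caller must pass an ID inside the array. */
uint64_t
sketch_get_count(uint32_t event_id)
{
	assert(event_id < SKETCH_NUM_SLOTS);
	return sketch_counts[event_id];
}
```

Here 3 events need 5 slots. The same effect, summed over all the classes, would
explain 273 events needing 285 slots.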

A few questions:

* Do we need to serialize the stats based on their names (as for
PGSTAT_KIND_REPLSLOT)? This question is the same as "is the ordering preserved
if the stats file format is not changed": I think the answer is yes (see
f98dbdeb51d), which means there is no need for
to_serialized_name/from_serialized_name.

* What if a new wait event is added? We'd need to change the stats file format,
unless the wait event stats kind becomes a variable one, or we slightly change
the way fixed stats are written to and read from the stats file (we could add a
new field in PgStat_KindInfo for example).

Note: for some backends the wait events stats are not flushed (walwriter for
example), so we need to find additional places to flush the wait events stats.

0003 - It adds the pg_stat_wait_event view

It also adds documentation and regression tests.

0004 - switching PGSTAT_KIND_WAIT_EVENT to variable sized

It might be better for PGSTAT_KIND_WAIT_EVENT to be a variable sized stats kind.
That way:

* It would be easier to add custom wait events if we want to
* It would be possible to add new wait events without having to change the stats
file format

So I am adding 0004 to see what it would look like to have a variable sized stats
kind instead, and to decide how we want to move forward.

It uses the uint32 wait_event_info as the hash key while the hash key is defined
as uint64: that should not be an issue, and the patch casts explicitly.

That said, it might be better to use all 64 bits for the hash key, rather than
leaving the upper half full of zeroes (better hashing distribution?). We could
imagine using something like:

((uint64) wait_event_info) | (((uint64) wait_event_info) << 32)

for the hash key.
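
For reference, the expression above can be wrapped in a pair of small helpers
like the ones below. This is only a standalone sketch: the function names are
made up for the example and are not part of the patch. It shows that the
original uint32 stays recoverable from either half of the key.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 64-bit key by duplicating wait_event_info into both halves. */
uint64_t
sketch_wait_event_hash_key(uint32_t wait_event_info)
{
	uint64_t	lo = (uint64_t) wait_event_info;

	return lo | (lo << 32);
}

/* Recover wait_event_info from a key built by the function above. */
uint32_t
sketch_wait_event_from_key(uint64_t key)
{
	uint64_t	lo = key & 0xFFFFFFFFULL;
	uint64_t	hi = key >> 32;

	assert(lo == hi);			/* both halves carry the same value */
	return (uint32_t) lo;
}
```

Whether this actually improves the distribution depends on the hash function
applied to the key, so it would need to be measured.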

If we decide to go that way (i.e., with a variable sized kind), then a new patch
series will be provided: it will not implement the fixed one but will start
directly with the variable one.

The more I think about it, the more I think we should go for the variable sized
proposal: that's more flexible.

Remarks:

* If we want to add some "ereport" in waitEventIncrementCounter() then we’d need
to take care of the race condition in ConditionVariableTimedSleep() that it would
produce, see [1].

* Once we agree on fixed vs variable sized stats kind, I'll start measuring if
there is any performance regression and check if there is a need for optimization
(partition the counters array?…).

* The pgstat_report_wait_end() is "inlined", but with the additional code added
here, compilers may ignore the inline keyword. Need to check if that's the case
and the impact (see above).

* After it goes in: add timings, add into per-backend stats too

[1]: https://www.postgresql.org/message-id/aBhuTqNhMN3prcqe%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment | Content-Type | Size
v1-0001-Generate-the-WaitClassTable.patch | text/x-diff | 12.2 KB
v1-0002-Add-wait-events-statistics.patch | text/x-diff | 17.0 KB
v1-0003-Add-the-pg_stat_wait_event-view.patch | text/x-diff | 11.3 KB
v1-0004-switching-PGSTAT_KIND_WAIT_EVENT-to-variable-size.patch | text/x-diff | 15.3 KB
