[RFC PATCH v2] Add EXPLAIN ANALYZE wait event reporting

From: r314tive <tanswis42(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Michael Paquier <michael(at)paquier(dot)xyz>
Subject: [RFC PATCH v2] Add EXPLAIN ANALYZE wait event reporting
Date: 2026-05-18 06:11:08
Message-ID: CALCfnurowmckYrg6uQmV_LSTEW+mfuazaTZvZbLLXHSqxumHGg@mail.gmail.com
Lists: pgsql-hackers

This v2 keeps the same RFC feature scope as v1 and changes only regression
test coverage/stability.

v1 thread:
https://www.postgresql.org/message-id/CALCfnuquuxtZmmzQBZ_yxaihfj7bnALXdzi9Nj=RYUW4iwY6GQ@mail.gmail.com

v0 thread:
https://www.postgresql.org/message-id/cover.1778280923.git.tanswis42%40gmail.com

The CFBot FreeBSD run showed that the regression tests assumed too narrow a
statement-level wait list. EXPLAIN WAITS can validly observe additional
statement-level waits around the measured query, for example parallel-executor
IPC waits or DSM allocation waits. Those are valid observed waits, not an
accounting bug.

Changes in v2:

1. Make the text-output test check for the required Wait Events and
   Statement Wait Events lines, instead of expecting the full
   statement-level wait list to contain only Timeout:PgSleep.
2. Make JSON tests find Timeout:PgSleep by JSONPath instead of assuming it
   is the first wait event array element.
3. Disable debug_parallel_query and default gather workers in the explain
   regression test before serial EXPLAIN checks.
4. Disable debug_parallel_query and gather workers in the bitmap runtime-key
   attribution test.
5. Remove the plain-regression assertion for rescanned parallel worker wait
   aggregation for now. Worker availability and the exact parallel plan
   shape are not deterministic enough for that test under the parallel
   regression harness. The accounting behavior is still implemented, but
   this specific edge should come back as a more isolated test if we can
   make it deterministic enough for CFBot.
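
To illustrate changes 2 through 4, here is a rough sketch that runs on an
unpatched server. It is not taken from the patch's test files. The GUC names
are real, but reading "gather workers" as max_parallel_workers_per_gather is
an assumption, and the sample array with its "Name" key is a hypothetical
stand-in for the JSON wait event output, whose actual layout may differ.

```sql
-- Keep serial EXPLAIN checks serial. Both GUCs exist in stock PostgreSQL;
-- that the tests use exactly these settings is an assumption.
SET debug_parallel_query = off;
SET max_parallel_workers_per_gather = 0;

-- Hypothetical wait event array: an extra IPC wait was observed first, so
-- Timeout:PgSleep is no longer element 0.
WITH sample(waits) AS (
    VALUES ('[{"Name": "IPC:ExecuteGather"},
              {"Name": "Timeout:PgSleep"}]'::jsonb)
)
SELECT
    -- Order-independent: matches the entry wherever it sits in the array.
    jsonb_path_exists(waits, '$[*] ? (@.Name == "Timeout:PgSleep")')
        AS found_by_path,
    -- Position-dependent: the kind of check that breaks when another valid
    -- wait comes first.
    (waits -> 0 ->> 'Name') = 'Timeout:PgSleep'
        AS found_at_index_0
FROM sample;
```

With this sample the path-based check is true and the index-based check is
false, which is the failure mode the FreeBSD run exposed.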

There are no accounting-code changes from v1.

The main RFC questions are unchanged:

- whether the option should be named WAITS or WAIT_EVENTS;
- whether inclusive per-node attribution is the right initial semantics;
- whether the fixed accumulator limit and overflow reporting are acceptable;
- whether the disabled/enabled hot-path overhead is acceptable.

Local verification:

make -C src/test/regress check TESTS=explain
All 245 tests passed.

git diff --check
passed.

Regards,
Ilmar

Attachment                                                Content-Type              Size
0001-Add-EXPLAIN-WAITS-statement-reporting.patch          application/octet-stream  17.9 KB
0004-Refine-EXPLAIN-WAITS-attribution-semantics.patch     application/octet-stream  42.8 KB
0003-Attribute-EXPLAIN-WAITS-to-plan-nodes.patch          application/octet-stream  35.0 KB
0002-Aggregate-EXPLAIN-WAITS-from-parallel-workers.patch  application/octet-stream  17.0 KB
0005-Harden-EXPLAIN-WAITS-accumulator-handling.patch      application/octet-stream  10.6 KB
0006-Hide-EXPLAIN-WAITS-accumulator-internals.patch       application/octet-stream  14.5 KB
0007-Keep-EXPLAIN-option-completion-current.patch         application/octet-stream  1.2 KB
0008-Stabilize-EXPLAIN-WAITS-regression-tests.patch       application/octet-stream  16.4 KB