| From: | r314tive <tanswis42(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org |
| Cc: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Subject: | [RFC PATCH v2] Add EXPLAIN ANALYZE wait event reporting |
| Date: | 2026-05-18 06:11:08 |
| Message-ID: | CALCfnurowmckYrg6uQmV_LSTEW+mfuazaTZvZbLLXHSqxumHGg@mail.gmail.com |
| Lists: | pgsql-hackers |
This v2 keeps the same RFC feature scope as v1 and changes only regression
test coverage/stability.
v0 thread:
https://www.postgresql.org/message-id/cover.1778280923.git.tanswis42%40gmail.com
The CFBot FreeBSD run showed that the regression tests assumed too narrow a
statement-level wait list. EXPLAIN WAITS can validly observe additional
statement-level waits around the measured query, for example parallel-executor
IPC waits or DSM allocation waits. Those are valid observed waits, not an
accounting bug.
Changes in v2:
1. Make the text-output test check for the required Wait Events and
   Statement Wait Events lines, instead of expecting the full statement-level
   wait list to contain only Timeout:PgSleep.
2. Make JSON tests find Timeout:PgSleep by JSONPath instead of assuming it is
   the first wait event array element.
3. Disable debug_parallel_query and default gather workers in the explain
   regression test before serial EXPLAIN checks.
4. Disable debug_parallel_query and gather workers in the bitmap runtime-key
   attribution test.
5. Remove the plain-regression assertion for rescanned parallel worker wait
   aggregation for now. Worker availability and the exact parallel plan shape
   are not deterministic enough for that test under the parallel regression
   harness. The accounting behavior is still implemented, but this specific
   edge should come back as a more isolated test if we can make it
   deterministic enough for CFBot.
There are no accounting-code changes from v1.
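As an illustration of change 2 above, here is a minimal Python sketch of why a lookup by name is more stable than indexing the first array element. The JSON key names ("Statement Wait Events", "Wait Event", "Count") and the sample values are assumptions made for this example, not the patch's actual output format; the regression tests themselves use JSONPath in SQL.

```python
import json

# Hypothetical EXPLAIN (..., FORMAT JSON) fragment. The key names and the
# extra IPC entry are illustrative assumptions, not the patch's real output.
sample = json.loads("""
[{"Statement Wait Events": [
    {"Wait Event": "IPC:ExecuteGather", "Count": 1},
    {"Wait Event": "Timeout:PgSleep", "Count": 1}
]}]
""")


def find_wait_event(plan_json, name):
    """Return the first statement-level wait entry named `name`, or None."""
    for entry in plan_json[0].get("Statement Wait Events", []):
        if entry.get("Wait Event") == name:
            return entry
    return None


# Fragile check: assumes the event of interest is the first array element.
first = sample[0]["Statement Wait Events"][0]["Wait Event"]

# Stable check: search by name, ignoring position and any extra valid waits.
hit = find_wait_event(sample, "Timeout:PgSleep")
```

With an extra valid wait listed ahead of it, the positional check no longer sees Timeout:PgSleep, while the name-based lookup still finds it.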
The main RFC questions are unchanged:
- whether the option should be named WAITS or WAIT_EVENTS;
- whether inclusive per-node attribution is the right initial semantics;
- whether the fixed accumulator limit and overflow reporting are acceptable;
- whether the disabled/enabled hot-path overhead is acceptable.
Local verification:
    make -C src/test/regress check TESTS=explain
        All 245 tests passed.
    git diff --check
        passed.
Regards,
Ilmar
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Add-EXPLAIN-WAITS-statement-reporting.patch | application/octet-stream | 17.9 KB |
| 0004-Refine-EXPLAIN-WAITS-attribution-semantics.patch | application/octet-stream | 42.8 KB |
| 0003-Attribute-EXPLAIN-WAITS-to-plan-nodes.patch | application/octet-stream | 35.0 KB |
| 0002-Aggregate-EXPLAIN-WAITS-from-parallel-workers.patch | application/octet-stream | 17.0 KB |
| 0005-Harden-EXPLAIN-WAITS-accumulator-handling.patch | application/octet-stream | 10.6 KB |
| 0006-Hide-EXPLAIN-WAITS-accumulator-internals.patch | application/octet-stream | 14.5 KB |
| 0007-Keep-EXPLAIN-option-completion-current.patch | application/octet-stream | 1.2 KB |
| 0008-Stabilize-EXPLAIN-WAITS-regression-tests.patch | application/octet-stream | 16.4 KB |