Re: [RFC PATCH v0 0/7] Add EXPLAIN ANALYZE wait event reporting

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ilmar Yunusov <tanswis42(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC PATCH v0 0/7] Add EXPLAIN ANALYZE wait event reporting
Date: 2026-06-22 21:50:49
Message-ID: uah2s5tppv3onn7bsf2uelyexfrxwrmye6qqyrbbsjepxny7l5@guymflaarnsr
Lists: pgsql-hackers

Hi,

On 2026-06-22 17:34:23 -0400, Robert Haas wrote:
> > - Is the disabled hot-path overhead of checking an exported boolean in
> > pgstat_report_wait_start/end acceptable?
>
> I'm extremely skeptical. I'm really not keen to add ANY cycles to
> pgstat_report_wait_start/end(). But the even bigger problem is that
> turning this on will result in measuring the time a LOT more times
> than we do at present, and that has significant overhead that can
> distort the results. That is a problem with EXPLAIN ANALYZE, and I
> think it could be a hugely crippling problem for EXPLAIN WAITS,
> because a single ExecProcNode call could result in many, many I/Os or
> LWLock operations.

+1.

I'm just about dead set against adding even a single cycle to wait events. If
anything we need to work to make wait events cheaper, not the opposite. Small
overheads really can show up; that's what led me to optimize them before,
cf.

commit 225a22b19ed2960acc8e9c0b7ae53e0e5b0eac87
Author: Andres Freund <andres(at)anarazel(dot)de>
Date: 2021-04-03 11:44:47 -0700

    Improve efficiency of wait event reporting, remove proc.h dependency.

Several of the callers for wait events are register starved, adding more
register pressure by checking a GUC or whatnot can cause stack spills etc.

> I have always felt that any kind of wait event statistics should use a
> sampling approach.

+1.

> I'd expect a successful patch in this area to work by using a time interrupt
> for self-sampling.

I guess I was thinking of sampling all processes from one measuring process,
because that would allow you a higher sampling rate without distorting the
measured processes as much (the timer interrupt would cause some syscalls to
return with EINTR and such). But I'm not sure it matters that much.
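
To make the "one measuring process" idea concrete, here is a minimal,
deterministic simulation, not PostgreSQL code. It assumes each backend
publishes its current wait event in a slot that a single sampler reads at a
fixed interval. The names run_backend, sample_backends and the timelines are
hypothetical and exist only for illustration; None stands for "not waiting".

```python
from collections import Counter

def run_backend(timeline):
    """Expand [(event, ticks), ...] into the slot value seen at each tick."""
    slot_per_tick = []
    for event, ticks in timeline:
        slot_per_tick.extend([event] * ticks)
    return slot_per_tick

def sample_backends(timelines, interval):
    """One sampler reads every backend's slot once per `interval` ticks."""
    slots = {name: run_backend(tl) for name, tl in timelines.items()}
    total_ticks = min(len(s) for s in slots.values())
    histogram = {name: Counter() for name in slots}
    for tick in range(0, total_ticks, interval):
        for name, slot in slots.items():
            # The sampled backends do no extra work; only the sampler reads.
            histogram[name][slot[tick]] += 1
    return histogram

# Made-up timelines, 100 ticks each (assumption, for illustration only).
timelines = {
    "backend_1": [(None, 60), ("IO:DataFileRead", 30), ("LWLock:WALWrite", 10)],
    "backend_2": [("Lock:transactionid", 80), (None, 20)],
}
hist = sample_backends(timelines, interval=5)
print(hist["backend_1"])  # 12 x None, 6 x IO:DataFileRead, 2 x LWLock:WALWrite
print(hist["backend_2"])  # 16 x Lock:transactionid, 4 x None
```

With 20 samples per backend, the sample shares (30% I/O for backend_1, 80%
heavyweight lock for backend_2) match the time shares in the timelines, and
nothing was added to the simulated backends' reporting path.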

> I doubt that counting the number of times that a wait event occurs
> will tell us anything we want to know.

Matches my experience.
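
A small illustration of why occurrence counts can mislead, using made-up
durations (the numbers are assumptions, not measurements): many very short
lock waits dominate the count, while a handful of slow reads dominate the
time.

```python
from collections import Counter

# (wait event, duration in microseconds); values invented for illustration.
waits = [("LWLock:BufferMapping", 2)] * 1000 + [("IO:DataFileRead", 4000)] * 3

counts = Counter(event for event, _ in waits)
time_us = Counter()
for event, duration in waits:
    time_us[event] += duration

print(counts)   # LWLock:BufferMapping: 1000, IO:DataFileRead: 3
print(time_us)  # IO:DataFileRead: 12000, LWLock:BufferMapping: 2000
```

By count the LWLock looks about 333 times more important; by time the reads
account for six times as much waiting.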

Greetings,

Andres Freund
