Re: Fix pg_stat_get_backend_wait_event() for aux processes

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Sami Imseih <samimseih(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix pg_stat_get_backend_wait_event() for aux processes
Date: 2026-02-06 13:59:32
Message-ID: ad3852ed-daaa-43d7-a572-0e465666d1bb@iki.fi
Lists: pgsql-hackers

On 05/02/2026 22:43, Sami Imseih wrote:
>>>> There is also a discussion [0] about wait event/activity field
>>>> inconsistency
>>>> with pg_stat_activity with a repro in [1].
>>>
>>>
>>> The repro I was referring to in [1] is actually
>>> https://www.postgresql.org/message-id/ab1c0a7d-e789-5ef5-1180-42708ac6fe2d%40postgrespro.ru
>>
>> That is inherent. The wait event is updated in an unsynchronized fashion. As
>> noted in that thread.
>>
>> Making it synchronized (via st_changecount) would make wait event overhead
>> vastly higher.
>
> Correct, and I don't think we should pay the overhead of ensuring
> strict synchronization.
>
> The patch that Heikki is proposing [0] will update wait_event_info,
> bypassing st_changecount.
>
> ```
> +++ b/src/include/utils/backend_status.h
> @@ -174,6 +174,15 @@ typedef struct PgBackendStatus
>
> /* plan identifier, optionally computed using planner_hook */
> int64 st_plan_id;
> +
> + /*
> + * Proc's wait information. This is *not* protected by the changecount
> + * mechanism, because reading and writing a uint32 is assumed to be
> + * atomic. This is updated very frequently, so we want to keep the
> + * overhead as small as possible.
> + uint32 st_wait_event_info;
> +
> ```
>
> Which is the same assumption we are already making.
>
> ```
> static inline void
> pgstat_report_wait_start(uint32 wait_event_info)
> {
> /*
> * Since this is a four-byte field which is always read and written as
> * four-bytes, updates are atomic.
> */
> *(volatile uint32 *) my_wait_event_info = wait_event_info;
> }
> ```
>
> So without st_changecount, wait_events could still be out of sync with
> other backend status fields (e.g. state), but at least my testing shows
> better results with [0] applied. Better results here means no samples
> with "state = active, wait_event = ClientRead and backend_type =
> client backend".

Right, there's some inherent fuzziness if st_changecount is not updated.
My patch nevertheless improves things:

1. If you call pg_stat_get_backend_wait_event_type(1234), and then
pg_stat_get_backend_wait_event(1234), the results will be consistent.
Currently, it's possible to get a nonsensical combination like "LWLock"
and "ClientRead".

2. Because all other status updates bump st_changecount, and the
function that *reads* the fields checks st_changecount, you do actually
get some consistency between wait_event_info and the other fields. I
haven't fully thought through what the guarantee is, but I think it's now
impossible to get a combination like "state=active,
wait_event=ClientRead", as you saw in your testing. This consistency
breaks down if we add more fields that don't update st_changecount, i.e.
you might get inconsistent reads between such fields, but it's still an
improvement.

I'm not sure how much we want to document and guarantee these things.
I'd love to keep some flexibility to relax them in the future for
performance reasons. But I think this patch is an improvement, and
doesn't promise too much.

> Using the attached repro.sh:

Nice, thanks for the testing!

- Heikki
