Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, m(dot)zhilin(at)postgrespro(dot)ru, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections
Date: 2022-07-08 13:44:52
Message-ID: CA+TgmoZXNEXyTnzsukdR5mNtipr0UrQJbUb4ATOfv-Dfit3bbw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 7, 2022 at 10:39 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
> At Thu, 7 Jul 2022 13:58:06 -0500, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote in
> > I agree that this is a bug, since it can (and did) cause false positives in a
> > monitoring system.
>
> I'm not this is undoubtfully a bug but agree about the rest.

I don't agree that this is a bug, and even if it were, I don't think
this patch can fix it.

Let's start with the second point first: pgstat_report_wait_start()
and pgstat_report_wait_end() change the advertised wait event for a
process, while the backend state is changed by
pgstat_report_activity(). Since those function calls are in different
places, those changes are bound to happen at different times, and
therefore you can observe drift between the two values. Now perhaps
there are some one-directional guarantees: I think we probably always
set the state to idle before we start reading from the client, and
always finish reading from the client before the state ceases to be
idle. But I don't really see how that helps anything, because when you
read those values, you must read one and then the other. If you read
the activity before the wait event, you might see the state before it
goes idle and then the wait event after it's reached ClientRead. If
you read the wait event before the activity, you might see the wait
event as ClientRead, and then by the time you check the activity the
backend might have gotten some data from the client and no longer be
idle. The very best a patch like this can hope to do is narrow the
race condition enough that the discrepancies are observed less
frequently in practice.

And that's why I think this is not a bug fix, or even a good idea.
It's just encouraging people to rely on something which can never be
fully reliable in the way that the original poster is hoping. There
was never any intention of having wait events synchronized with the
pgstat_report_activity() stuff, and I think that's perfectly fine.
Both systems are trying to provide visibility into states that can
change very quickly, and therefore they need to be low-overhead, and
therefore they use very lightweight synchronization, which means that
ephemeral discrepancies are possible by nature. There are plenty of
other examples of that as well. You can't for example query pg_locks
and pg_stat_activity in the same query and expect that all and only
those backends that are apparently waiting for a lock in
pg_stat_activity will have an ungranted lock in pg_locks. It just
doesn't work like that, and there's a very good reason for that:
trying to make all of these introspection facilities behave in
MVCC-like ways would be painful to code and probably end up slowing
the system down substantially.

I think the right fix here is to change nothing in the code, and stop
expecting these things to be perfectly consistent with each other.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-07-08 13:52:46 Re: automatically generating node support functions
Previous Message Tom Lane 2022-07-08 13:43:22 Re: Patch proposal: New hooks in the connection path