Re: Addition of authenticated ID to pg_stat_activity

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Addition of authenticated ID to pg_stat_activity
Date: 2021-05-17 08:28:49
Message-ID: CABUevEzsSRvXuqSzxPVBvovxMSpEhs0tiZDoyj0Eu8SrD=MPcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 17, 2021 at 6:35 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Thu, Apr 29, 2021 at 04:56:42PM +0200, Magnus Hagander wrote:
> > I definitely use it all the time to monitor autovacuum all the time.
> > The others as well regularly, but autovacuum continuously. I also see
> > a lot of people doing things like "from pg_stat_activity where query
> > like '%mytablename%'" where they'd want both any regular queries and
> > any autovacuums currently processing the table.
>
> When it comes to development work, I also look at things different
> than backend connections, checkpointer and WAL writer included.

While I think we should optimize these primarily for users and not
developers, I definitely do those things as well. In particular wait
events for the background processes.

> > I'd say client address is also pretty common to identify which set of
> > app servers connections are coming in from -- but client port and
> > client hostname are a lot less interesting. But it'd be kind of weird
> > to split those out.
>
> Yes, I agree that it would be confusing to split the client_* fields
> across multiple views.
>
> > For *interactive use* I'd find pretty much all other columns
> > interesting and commonly used. Probably not that interested in the
> > oids of the database and user, but again they are the cheap ones. We
> > could get rid of the joints if we only showed the oids, but in
> > interactive use it's really the names that are interesting. But if
> > we're just trying to save column count, I'd say get rid of datid and
> > usesysid.
> >
> > I'd hold everything else as interesting.
>
> Yes, you have an argument here about the removal of usesysid and
> datid. Now I find joins involving OIDs to be much more natural than
> the object names, because that's the base of what we use in the
> catalogs.

Agreed. And I'm not sure the actual gain is that big if we can just
remove oid columns...

> Not sure if we would be able to agree on something here, but the
> barrier to what a session and a connection hold is thin when it comes
> to roles and application_name. Thinking more about that, I would be
> really tempted to get to do a more straight split with data that's
> associated to a session, to a transaction and to a connection, say:
> 1) pg_stat_session, data that may change.
> - PID
> - leader PID
> - the role name
> - role ID
> - application_name
> - wait_event_type
> - wait_event
> 2) pg_stat_connection, static data associated to a connection.
> - PID
> - database name
> - database OID
> - client_addr
> - client_hostname
> - client_port
> - backend_start
> - authn ID
> - backend_type
> 3) pg_stat_transaction, or pg_stat_activity, for the transactional
> activity.
> - PID
> - xact_start
> - query_start
> - backend_xid
> - state_change
> - query string
> - query ID
> - state

This seems extremely user-unfriendly to me.

I mean. Timestamps are nso split out between different views so you
can't track the process iwthout it. And surely wait_event info is
*extremely* related to things like what query is running and what
state the session is in. And putting backend_type off in a separate
view means most queries are going to have to join that in anyway. Or
include it in all views. And if we're forcing the majority of queries
to join multiple views, what have we actually gained?

Based on your list above, I'd definitely want at least (1) and (2) to
be in the same one, but they'd have to also gain at least the database
oid/name and backend_type, and maybe also backend_start.

So basically, it would be moving out client_*, and authn_id. If we're
doing that then as you say maybe pg_stat_connection is a good name and
could then *also* gain the information that's currently in the ssl and
gss views for a net simplification.

tld;dr; I think we have to be really careful here or the cure is going
to be way worse than the disease.

> Or I could just drop a new function that fetches the authn ID for a
> given PID, even if this makes things potentially less consistent when
> it comes to the lookup of PgBackendStatus, guarantee given now by
> pg_stat_get_activity().

Well, the authnid will never change so I'm not sure the consistency
part is a big problem? Or maybe I'm misunderstanding the risk you're
referring to?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2021-05-17 08:40:47 Re: Asynchronous Append on postgres_fdw nodes.
Previous Message Magnus Hagander 2021-05-17 08:17:56 Winflex docs and distro