Re: contrib/pg_stat_tcpinfo

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: contrib/pg_stat_tcpinfo
Date: 2025-11-10 08:30:22
Message-ID: CAKZiRmz7=zs+S2Ymo721vuULzfb71WBiaf6ihrjTHds5uqM8WA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Nov 8, 2025 at 12:17 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:

Hi Tomas, thanks for responding!

> On 11/7/25 11:36, Jakub Wartak wrote:
> > On Mon, Nov 3, 2025 at 3:09 PM Jakub Wartak
> > <jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> >>
> >> Attached is pg_stat_tcpinfo, an heavy work in progress, Linux-only
> >> netstat/ss-like extension for showing detailed information about TCP
> >> connections based on information from the kernel itself.
> > [..]
> >
>
> I personally don't remember ever needing this kind of visibility into
> TCP connections, but I'm also not doing all that much direct support
> recently. And even in the past I personally didn't need to look at
> TCP-level details all that much, the problems were usually elsewhere.
>
> But maybe it's very useful in practice, don't know.

Well, as stated earlier, I needed such tooling every time there was a
latency problem with SyncRep or a bandwidth problem with anything else.

> >> Some early feedback about direction in order to bring this into core
> >> would be appreciated. State of stuff:
> >>
>
> Well, it's an extension in contrib. Is that sufficiently "in core"? Do
> you expect this to be used in PROD environments, or more in DEV?

To me contrib/* already counts as "in core" and is good enough (core as
in the PostgreSQL project, not core as in "src/"). Practically, it's
about having the tool ready to use the moment it's needed (banks,
fintech, k8s) rather than debating IF it can be installed, IF it is
supported, and so on. Having such advanced debugging/correlation
utilities at hand also makes PostgreSQL look more mature, because time
to resolution is lower. It's almost always about PROD-like systems.

> >> 1. Andres is pushing for supporting UNIX domain sockets here, but I'm
> >> not sure if it is really worth the effort (and it would trigger new
> >> naming problem;)) and primarily making the code even more complex.
> >> IMHO the netlinksock_diag API is already convoluted and adding AF_UNIX
> >> would make it even less readable.
>
> No idea. For most real-world production systems the client is probably
> on a different machine, so won't use UNIX sockets. Not always, but how
> often do UNIX sockets have network-like problems for this visibility to
> be helpful?

Yes, same here, I've never run into performance problems with domain
sockets. Andres probably wanted that visibility to test local
pgbench/basebackup runs, but in such development-like scenarios
`ss -moe` does not show many interesting performance metrics
(it's just skmem):

Netid State Recv-Q Send-Q                      Local Address:Port Peer Address:Port Process
u_str ESTAB 0      0      /var/run/postgresql/.s.PGSQL.5432 86934          * 112798
      skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0) <-> ino:3532 dev:0/27

So my line of thinking is to not clutter the code more than necessary
for something that's already easily available (on PG devs'/committers'
laptops) but would rarely be used in the field.
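
For illustration only (not part of the patch): the skmem group is the one piece of data shown for the UNIX socket above, and it is easy to pull apart. A minimal Python sketch, assuming the field letters documented in ss(8); parse_skmem is a hypothetical helper name:

```python
import re

# Letter prefixes of the skmem:(...) group, as documented in ss(8).
SKMEM_FIELDS = {
    "r": "rmem_alloc",
    "rb": "rcvbuf",
    "t": "wmem_alloc",
    "tb": "sndbuf",
    "f": "fwd_alloc",
    "w": "wmem_queued",
    "o": "optmem",
    "bl": "back_log",
    "d": "sock_drop",
}


def parse_skmem(text):
    """Return the skmem counters found in one line of `ss -m` output."""
    match = re.search(r"skmem:\(([^)]*)\)", text)
    if match is None:
        return {}
    counters = {}
    for item in match.group(1).split(","):
        # Each item is a short letter prefix followed by a decimal value.
        m = re.fullmatch(r"([a-z]+)(\d+)", item.strip())
        if m and m.group(1) in SKMEM_FIELDS:
            counters[SKMEM_FIELDS[m.group(1)]] = int(m.group(2))
    return counters


line = "skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0) <-> ino:3532 dev:0/27"
print(parse_skmem(line))
```

These are socket buffer accounting counters only, which is why they say little about latency or throughput.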

> Also, how much work / extra code would it be to support UNIX sockets?

About 150-200 extra lines, maybe with some generalization of a few
routines to handle both AF_INET and AF_UNIX (which would potentially
pave the way to supporting other sock_diag data too, e.g. UDP, but I
don't want to add that).
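
To give a feel for what such a generalization has to cover, here is a rough Python sketch of the two sock_diag request layouts. It assumes the struct inet_diag_req_v2 and struct unix_diag_req definitions from the Linux UAPI headers, with constants copied by hand; the builder names and the dispatch table are hypothetical and not taken from the patch. It only packs the requests and never talks to the kernel:

```python
import struct

# Constants copied by hand from the Linux UAPI headers (assumed, Linux-only).
AF_UNIX, AF_INET, IPPROTO_TCP = 1, 2, 6
TCP_ESTABLISHED = 1          # state number, used as a bit position in the mask
INET_DIAG_INFO = 2           # extension carrying struct tcp_info
INET_DIAG_SKMEMINFO = 7      # extension carrying the skmem counters
UDIAG_SHOW_PEER = 0x04
UDIAG_SHOW_MEMINFO = 0x20


def inet_diag_req(family=AF_INET):
    """Pack struct inet_diag_req_v2 selecting established TCP sockets."""
    ext = (1 << (INET_DIAG_INFO - 1)) | (1 << (INET_DIAG_SKMEMINFO - 1))
    head = struct.pack("=BBBBI", family, IPPROTO_TCP, ext, 0,
                       1 << TCP_ESTABLISHED)
    # struct inet_diag_sockid left zeroed: ports, src[4], dst[4], if, cookie[2]
    sockid = struct.pack("!HH", 0, 0) + bytes(32) + struct.pack("=III", 0, 0, 0)
    return head + sockid


def unix_diag_req():
    """Pack struct unix_diag_req selecting established UNIX sockets."""
    show = UDIAG_SHOW_PEER | UDIAG_SHOW_MEMINFO
    return struct.pack("=BBHIIIII", AF_UNIX, 0, 0,
                       1 << TCP_ESTABLISHED, 0, show, 0, 0)


# Hypothetical per-family dispatch table.
REQUEST_BUILDERS = {AF_INET: inet_diag_req, AF_UNIX: unix_diag_req}

for family, build in REQUEST_BUILDERS.items():
    print(family, len(build()))
```

The requests differ in size and in how extra data is selected (an extension bitmask for TCP, show flags for UNIX sockets), and the replies differ as well, so each family needs its own request builder and its own reply parser.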

> >> 3. Biggest TODO left is probably properly formatting the information
> >> based on struct tcpinfo variables (just like ss(1) does, so keeping
> >> the same unit/formatting)
>
> I don't follow? Why do you want to format data the way "ss" does?

Hm, no strong feelings there, I just wanted to have the same
information. I don't mind formatting it any other way. Today in v2
that's OBJ (One Big Jsonb):

postgres=# select pid, tcpinfo->'rtt' as RTT, jsonb_pretty(tcpinfo)
from pg_stat_tcpinfo limit 1;
-[ RECORD 1 ]+---------------------------------------------
pid          | 13019
rtt          | 23.445
jsonb_pretty | {                                          +
             |     "ato": 40.000,                         +
             |     "rto": 224000,                         +
             |     "rtt": 23.445,                         +
             |     "lost": 0,                             +
             |     "pmtu": 1500,                          +
             |     "skmem": {                             +
             |         "optmem": 0,                       +
             |         "rcvbuf": 1239696,                 +
             |         "sndbuf": 87040,                   +
             |         "fwd_alloc": 3136,                 +
             |         "rmem_alloc": 960,                 +
             |         "wmem_alloc": 0,                   +
             |         "wmem_queued": 0                   +
             |     },                                     +
             |     "state": 8,                            +
             |     "timer": "(off,0min0sec,0)",           +
[..60 lines more..]

Any better ideas? It's literally about dumping out struct tcp_info (which
could be OS-dependent in the far future), so that's why I picked JSON:
to keep this flexibility long term.
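
As a rough illustration of the "dump the struct into JSON" idea (a sketch in Python, not the extension's C code): the snippet below decodes only the leading part of Linux's struct tcp_info, assuming the field order from linux/tcp.h. The choice of which fields to scale from microseconds to milliseconds is an assumption made here to mimic ss(8)-style units, and tcp_info_to_json is a hypothetical name:

```python
import json
import struct

# Leading part of struct tcp_info (linux/tcp.h): 8 one-byte fields (two of
# them bitfields), followed by 32-bit counters in this order.
U32_FIELDS = (
    "rto", "ato", "snd_mss", "rcv_mss", "unacked", "sacked", "lost",
    "retrans", "fackets", "last_data_sent", "last_ack_sent",
    "last_data_recv", "last_ack_recv", "pmtu", "rcv_ssthresh",
    "rtt", "rttvar",
)
PREFIX = struct.Struct("=8B%dI" % len(U32_FIELDS))

# Assumption: fields shown in milliseconds; the kernel reports microseconds.
MS_FIELDS = ("ato", "rtt", "rttvar")


def tcp_info_to_json(buf):
    """Decode the start of a raw tcp_info buffer into a JSON object."""
    values = PREFIX.unpack_from(buf)
    info = {"state": values[0]}
    info.update(zip(U32_FIELDS, values[8:]))
    for key in MS_FIELDS:
        info[key] = info[key] / 1000.0
    return json.dumps(info, sort_keys=True)


# Synthetic buffer for demonstration; all counters not listed are zero.
raw = dict.fromkeys(U32_FIELDS, 0)
raw.update(rto=224000, ato=40000, pmtu=1500, rtt=23445)
sample = PREFIX.pack(8, 0, 0, 0, 0, 0, 0, 0, *(raw[k] for k in U32_FIELDS))
print(tcp_info_to_json(sample))
```

On Linux a real buffer for a connected socket can be obtained with sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 256). Fields that a given kernel or OS does not provide can simply be absent from the JSON object, which is the flexibility argument made above.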

> >> 4. Patch/tests are missing intentionally as I would like first to
> >> stabilize the outputs/naming/code first.
> >> 5. [security] Should this be available to pg_monitor/pg_read_all_stats
> >> or just to superuser?
>
> I suppose making this superuser-only would be against the effort to not
> require superuser privileges, so using roles seems like the way to go.
> The nature of the data seems very similar to pg_stat_activity (i.e. info
> about current connections), so I'd suggest to use pg_read_all_stats. It
> might even use an approach similar to pg_stat_get_activity(), which
> shows some fields to everyone, and the role is required only for fields
> with sensitive information.

Thanks.

> >> 6. [security] Should this return info about all TCP connections or
> >> just the UID of the postmaster?
> >
>
> Not sure if I understand the question, but IMHO this should return only
> info about connections opened by Postgres. Not for connections about TCP
> connections opened by other stuff running on the server.

Yes, I lean toward that option too. OK, let's do it that way.

-J.
