Re: Observability in Postgres

From: Greg Stark <stark(at)mit(dot)edu>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, David Fetter <david(at)fetter(dot)org>, stark(at)aiven(dot)io
Subject: Re: Observability in Postgres
Date: 2022-02-15 22:23:44
Message-ID: CAM-w4HMQ=Q4KbM8J=4vh0o-kFR-oe8nFRM0hZ4HdM5b8zaEHqg@mail.gmail.com
Lists: pgsql-hackers

On Tue, 15 Feb 2022 at 16:43, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
> On Tue, Feb 15, 2022 at 1:30 PM Dave Page <dpage(at)pgadmin(dot)org> wrote:
> >
> > - Does it really matter if metrics are exposed on a separate port from the postmaster? I actually think doing that is a good thing as it allows use of alternative listen addresses and firewalling rules; you could then confine the monitoring traffic to a management VLAN for example.
>
> +1. I think it would be much better to keep it on a separate port.
>
> Doesn't even have to be to the point of VLANs or whatever. You just
> want your firewall rules to be able to know what data it's talking
> about.

I would definitely want that to be an option that could be configured.
If you're deploying a server to be accessible as a public service and
configuring firewall rules, etc., then sure, you probably want to be
very explicit about what is listening where.

But when you're deploying databases automatically in a clustered
environment you really want a service to deploy on a given port and
have the monitoring associated with that port as well. If you deploy
five databases you don't want to have to deal with five more ports for
monitoring and then maintain a database of which monitoring ports are
associated with which service ports. It's definitely doable -- that's
what people do today -- but it's a pain, it's fragile, and it's
different at each site, which makes it impossible for dashboards to
work out of the box.

> Another part missing in the proposal is how to deal with
> authentication. That'll be an even harder problem if it sits on the
> same port but speaks a different protocol. How would it work with
> pg_hba etc?

Wouldn't it make it easier to work with pg_hba? If incoming
connections come through the normal postmaster path then pg_hba gets
to accept or refuse each connection based on the host and TLS
information. If it's listening on a separate port then, unless that
logic is duplicated, it'll be stuck in a parallel world with different
security rules.
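
To make that concrete: if metrics requests came in through the normal
connection path, a hypothetical pg_hba.conf entry like this (the
"monitoring" role and the address are invented for illustration) could
confine scrapes to a management network using machinery we already
have:

    # TYPE     DATABASE   USER         ADDRESS        METHOD
    hostssl    all        monitoring   10.0.0.0/24    cert

A separate listener would need its own equivalent of all of that.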

I'm not actually sure how to make this work. There's a facility in
Unix for passing a file descriptor from one process to another over a
socket, but that's bound to be a portability pain. And starting a new
worker for each incoming connection would be a different pain.
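
For reference, the Unix trick is sendmsg() with an SCM_RIGHTS control
message over a Unix-domain socket; a minimal sketch (POSIX, but the
portability pain lives in per-platform quirks around cmsg handling):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Hand an open file descriptor to another process over a
     * Unix-domain socket.  Returns 0 on success, -1 on error. */
    static int
    send_fd(int channel, int fd)
    {
        char            byte = 0;
        struct iovec    iov = { .iov_base = &byte, .iov_len = 1 };
        char            cbuf[CMSG_SPACE(sizeof(int))];
        struct msghdr   msg = { 0 };
        struct cmsghdr *cmsg;

        memset(cbuf, 0, sizeof(cbuf));
        msg.msg_iov = &iov;             /* must send at least one byte */
        msg.msg_iovlen = 1;
        msg.msg_control = cbuf;
        msg.msg_controllen = sizeof(cbuf);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;   /* payload is file descriptors */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(channel, &msg, 0) < 0 ? -1 : 0;
    }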

So right now I'm kind of guessing this might be just a hook in
postmaster that we can experiment with in the module. The hook would
just return a flag to postmaster saying the connection was handled.
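
Roughly this shape -- the names here are hypothetical, not an existing
PostgreSQL API, just a sketch of what a module could latch onto:

    #include <stdbool.h>

    /* Hypothetical hook: postmaster calls it right after accept() and
     * skips normal backend startup when it returns true. */
    typedef bool (*ClientConnection_hook_type)(int sock);

    static ClientConnection_hook_type ClientConnection_hook = NULL;

    static void
    dispatch_connection(int sock)
    {
        /* Give a loaded module first refusal on the connection; it
         * might sniff an HTTP request and serve metrics directly. */
        if (ClientConnection_hook && ClientConnection_hook(sock))
            return;                     /* handled by the module */

        /* ...otherwise fall through to normal backend startup... */
    }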

> There's good and bad with it. The big "good" with it is that it's an
> open standard (openmetrics). I think supporting that would be a very
> good idea. But it would also be good to have a different, "richer",
> format available. Whether it'd be worth it to go the full "postgresql
> way" and make it pluggable is questionable, but I would suggest at
> least having both openmetrics and a native/richer one, and not just
> the latter. Being able to just point your existing monitoring system
> at a postgres instance (with auth configured) and have things just
> show up is in itself a large value. (Then either pluggable or hooks
> beyond that, but having both of those as native)

Ideally I would want to provide OpenMetrics data that doesn't break
compatibility with OpenTelemetry -- which I'm still not 100% sure I
understand, but I gather it means following certain conventions about
metadata. But those standards only have quantitative metrics, no rich
structured data.
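
For a sense of what those conventions look like, here's roughly what
an OpenMetrics exposition is -- the metric names are invented for
illustration, but the # HELP/# TYPE metadata, the _total suffix on
counter samples, and the key="value" labels are what the standard
prescribes:

    # HELP pg_xact_commit Transactions committed
    # TYPE pg_xact_commit counter
    pg_xact_commit_total{datname="postgres"} 41523
    # HELP pg_backends Current number of backends
    # TYPE pg_backends gauge
    pg_backends{datname="postgres"} 12

Everything is a labelled name with a float64 value; there's no slot
for anything richer.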

I assume the idea is that that kind of rich structured data belongs in
some other system. But I definitely see people squeezing it into
metrics. For things like replication topology for example.... I would
love to have a

Personally I feel similarly about the inefficiency, but I think the
feeling is that compression makes it irrelevant. I suspect there's a
fair amount of burnout over predecessors like SNMP that went to a lot
of trouble to be efficient, and whose implementations were always
buggy and impenetrable as a result. (The predecessor at Google had
some features that made it slightly more efficient too, but they also
made it more complex. It seems intentional that those weren't carried
over.)

Fwiw one constant source of pain is the insistence on putting
everything into floating point numbers. An IEEE double has only 53
bits of precision, which leaves us not quite able to represent an LSN
or a 64-bit xid, for example.
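
A quick demonstration of the problem -- above 2^53 a double can no
longer represent every integer, so an LSN or 64-bit xid shipped as a
float64 metric silently loses its low bits:

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint64_t xid = ((uint64_t) 1 << 53) + 1;   /* a 64-bit xid just past 2^53 */
        double   as_metric = (double) xid;         /* what a float64 metric can hold */

        printf("actual:    %llu\n", (unsigned long long) xid);
        printf("as metric: %.0f\n", as_metric);    /* rounds back down to 2^53 */
        return 0;
    }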

--
greg
