Re: Function to track shmem reinit time

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Function to track shmem reinit time
Date: 2020-02-22 17:43:20
Message-ID: CAPpHfdsUNrKezRww5Sak2Dd_u9ZV-iPHLRLVRtA9fHfR4K5phw@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 22, 2020 at 8:01 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> writes:
> > From my point of view criticism of this patch was addressed by
> > argument, that pg_shmem_init_time() allows to calculate the server
> > uptime [1]. This is very basic information, which is reasonable to
> > get without log files parsing. It's more than year since [1] is
> > unanswered. So, I'm going to push this if no objections.
>
> I'm still going to object to it, on the grounds that

OK!

> (1) it's exposing an implementation detail that clients should not be
> concerned with, and that we might change in future. The name isn't
> even well chosen --- if the argument is that it's useful to monitor
> server uptime, why isn't the name "pg_server_uptime"?

Choosing a more user-friendly name sounds like a good idea to me.
But pg_server_uptime() sounds like it should return an interval, which
would require the function itself to calculate a diff between
timestamps. I would rather return a timestamp and delegate the diff to
the user side. What about pg_server_up_since()?
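
To illustrate what delegating the diff to the user side could look
like, here is a minimal client-side sketch. It assumes the function
returns a timestamptz that the client has already fetched;
pg_server_up_since() is only the name proposed above, not an existing
function, and uptime_from_up_since() is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def uptime_from_up_since(up_since: datetime,
                         now: Optional[datetime] = None) -> timedelta:
    """Compute uptime on the client from an "up since" timestamp.

    up_since stands for the value a function such as the proposed
    (hypothetical) pg_server_up_since() would return.
    """
    if up_since.tzinfo is None:
        # timestamptz values are timezone-aware; refuse naive input
        # instead of silently guessing a zone.
        raise ValueError("up_since must be timezone-aware")
    if now is None:
        now = datetime.now(timezone.utc)
    return now - up_since
```

The server only has to report a single point in time; whether to show
it as an interval, a duration in seconds, or the raw timestamp is left
to the monitoring tool.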

> (2) if your server is crashing often enough that postmaster start
> time isn't an adequate substitute, you have way worse problems than
> whether you can measure it.

Well, it's enough for the server to crash once per postmaster run to
make postmaster start time absolutely inadequate as a measure. Server
crashes have different causes, and not all of them are actual bugs.
The OOM killer can cause a crash, and it doesn't seem feasible to
exclude this cause completely.

> I'd rather see us put effort into
> fixing whatever the underlying problem is.

This is a tradeoff between monitoring a problem and fixing it. We had
a similar one for lwlock wait monitoring. Ideally, our code should
never get stuck on an lwlock, but that's not feasible. So, we got
lwlock wait monitoring for problem diagnosis. I think we're now
discussing a similar issue. Ideally, postgres should never crash, but
that's too hard to achieve. We can, however, easily get better
monitoring of server crashes, and that's useful.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
