Re: Better client reporting for "immediate stop" shutdowns

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Better client reporting for "immediate stop" shutdowns
Date: 2020-12-22 08:51:28
Message-ID: CABUevEx1Obz9cR7O+RMoHe8QFcG_iC2znof3hJ54oKN4Qvn2Kw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 22, 2020 at 2:29 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Tue, Dec 22, 2020 at 3:13 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Up to now, if you shut down the database with "pg_ctl stop -m immediate"
> > then clients get a scary message claiming that something has crashed,
> > because backends can't tell whether the SIGQUIT they got was sent for
> > a crash-and-restart situation or because of an immediate-stop command.
> >
> > This isn't great from a fit-and-finish perspective, and it occurs to me
> > that it's really easy to do better: the postmaster can stick a flag
> > into shared memory explaining the reason for SIGQUIT. While we don't
> > like the postmaster touching shared memory, there doesn't seem to be
> > any need for interlocking or anything like that, so there is no risk
> > involved that's greater than those already taken by pmsignal.c.
>
> +1 to improve the message.
>
> > So, here's a very simple proposed patch. Some issues for possible
> > bikeshedding:
> >
> > * Up to now, pmsignal.c has only been for child-to-postmaster
> > communication, so maybe there is some better place to put the
> > support code. I can't think of one though.
>
> +1 to have it here as we already have the required shared memory
> initialization code to add in new flags there -
> PMSignalState->sigquit_reason.
>
> If I'm correct, quickdie() doesn't access any shared memory because
> one of the reason we can be in quickdie() is when the shared memory
> itself is corrupted(the comment down below on why we don't call
> roc_exit() or atexit() says), in such case, will GetQuitSignalReason()
> have some problem in accessing the shared memory i.e. + return
> PMSignalState->sigquit_reason;?
>
> > * I chose to report the same message for immediate shutdown as we
> > already use for SIGTERM (fast shutdown or pg_terminate_backend()).
> > Should it be different, and if so what?
>
> AFAIK, errmsg(terminating connection due to administrator command") is
> emitted when there's no specific reason. But we know exactly why we
> are terminating in this case, how about having an error message like
> errmsg("terminating connection due to immediate shutdown request")));
> ? There are other places where errmsg("terminating connection due to
> XXXX reasons"); is used. We also log messages when an immediate
> shutdown request is received errmsg("received immediate shutdown
> request").

+1. I definitely think having this message be different can be useful.

See also the thread about tracking shutdown reasons (connection
statistics) -- not the same thing, but the same concepts apply.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Li Japin 2020-12-22 09:07:21 Confused about stream replication protocol documentation
Previous Message Bharath Rupireddy 2020-12-22 08:45:56 Re: Parallel Inserts in CREATE TABLE AS