Re: SIGQUIT on archiver child processes maybe not such a hot idea?

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: michael(at)paquier(dot)xyz
Cc: tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: SIGQUIT on archiver child processes maybe not such a hot idea?
Date: 2019-09-02 09:31:53
Message-ID: 20190902.183153.120412900.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Mon, 2 Sep 2019 15:51:34 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in <20190902065134(dot)GE1841(at)paquier(dot)xyz>
> On Mon, Sep 02, 2019 at 12:27:09AM +0000, Tsunakawa, Takayuki wrote:
> > From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> >> After investigation, the mechanism that's causing that is that the
> >> src/test/recovery/t/010_logical_decoding_timelines.pl test shuts
> >> down its replica server with a mode-immediate stop, which causes
> >> that postmaster to shut down all its children with SIGQUIT, and
> >> in particular that signal propagates to a "cp" command that the
> >> archiver process is executing. The "cp" is unsurprisingly running
> >> with default SIGQUIT handling, which per the signal man page
> >> includes dumping core.
> >
> > We've experienced this (core dump in the data directory by an
> > archive command) years ago. Related to this, the example of using
> > cp in the PostgreSQL manual is misleading, because cp doesn't
> > reliably persist the WAL archive file.
>
> The previous talks about having pg_copy are still where they were a
> couple of years ago as we did not agree on which semantics it should
> have. If we could move forward with that and update the documentation
> from its insanity that would be great and... The signal handling is
> something else we could customize in a more favorable way with the
> archiver. Anyway, switching from something else than SIGQUIT to stop
> the archiver will not prevent any other tools from generating core
> dumps with this other signal.

Since we allow operators to use an arbitrary command as
archive_command, providing a replacement with non-standard signal
handling for one specific command doesn't seem like a general
solution to me. Couldn't we have pg_system (a tentative name), which
intercepts SIGQUIT and then sends SIGINT to its children? It might
need to resend SIGQUIT after some interval, though.
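
To make the idea concrete, here is a rough sketch of what such a wrapper might do, written in Python only for brevity. It is not an existing PostgreSQL facility. The function name, the grace interval, and the choice to put the child in its own session (so that a SIGQUIT sent to the caller's process group does not reach the command directly) are all assumptions made for illustration. It is POSIX-only and must be called from the main thread.

```python
import signal
import subprocess
import time


def run_with_sigquit_intercept(argv, grace_seconds=5.0):
    """Run argv and return its exit status (hypothetical pg_system-like helper).

    If this process receives SIGQUIT while the command runs, the command
    is sent SIGINT instead. If it is still running grace_seconds later,
    SIGQUIT is sent to it as a last resort.
    """
    quit_times = []  # monotonic time of the first SIGQUIT, if any

    def on_sigquit(signum, frame):
        if not quit_times:
            quit_times.append(time.monotonic())

    previous = signal.signal(signal.SIGQUIT, on_sigquit)
    try:
        # Assumption: a new session keeps the child out of our process
        # group, so only this wrapper sees the original SIGQUIT.
        child = subprocess.Popen(argv, start_new_session=True)
        forwarded = False
        escalated = False
        while True:
            try:
                return child.wait(timeout=0.1)
            except subprocess.TimeoutExpired:
                pass
            if quit_times and not forwarded:
                child.send_signal(signal.SIGINT)
                forwarded = True
            elif (forwarded and not escalated
                  and time.monotonic() - quit_times[0] >= grace_seconds):
                child.send_signal(signal.SIGQUIT)
                escalated = True
    finally:
        signal.signal(signal.SIGQUIT, previous)
```

A caller would use it in place of a plain system() call, for example run_with_sigquit_intercept(["cp", "/hypothetical/wal/segment", "/hypothetical/archive/segment"]). A command terminated by a signal is reported as a negative return value (the negated signal number), which is the convention of Python's subprocess module.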

> > We enable the core dump in production to help the investigation just in case.
>
> So do I in some of the stuff I work on.
>
> > some_command also catches SIGQUIT just exit. It copies and syncs the file.
> >
> > I proposed something in this line as below, but I couldn't respond to Peter's review comments due to other tasks. Does anyone think it's worth resuming this?
> >
> > https://www.postgresql.org/message-id/7E37040CF3804EA5B018D7A022822984@maumau
>
> And I was looking for this thread a couple of lines ago :)
> Thanks.

# Is there any way to view a whole thread from the archive?
# I'm kind of reluctant to wander among messages like a rat in
# a maze :p

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
