Re: Immediate shutdown and system(3)

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-18 19:33:19
Message-ID: 49C14C7F.1060306@enterprisedb.com
Lists: pgsql-hackers

Ok, I've committed a minimal patch to pg_standby in CVS HEAD and
REL8_3_STABLE to not interpret SIGQUIT as a signal for failover. I added
a signal handler for SIGUSR1 to trigger failover; SIGUSR1 should be
considered the preferred failover signal, even though SIGINT still works
too.
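For illustration, a minimal C sketch of the signal-handling side of this, assuming a helper that polls for the next WAL file in a loop: SIGUSR1 (preferred) and SIGINT both just set a flag that the loop checks. The names failover_requested and install_failover_handlers are hypothetical and not taken from the committed pg_standby patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag polled by the restore wait loop; written only by the handler. */
static volatile sig_atomic_t failover_requested = 0;

/* Async-signal-safe: just record that failover was requested. */
static void failover_handler(int sig)
{
    (void) sig;
    failover_requested = 1;
}

/* Install the same handler for SIGUSR1 (preferred) and SIGINT (still accepted). */
static int install_failover_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = failover_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

The wait loop would test failover_requested between sleeps and stop waiting once it is set. Touching only a sig_atomic_t flag keeps the handler async-signal-safe.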

SIGQUIT is trapped to just die immediately, but without core dumping. As
we still use SIGQUIT for immediate shutdown, any other archive_command
or restore_command will still receive SIGQUIT on immediate shutdown, and
by default dump core. Let's just live with that for now.
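One way to "die immediately, but without core dumping" while still letting the parent see a death by signal is sketched below in C: the SIGQUIT handler resets SIGINT to its default action and sends SIGINT to itself, and SIGINT's default action terminates the process without a core file. This is an assumption about the approach, not the committed code; sigquit_handler, install_sigquit_handler and child_termination_signal are hypothetical names, the last being only a demonstration that forks a child and reports which signal killed it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical SIGQUIT handler: rather than taking SIGQUIT's default action
 * (terminate and dump core), terminate via SIGINT, whose default action does
 * not produce a core file. The parent still sees "killed by signal".
 * signal(), kill() and getpid() are async-signal-safe.
 */
static void sigquit_handler(int sig)
{
    (void) sig;
    signal(SIGINT, SIG_DFL);   /* make sure SIGINT is neither caught nor ignored */
    kill(getpid(), SIGINT);    /* SIGINT is not blocked here, so it is delivered now */
}

static void install_sigquit_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigquit_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGQUIT, &sa, NULL);
}

/*
 * Demonstration only: fork a child that installs the handler and receives
 * SIGQUIT, then report which signal terminated it (-1 if it was not killed
 * by a signal or if fork/waitpid failed).
 */
static int child_termination_signal(void)
{
    int   status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        install_sigquit_handler();
        raise(SIGQUIT);
        _exit(0);              /* only reached if the handler did not work */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling _exit() from the handler would be simpler, but the caller of system() would then see an ordinary exit status instead of "killed by signal", and the startup process would carry on with recovery.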

This should be mentioned in the release notes, as any script that might be
using SIGQUIT at the moment needs to be changed to use SIGUSR1 or SIGINT
instead. Where should I make a note of that so that we don't forget?

Heikki Linnakangas wrote:
> Fujii Masao wrote:
>> Hi,
>>
>> On Mon, Mar 2, 2009 at 4:59 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> Fujii Masao wrote:
>>>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
>>>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>>> I'm leaning towards option 3, but I wonder if anyone sees a better
>>>>> solution.
>>>> 4. Use shared memory to tell the startup process about the shutdown
>>>> state. When a shutdown signal arrives, the postmaster sets the
>>>> corresponding shutdown state in shared memory before signaling the
>>>> child processes. The startup process checks the shutdown state
>>>> whenever it executes system(), and determines how to exit according
>>>> to that state. This solution doesn't change any existing behavior of
>>>> pg_standby. What is your opinion?
>>> That would only solve the problem for pg_standby. Other programs you
>>> might use as a restore_command or archive_command like "cp" or "rsync"
>>> would still core dump on the SIGQUIT.
>>
>> Right. I've just understood your intention. I also agree with option 3
>> if nobody complains about the lack of backward compatibility in
>> pg_standby. If not, how about using SIGUSR2 instead of SIGINT for
>> immediate shutdown of only the archiver and the startup process?
>> SIGUSR2 by default terminates the process. The archiver already uses
>> SIGUSR2 for pgarch_waken_stop, so we would need to reassign that
>> function to another signal (SIGINT is suitable, I think). This
>> solution doesn't need signal multiplexing. Thoughts?
>
> Hmm, the startup/archiver process would then in turn need to kill the
> external command with SIGINT. I guess that would work.
>
> There's a problem with my idea of just using SIGINT instead of SIGQUIT.
> Some (arguably bad-behaving) programs trap SIGINT and exit() with a
> return code. The startup process won't recognize that as "killed by
> signal", and we're back to the same problem we have with pg_standby: the
> startup process doesn't die but continues with the startup. Notably
> rsync seems to behave like that.
>
> BTW, searching the archive, I found this long thread about this same issue:
>
> http://archives.postgresql.org/pgsql-hackers/2006-11/msg00406.php
>
> The idea of SIGUSR2 was mentioned there, as was the idea of
> reimplementing system(3). The conclusion of that thread was to use
> setsid() and process groups, to ensure that the SIGQUIT is delivered to
> the archive_command or restore_command.
>
> I'm starting to feel that this is getting too complicated. Maybe we
> should just fix pg_standby to not trap SIGQUIT, and live with the core
> dumps...
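The process-group idea from the 2006 thread, together with the "killed by signal" check, can be sketched in C as a small replacement for system(3): the child becomes the leader of its own process group, so a signal sent to the negative PID reaches both the shell and the command it runs, and the caller uses WIFSIGNALED to tell a signal death from an ordinary exit status. This is a sketch under those assumptions, not PostgreSQL code; run_in_own_group and killed_by_signal are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Start "/bin/sh -c cmd" as the leader of a new process group.
 * Returns the child's PID (which is also its process group ID), or -1.
 */
static pid_t run_in_own_group(const char *cmd)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        setpgid(0, 0);   /* child: new process group with pgid == own pid */
        execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
        _exit(127);      /* exec failed; same convention as system(3) */
    }
    /*
     * Parent sets it too, so the group exists before any kill(-pid, ...).
     * This may fail with EACCES if the child already exec'd, which is
     * harmless because the child has set it itself by then.
     */
    setpgid(pid, pid);
    return pid;
}

/*
 * Classify a wait status: returns the signal number if the child was
 * killed by a signal, or -1 otherwise, storing the exit code (or -1)
 * in *exit_code.
 */
static int killed_by_signal(int status, int *exit_code)
{
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    *exit_code = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return -1;
}
```

A caller would signal the whole group with kill(-pid, SIGQUIT) and then waitpid() on the child. A command that traps the signal and calls exit() still shows up here as a plain exit code, which is exactly the rsync case described above, so this alone does not make the startup process die reliably.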

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
