Re: Immediate shutdown and system(3)

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-04 11:01:34
Message-ID: 49AE5F8E.9010403@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> Hi,
>
> On Mon, Mar 2, 2009 at 4:59 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Fujii Masao wrote:
>>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
>>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>> I'm leaning towards option 3, but I wonder if anyone sees a better
>>>> solution.
>>> 4. Use shared memory to tell the startup process about the shutdown
>>> state.
>>> When a shutdown signal arrives, the postmaster stores the corresponding
>>> shutdown state in shared memory before signaling the child processes.
>>> The startup process checks the shutdown state whenever it executes
>>> system(), and decides how to exit according to that state. This solution
>>> doesn't change any existing behavior of pg_standby. What is your opinion?
>> That would only solve the problem for pg_standby. Other programs you might
>> use as a restore_command or archive_command like "cp" or "rsync" would still
>> core dump on the SIGQUIT.
>
> Right. I now understand your intention. I also agree with option 3 if nobody
> complains about the lack of backward compatibility in pg_standby. Otherwise,
> how about using SIGUSR2 instead of SIGINT for immediate shutdown of only the
> archiver and the startup process? SIGUSR2 terminates the process by default.
> The archiver already uses SIGUSR2 for pgarch_waken_stop, so we would need to
> reassign that function to another signal (SIGINT is suitable, I think).
> This solution doesn't need signal multiplexing. Thoughts?

Hmm, the startup/archiver process would then in turn need to kill the
external command with SIGINT. I guess that would work.

There's a problem with my idea of just using SIGINT instead of SIGQUIT.
Some (arguably badly behaved) programs trap SIGINT and exit() with a
return code. The startup process won't recognize that as "killed by
signal", and we're back to the same problem we have with pg_standby: the
startup process doesn't die but continues with the startup. Notably,
rsync seems to behave like that.
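
To illustrate the distinction, here is a minimal sketch of how a caller of
system(3) can tell "killed by signal" apart from "exited with a return
code". It is not the actual startup process code; the CmdResult type and
classify_status() are hypothetical names made up for this example.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical result type, for illustration only. */
typedef enum
{
    CMD_OK,                 /* exited with status 0 */
    CMD_FAILED,             /* exited with a nonzero status */
    CMD_KILLED_BY_SIGNAL,   /* terminated by an uncaught signal */
    CMD_NOT_RUN             /* system() itself failed */
} CmdResult;

/* Classify the value returned by system(3). */
static CmdResult
classify_status(int rc)
{
    if (rc == -1)
        return CMD_NOT_RUN;
    if (WIFSIGNALED(rc))
        return CMD_KILLED_BY_SIGNAL;
    if (WIFEXITED(rc))
        return WEXITSTATUS(rc) == 0 ? CMD_OK : CMD_FAILED;
    return CMD_FAILED;
}
```

A command that dies from an uncaught signal is reported as
CMD_KILLED_BY_SIGNAL. A command that installs a handler for the same
signal and then calls exit(1) is reported as CMD_FAILED, which the caller
cannot distinguish from an ordinary command failure.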

BTW, searching the archive, I found this long thread about this same issue:

http://archives.postgresql.org/pgsql-hackers/2006-11/msg00406.php

The idea of SIGUSR2 was mentioned there too, along with the idea of
reimplementing system(3). The conclusion of that thread was to use
setsid() and process groups to ensure that the SIGQUIT is delivered to
the archive_command or restore_command.
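
As a rough sketch of the process group idea (not the code from that thread
or from PostgreSQL), a child can call setsid() before running the external
command, and the parent can then signal the whole group with a negative
PID. The helper names spawn_in_own_group() and signal_group_and_wait() are
hypothetical, and the pipe is only there so the parent does not signal the
group before setsid() has happened.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `command` through /bin/sh in a new session, so
 * the shell and everything it starts share one process group whose ID
 * equals the returned PID. Returns -1 on failure.
 */
static pid_t
spawn_in_own_group(const char *command)
{
    int   sync_pipe[2];
    char  dummy;
    pid_t pid;

    if (pipe(sync_pipe) != 0)
        return -1;

    pid = fork();
    if (pid == 0)
    {
        close(sync_pipe[0]);
        setsid();               /* new session and new process group */
        close(sync_pipe[1]);    /* tells the parent that setsid() is done */
        execl("/bin/sh", "sh", "-c", command, (char *) NULL);
        _exit(127);             /* exec failed */
    }

    close(sync_pipe[1]);
    if (pid > 0)
    {
        /* Block until the child closes its end of the pipe (EOF). */
        while (read(sync_pipe[0], &dummy, 1) > 0)
            ;
    }
    close(sync_pipe[0]);
    return pid;
}

/*
 * Hypothetical helper: send `signo` to every process in the group led by
 * `leader`, then reap the leader. Returns its wait status, or -1 on error.
 */
static int
signal_group_and_wait(pid_t leader, int signo)
{
    int status;

    if (kill(-leader, signo) != 0)      /* negative PID = whole group */
        return -1;
    if (waitpid(leader, &status, 0) != leader)
        return -1;
    return status;
}
```

With this arrangement, signal_group_and_wait(pid, SIGQUIT) reaches the
external command even though it is a grandchild. It does not help with a
command that traps the signal and exits with a return code instead.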

I'm starting to feel that this is getting too complicated. Maybe we
should just fix pg_standby to not trap SIGQUIT, and live with the core
dumps...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
