Simon Riggs wrote:
> On Fri, 2009-02-06 at 10:06 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> On Thu, 2009-02-05 at 21:54 +0200, Heikki Linnakangas wrote:
>>>> - If you perform a fast shutdown while startup process is waiting for
>>>> the restore command, startup process sometimes throws a FATAL error
>>>> which leads escalates into an immediate shutdown. That leads to
>>>> different messages in the logs, and skipping of the shutdown
>>>> restartpoint that we now otherwise perform.
>> I think what happens is that if the restore command receives the SIGTERM
>> and dies before the startup process that's waiting for the restore
>> command receives the SIGTERM, the startup process throws a FATAL error
>> because the restore command died unexpectedly. I put this
>>> if (shutdown_requested && InRedo)
>>> /* XXX: Is EndRecPtr always the right value here? */
>> right after the "system(xlogRestoreCmd)" call, to exit gracefully if we
>> were requested to shut down while restore command was running, but it
>> seems that that's not enough because of the race condition.
> Can we trap the death of the restorecmd and handle it differently from
> the death of the startup process?
The startup process launches the restore command, so it's the startup
process that needs to handle its death.
Anyway, I think I've found a solution. While we're executing the restore
command, we're in a state that it's safe to proc_exit(0). We can set a
flag to indicate to the signal handler when we're executing the restore
command, so that the signal handler can do proc_exit(0) on SIGTERM. So
if the startup process receives the SIGTERM first, it will proc_exit(0)
immediately, and if the restore command dies first due to the SIGTERM,
startup process exits with proc_exit(0) when it sees that restore
command exited because of the SIGTERM. If either process receives
SIGTERM for some other reason than a fast shutdown request, postmaster
will see that the startup process exited unexpectedly, and handles that
like a child process crash.
Attached is an updated patch that does that, and I've fixed all the
other outstanding issues I listed earlier as well. Now I'm feeling again
that this is in pretty good shape.
In response to
pgsql-hackers by date
|Next:||From: Andrew Dunstan||Date: 2009-02-09 15:44:17|
|Subject: Re: WIP: fix SET WITHOUT OIDS, add SET WITH OIDS|
|Previous:||From: Mihai Criveti||Date: 2009-02-09 14:41:52|
|Subject: Re: 64 bit PostgreSQL 8.3.6 build on AIX 5300-09-02-0849 with IBM XL C/C++ 10.1.0.1 - initdb fails (could not dump unrecognized node type: 650)|