Re: System shutdown signal on Windows (was Re: [GENERAL])

From: Krystian Bigaj <krystian(dot)bigaj(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kalai R <softlinne(dot)kv(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: System shutdown signal on Windows (was Re: [GENERAL])
Date: 2014-07-23 22:29:41
Message-ID: CAN=kAeGOEmfh_+vdg+2oD=9KhWzGn4NNqfZNdHoQ_2QOsHhuLQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On 23 July 2014 22:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Krystian Bigaj <krystian(dot)bigaj(at)gmail(dot)com> writes:
> > - when pg_console_handler receives CTRL_SHUTDOWN_EVENT from OS, then it
> > calls pg_queue_signal(SIGINT).
>
> > Problems:
> > - when OS is in shutdown path, then it sends CTRL_SHUTDOWN_EVENT, and
> *all*
> > Postgres processes (main and sub/forked) will call
> pg_queue_signal(SIGINT)
> > - so main and sub processes will start to shutdown independently? Can
> this
> > have any bad consequences?
>
> Hm. We ought to have that sending SIGTERM instead, so as to mimic the
> situation when Unix "init" is trying to shut down the system. It might be
> that SIGINT will more or less work, but the postmaster logic is designed
> to work with global SIGTERM as being the clean-up-ASAP trigger. As an
> example, backends servicing remote applications (which will *not* have
> already gotten killed) would not exit in response to SIGINT.
>
> > I think that CTRL_SHUTDOWN_EVENT should be removed from
> pg_console_handler,
>
> That does not sound like a good idea, at least not if Windows has the
> same behavior as "init" does of proceeding to hard kills after some
> grace period.
>
> regards, tom lane
>

I'm not really familiar with Unix and it's SIG-commands. I know only
about SIGINT/SIGTERM from Postgres documentation.

However form what I see is that when Postgress is running by pg_ctl from
service, then it will receive SIGINT (independently and in general in
unspecified order)
- *each* postgres.exe process will queue itself SIGINT (because of
CTRL_SHUTDOWN_EVENT),
- pg_ctl will send SIGINT to main postmaster process (and possibly it will
pass that command to sub-processes)
So there are two independent paths where SIGINT are sent, and pg_ctl
doesn't have really a control when postgres.exe receives SIGINT.
This CTRL_SHUTDOWN_EVENT is not used when postgres.exe is run on *user
session* - so removing it won't change anything.

I see only two cases where CTRL_SHUTDOWN_EVENT might be need (all of there
where postgres.exe is run on service session):
- postgres.exe run by pg_ctl.exe, but pg_ctl service process was
terminated/killed, and then system was shutdown
- someone starts postgres.exe from own service, but doesn't send
SIGINT/SIGTERM command to postgres.exe on service system shutdown (but he
must for service stop)
As I previously wrote, I have workaround for it, so if you think that this
change would break compatibility and don't want to change it, then I'm
really fine with it.

However I've probably found something with pg_ctl.c regarding shutdown and
maybe that suspicious postgres.exe process termination on Windows.

1) shutdownEvent is signaled in pgwin32_ServiceMain
by SERVICE_CONTROL_STOP/SERVICE_CONTROL_SHUTDOWN in pgwin32_ServiceHandler
There is dwWaitHint = 10000.

2)
...
/* Wait for quit... */
ret = WaitForMultipleObjects(2, shutdownHandles, FALSE, INFINITE);

pgwin32_SetServiceStatus(SERVICE_STOP_PENDING);
switch (ret)
{
case WAIT_OBJECT_0: /* shutdown event */
kill(postmasterPID, SIGINT);

/*
* Increment the checkpoint and try again Abort after 12
* checkpoints as the postmaster has probably hung
*/
while (WaitForSingleObject(postmasterProcess, 5000) == WAIT_TIMEOUT &&
status.dwCheckPoint < 12)
status.dwCheckPoint++; <---- missing call
to pgwin32_SetServiceStatus(SERVICE_STOP_PENDING)
or SetServiceStatus(hStatus, (LPSERVICE_STATUS) &status);
break;
...

There is incremented dwCheckPoint every 5000ms, but that status is not
updated (missing pgwin32_SetServiceStatus/SetServiceStatus), so SCM after
10s (dwWaitHint = 10000) will not receive incremented dwCheckPoint, and
it's allowed to kill that process (because this service didn't respond with
dwWaitHint). It kills pg_ctl.exe, but when all services stops, then it
simply terminates other remaining processes - in this case postgres.exe

http://msdn.microsoft.com/en-us/library/windows/desktop/ms685996(v=vs.85).aspx
dwWaitHint:
...
If the amount of time specified by dwWaitHint passes, and dwCheckPoint has
not been incremented or dwCurrentState has not changed, the service control
manager or service control program can assume that an error has occurred
and the service should be stopped.
...

So if main postgres.exe (run by pg_ctl.exe service process) won't shutdown
in 10s (for any reason) it might be terminated/killed by Windows/SCM.

Best regards,
Krystian Bigaj

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message John R Pierce 2014-07-23 22:45:42 Re: Re: Why is unique constraint needed for upsert? (treat atomicity as optional)
Previous Message Seamus Abshere 2014-07-23 22:29:04 Re: Re: Why is unique constraint needed for upsert? (treat atomicity as optional)

Browse pgsql-hackers by date

  From Date Subject
Next Message David G Johnston 2014-07-23 22:48:47 Re: Making joins involving ctid work for the benefit of UPSERT
Previous Message Robert Haas 2014-07-23 22:27:48 Re: Making joins involving ctid work for the benefit of UPSERT