Re: Unportable implementation of background worker start

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Rémi Zara <remi_zara(at)mac(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, cm(at)enterprisedb(dot)com, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unportable implementation of background worker start
Date: 2017-04-25 15:57:30
Message-ID: 19167.1493135850@sss.pgh.pa.us
Lists: pgsql-hackers

Rémi Zara <remi_zara(at)mac(dot)com> writes:
>> On 25 Apr 2017, at 01:47, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> It looks like coypu is going to need manual intervention (ie, kill -9
>> on the leftover postmaster) to get unwedged :-(. That's particularly
>> disturbing because it implies that ServerLoop isn't iterating at all;
>> otherwise, it'd have noticed by now that the buildfarm script deleted
>> its data directory out from under it.

> coypu was not stuck (no buildfarm-related process running), but it failed to clean up shared memory and semaphores.
> I’ve done the clean-up.

Huh, that's even more interesting.

Looking at the code, what ServerLoop actually does when it notices that
the postmaster.pid file has been removed is

    kill(MyProcPid, SIGQUIT);

So if our hypothesis is that pselect() failed to unblock signals,
then failure to quit is easily explained: the postmaster never
received/acted on its own signal. But that should have left you
with a running postmaster holding the shared memory and semaphores.
Seems like if it is gone but it failed to remove those, somebody must've
kill -9'd it ... but who? I see nothing in the buildfarm script that
would.

regards, tom lane
