Re: What is happening on buildfarm member crake?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: What is happening on buildfarm member crake?
Date: 2014-01-31 23:15:49
Message-ID: 5346.1391210149@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sat, Jan 25, 2014 at 5:04 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Yeah. If Robert's diagnosis is correct, and it sounds pretty plausible,
>> then this is really just one instance of a bug that's probably pretty
>> widespread in our signal handlers. Somebody needs to go through 'em
>> all and look for touches of shared memory.

> I haven't made a comprehensive study of every signal handler we have,
> but looking at procsignal_sigusr1_handler, the list of functions that
> can get called from here is quite short: CheckProcSignal(),
> RecoveryConflictInterrupt(), SetLatch(), and latch_sigusr1_handler().

You forgot HandleCatchupInterrupt and HandleNotifyInterrupt, but those
are also safe because they do nothing once proc_exit_inprogress is set.

> I think maybe the best fix is to *clear* MyProc in
> ProcKill/AuxiliaryProcKill and MyProcSignalSlot in
> CleanupProcSignalState, as shown in the attached patch.

This looks good to me in principle. A couple minor beefs:

* The addition to CleanupProcSignalState could use a comment,
similar to the one you added in ProcKill.

* I think the code in ProcKill and AuxiliaryProcKill might be more
readable if the new local variable was named "myproc" (lower case).

> and we can easily add a NULL guard to the SetLatch() call in
> procsignal_sigusr1_handler, which the attached patch also does.

Um ... no such change actually visible in patch, but it's clearly
necessary.

> This might not be a complete fix to every problem of this type that
> exists anywhere in our code, but I think it's enough to make the world
> safe for procsignal_sigusr1_handler.

Yeah; at the least this should cut down on the buildfarm noise we
are seeing ATM.

> Assuming nobody objects too much to this basic approach, should I
> back-patch the parts of this that apply pre-9.4?

Yes, I think so. We have seen some reports of irreproducible crashes
at process exit, and maybe this explains them.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2014-01-31 23:32:03 Re: LDAP: bugfix and deprecated OpenLDAP API
Previous Message Tom Lane 2014-01-31 23:01:38 Re: WITH ORDINALITY planner improvements