Re: "stuck spinlock"

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Christophe Pettus <xof(at)thebuild(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: "stuck spinlock"
Date: 2013-12-13 18:39:42
Message-ID: CA+TgmoaW7+CJ4kRoCKcEjWTamYuUxt0Ubz_DAvEzqRGCGf0T4g@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 13, 2013 at 1:15 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-12-13 12:54:09 -0500, Tom Lane wrote:
>> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> > I wonder what to do about bgworker's bgworker_die()? I don't really see
>> > how that can be fixed without breaking the API?
>>
>> IMO it should be flushed and bgworkers should use the same die() handler
>> as every other backend, or else one like the one in worker_spi, which just
>> sets a flag for testing later.
>
> Agreed on not going forward like now, but I don't really see how they
> could usefully use die(). I think we should just mandate that every
> bgworker connected to shared memory registers a SIGTERM handler - we
> could put a check into BackgroundWorkerUnblockSignals(). We should leave
> the current handler in for unconnected ones though...
> bgworkers are supposed to be written as a loop around procLatch, so
> adding a !got_sigterm probably isn't too hard.

I think the !got_sigterm thing is complete bunk. If a background
worker is running SQL queries, it really ought to honor a query cancel
or SIGTERM at the next CHECK_FOR_INTERRUPTS(). But the default
background worker handler for SIGUSR1 just sets the process latch, and
worker_spi's SIGTERM handler just sets a private variable, got_sigterm.
So ProcessInterrupts() will never get called, and even if it were, it
wouldn't do anything, because no interrupt flag is ever set. That's
really pretty horrible, because it means that the query worker_spi runs
can't be interrupted short of a SIGQUIT. So I think worker_spi is
really a very bad example of how to do this right. In the
as-yet-uncommitted test-shm-mq-v1.patch, I did this:

+static void
+handle_sigterm(SIGNAL_ARGS)
+{
+    int         save_errno = errno;
+
+    if (MyProc)
+        SetLatch(&MyProc->procLatch);
+
+    if (!proc_exit_inprogress)
+    {
+        InterruptPending = true;
+        ProcDiePending = true;
+    }
+
+    errno = save_errno;
+}

...but I'm not 100% sure that's right, either.
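
To make the difference concrete, here is a rough standalone sketch in plain
C. It is not PostgreSQL code: interrupt_pending, proc_die_pending,
run_query() and simulate() are hypothetical stand-ins for InterruptPending,
ProcDiePending, a long-running query that calls CHECK_FOR_INTERRUPTS(), and
a test driver. It only illustrates why a handler that sets a private flag
cannot stop work that checks the shared interrupt flags.

```c
#include <signal.h>

/* Hypothetical stand-ins for the backend's global interrupt flags. */
static volatile sig_atomic_t interrupt_pending = 0;
static volatile sig_atomic_t proc_die_pending = 0;

/* Private flag in the style of worker_spi; the query loop never reads it. */
static volatile sig_atomic_t got_sigterm = 0;

/* worker_spi-style handler: records the signal in a private variable only. */
static void
flag_only_handler(int signo)
{
    (void) signo;
    got_sigterm = 1;
}

/* Handler in the style of handle_sigterm above: sets the shared flags. */
static void
interrupt_handler(int signo)
{
    (void) signo;
    interrupt_pending = 1;
    proc_die_pending = 1;
}

/*
 * Simulated long-running query. Each step ends with a check that plays the
 * role of CHECK_FOR_INTERRUPTS(). SIGTERM is raised at signal_at_step.
 * Returns the step at which the loop stopped, or max_steps if it ran to
 * completion without noticing the signal.
 */
static int
run_query(int max_steps, int signal_at_step)
{
    for (int step = 0; step < max_steps; step++)
    {
        if (step == signal_at_step)
            raise(SIGTERM);     /* handler runs before raise() returns */

        if (interrupt_pending && proc_die_pending)
            return step;        /* the "query" honors the interrupt */
    }
    return max_steps;
}

/* Reset all flags, install the given handler, and run the simulated query. */
static int
simulate(void (*handler)(int), int max_steps, int signal_at_step)
{
    int         stopped_at;

    interrupt_pending = 0;
    proc_die_pending = 0;
    got_sigterm = 0;

    signal(SIGTERM, handler);
    stopped_at = run_query(max_steps, signal_at_step);
    signal(SIGTERM, SIG_DFL);

    return stopped_at;
}
```

Calling simulate(flag_only_handler, 10, 3) from a main() runs all ten steps
even though got_sigterm ends up set, while simulate(interrupt_handler, 10, 3)
stops at step 3. The real handler also has to set the latch and preserve
errno, which this sketch leaves out.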

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
