Re: BUG #15036: Un-killable queries Hanging in BgWorkerShutdown

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: djk447(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15036: Un-killable queries Hanging in BgWorkerShutdown
Date: 2018-01-30 02:06:47
Message-ID: CAEepm=0YQbc32PVbM8BxXDJhmK8+rUTzKhSVC1ujSQ7c1hy5Lw@mail.gmail.com
Lists: pgsql-bugs

On Tue, Jan 30, 2018 at 5:48 AM, PG Bug reporting form
<noreply(at)postgresql(dot)org> wrote:
> The following bug has been logged on the website:
>
> Bug reference: 15036
> Logged by: David Kohn
> Email address: djk447(at)gmail(dot)com
> PostgreSQL version: 10.1
> Operating system: Ubuntu 16.04
> Description:
>
> I have been experiencing a consistent problem with queries that I cannot
> kill with pg_cancel_backend or pg_terminate_backend. In many cases they have
> been running for days and are in a transaction so it eventually causes
> rather large bloat etc problems. All the backends are in the IPC wait_event.
> The backends appear to either be a main client_backend, in which case
> wait_event_type fields in pg_stat_activity say BgWorkerShutdown and for the
> background workers I see two (though I'm not sure that this is all of
> them): BtreePage and MessageQueuePutMessage. I'm quite sure the clients for
> these are dead, they had statement timeouts set to an hour at most, they
> might have died sooner than that of other causes. I assume this is a bug and
> I should be reporting it here, but if I'm putting it on the wrong list let
> me know and I'll move it!

Hi David,

Thanks for the report! Based on the mention of BtreePage, this sounds
like the following bug:

https://www.postgresql.org/message-id/flat/CAEepm%3D2xZUcOGP9V0O_G0%3D2P2wwXwPrkF%3DupWTCJSisUxMnuSg%40mail.gmail.com

The fix for that will be in 10.2 (current target date: February 8th).
The workaround in the meantime would be to disable parallelism, at
least for the queries doing parallel index scans if you can identify
them.
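
A minimal sketch of that workaround, assuming that setting
max_parallel_workers_per_gather to 0 is an acceptable way to turn
parallel plans off for the affected queries. The role name app_user is
hypothetical; substitute whatever role runs them.

```sql
-- Disable parallel query for the current session only.
SET max_parallel_workers_per_gather = 0;

-- Or persist it for one role (app_user is a hypothetical name);
-- this takes effect in that role's new sessions.
ALTER ROLE app_user SET max_parallel_workers_per_gather = 0;

-- Revert once the server is running a release that contains the fix.
ALTER ROLE app_user RESET max_parallel_workers_per_gather;
```

The session-level form is the least invasive if the problem queries
can be identified; the role-level form is a blunter option if they
cannot.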

However, I'm not entirely sure why you're not able to cancel these
backends politely with pg_cancel_backend(). For example, the
BtreePage waiter should be in ConditionVariableSleep() and should be
interrupted by such a signal and error out in CHECK_FOR_INTERRUPTS().
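
For checking what the stuck backends are waiting on before and after a
cancel attempt, something like the following could be used. This is
only a sketch: the filter on IPC waits is an assumption taken from the
report, and the pid 12345 is a placeholder.

```sql
-- List backends currently in an IPC wait, oldest transaction first.
SELECT pid, backend_type, wait_event_type, wait_event, state, xact_start
FROM pg_stat_activity
WHERE wait_event_type = 'IPC'
ORDER BY xact_start;

-- Ask one of them to cancel its current query (12345 is a placeholder pid).
SELECT pg_cancel_backend(12345);
```

If the first query still shows the same pid in the same wait after the
cancel request, that would be useful to know.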

--
Thomas Munro
http://www.enterprisedb.com
