Re: BUG #14960: Queries stuck on BtreePage lock on parallel index scan

From: Tim Warberg <tlw(at)monsido(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14960: Queries stuck on BtreePage lock on parallel index scan
Date: 2017-12-11 09:55:45
Message-ID: 6D17214F-A045-4C32-B533-6846CD63132C@monsido.com
Lists: pgsql-bugs

Hi Thomas,

Thanks for the quick reply.

> On 11 Dec 2017, at 10.42, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>
> On Mon, Dec 11, 2017 at 10:22 PM, <tlw(at)monsido(dot)com> wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference: 14960
>> Logged by: Tim Warberg
>> Email address: tlw(at)monsido(dot)com
>> PostgreSQL version: 10.1
>> Operating system: Ubuntu 16.04 LTS
>> Description:
>>
>> Hi,
>>
>> Ran into an issue on our PostgreSQL 10.1 cluster that seems to be a parallel
>> index scan bug. Our queue system scheduled 30 almost identical concurrent
>> queries, some of which were executed as parallel queries, and all of them
>> became stuck on IPC BtreePage lock.
>
> Hi Tim,
>
> Thanks for the report. This seems to be the same as the bug that we
> just analysed over here:
>
> https://www.postgresql.org/message-id/flat/CAEepm%3D2xZUcOGP9V0O_G0%3D2P2wwXwPrkF%3DupWTCJSisUxMnuSg%40mail.gmail.com

I checked for similar bug reports and related git commits when it happened on Thursday last week, but it didn't occur to me to double-check today before submitting. It looks like you have all the necessary information from the other report.

>
>> We discovered this after they had been
>> stuck like that for about 10 hours [1]. At the same time an autovacuum was
>> processing one of the queried tables and it had become stuck in LWLock
>> buffer_content while vacuuming indexes. None of the query processes
>> responded to pg_cancel_backend or pg_terminate_backend, and neither did the
>> autovacuum.
>
> Hmm. This may be because we hold a BT_READ lock while waiting in
> _bt_parallel_seize(). Here it's extended by the above-mentioned bug,
> preventing others from acquiring an exclusive lock.
>
> --
> Thomas Munro
> http://www.enterprisedb.com

Regards,

Tim Warberg
