Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: chjischj(at)163(dot)com
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock
Date: 2018-11-07 20:52:42
Message-ID: CAH2-Wzm5yW82vo9hXUBiVSFr-b+FEW1QLwyWctVHNZnFKLxGZQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 6, 2018 at 10:05 AM chenhj <chjischj(at)163(dot)com> wrote:
> I analyzed the btree block where lwlock deadlock occurred, as follows:

Thank you for doing this important work.

You're using Postgres 10.2. While that version isn't current with all
GIN bug fixes, it does have this important one:

"Ensure that vacuum will always clean up the pending-insertions list
of a GIN index (Masahiko Sawada)"

Later GIN fixes seem unlikely to be relevant to your issue. I think
that this is probably a genuine, new bug.

> The ginInsertValue() function above gets the lwlock in the order described in the README.

> However, ginScanToDelete() depth-first scans the btree and gets the EXCLUSIVE lock, which creates a deadlock.
> Is the above statement correct? If so, deadlocks should easily happen.

I have been suspicious of deadlock hazards in the GIN code for some
time -- particularly around pending list cleanup. I go into a lot of
detail on my suspicions here:

https://www.postgresql.org/message-id/flat/CAH2-WzmfUpRjWcUq3%2B9ijyum4AJ%2Bk-meGT8_HnxBMpKz1aNS-g%40mail.gmail.com#ea5af1088adfacb3d0ba88313be1a5e3
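
To make the general hazard in the quoted question concrete, here is a minimal sketch (not GIN code) of two backends taking the same two page locks in opposite orders. PageLock, try_lock() and wait_cycle() are hypothetical names, and which page each backend locks first is an assumption chosen only for illustration; it is not a claim about what ginInsertValue() or ginScanToDelete() actually do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer content lock; this is not LWLock. */
typedef struct PageLock
{
    int holder;                 /* 0 = free, otherwise the backend id */
} PageLock;

/* Take the lock if it is free; return false if the caller would have to wait. */
static bool
try_lock(PageLock *lock, int backend)
{
    if (lock->holder != 0)
        return false;
    lock->holder = backend;
    return true;
}

/*
 * Backend 1 locks child, then parent (assumed order, for illustration).
 * Backend 2 locks in the same order if same_order is true, otherwise
 * parent, then child.  Returns true when each backend holds one lock
 * and waits for the lock held by the other, i.e. a wait cycle.
 */
static bool
wait_cycle(bool same_order)
{
    PageLock  parent = {0};
    PageLock  child = {0};
    PageLock *b_first = same_order ? &child : &parent;
    PageLock *b_second = same_order ? &parent : &child;

    bool a_holds = try_lock(&child, 1);
    bool b_holds = try_lock(b_first, 2);

    assert(a_holds);            /* backend 1 always gets its first lock */

    /* Each backend that got its first lock now asks for its second. */
    bool a_waits = a_holds && !try_lock(&parent, 1);
    bool b_waits = b_holds && !try_lock(b_second, 2);

    return a_waits && b_waits;
}
```

With opposite orders, wait_cycle(false) reports a cycle: neither backend can proceed. With a single agreed order, wait_cycle(true) reports none, because the second backend simply waits on the first lock while the first backend finishes.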

I note that your first deadlock involves the following kinds of backends:

* ginInsertCleanup() calls from a regular backend, which have the
backend do things that generally only VACUUM gets to do, like calling
RecordFreeIndexPage().

* (auto)VACUUM processes.

Your second/recovery deadlock involves:

* Regular read-only queries.

* Recovery code.

Quite a lot of stuff is involved here!

The code in this area is way too complicated, and I haven't thought
about it in about a year, so it's hard for me to be sure of anything
at the moment. My guess is that there is confusion about the type of
page expected within one or more blocks (e.g. entry tree vs. pending
list), due to a race condition in block deletion and/or recycling --
again, I've suspected something like this could happen for some time.
The fact that you get a distinct deadlock during recovery is
consistent with that theory.

It's safe to say that promoting the asserts on gin page type into
"can't happen" elog errors in places like scanPendingInsert() and
ginInsertCleanup() would be a good start. Note that we already did
this kind of Assert-to-elog(ERROR) promotion for posting tree code,
when similar bugs appeared way back in 2013.
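
As a rough illustration of what that promotion means, here is a self-contained sketch using the standard C assert() in place of Assert() and a small reporting helper in place of elog(ERROR). PageKind, report_error() and both check functions are hypothetical names invented for this example; none of this is the actual GIN code, and real elog(ERROR) aborts the current operation instead of returning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical page kinds, standing in for the GIN page type flags. */
typedef enum PageKind
{
    PAGE_ENTRY_TREE,
    PAGE_PENDING_LIST,
    PAGE_DELETED
} PageKind;

/*
 * Stand-in for elog(ERROR, ...): report the problem and return false so
 * the caller can give up on the operation.
 */
static bool
report_error(const char *msg, unsigned blkno)
{
    fprintf(stderr, "ERROR: %s in block %u\n", msg, blkno);
    return false;
}

/*
 * Before: the check compiles to nothing when NDEBUG is defined, much as
 * Assert() does in a build without assertions enabled, so a wrong page
 * type goes unnoticed in production.
 */
static void
check_with_assert(PageKind kind)
{
    assert(kind == PAGE_PENDING_LIST);
    (void) kind;                /* avoid an unused warning under NDEBUG */
}

/*
 * After: the check is always compiled in and fails loudly with a
 * "can't happen" error instead of carrying on with the wrong page.
 */
static bool
check_with_error(PageKind kind, unsigned blkno)
{
    if (kind != PAGE_PENDING_LIST)
        return report_error("unexpected GIN page type", blkno);
    return true;
}
```

The point of the second form is that a production server hitting the suspected race would raise an error naming the block, rather than proceeding on a page of the wrong type.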

--
Peter Geoghegan
