Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, chenhj <chjischj(at)163(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock
Date: 2018-12-13 15:32:32
Message-ID: 20181213153232.GA10664@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 13, 2018 at 03:03:54PM +0300, Alexander Korotkov wrote:
> On Wed, Dec 12, 2018 at 4:08 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> > > 12 дек. 2018 г., в 3:26, Alexander Korotkov <aekorotkov(at)gmail(dot)com> написал(а):
> > >
> > > BTW, I still can't see you're setting deleteXid field of
> > > ginxlogDeletePage struct anywhere.
> > Oops. Fixed.
> > >
> > > Also, I note that protection against re-usage of recently deleted
> > > pages is implemented differently than it is in B-tree.
> > > 1) You check TransactionIdPrecedes(GinPageGetDeleteXid(page),
> > > RecentGlobalDataXmin)), while B-tree checks
> > > TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin). Could you
> > > clarify why do we use RecentGlobalDataXmin instead of RecentGlobalXmin
> > > here?
> > As far as I understand RecentGlobalDataXmin may be slightly farther than RecentGlobalXmin in case when replication_slot_catalog_xmin is holding RecentGlobalXmin. And GIN is never used by catalog tables. But let's use RecentGlobalXmin like in B-tree.
> >
> > > 2) B-tree checks this condition both before putting page into FSM and
> > > after getting page from FSM. You're checking only after getting page
> > > from FSM. Approach of B-tree looks better for me. It's seems more
> > > consistent when FSM pages are really free for usage.
> > Fixed.
>
> Thank you. I've revised your patch and pushed it. As long as two
> other patches in this thread.

I am seeing this warning in the 9.4 branch:

ginxlog.c:756:5: warning: ‘lbuffer’ may be used uninitialized
in this function [-Wmaybe-uninitialized]

from this commit:

commit 19cf52e6cc33b9e126775f28269ccccb6ddf7e30
Author: Alexander Korotkov <akorotkov(at)postgresql(dot)org>
Date: Thu Dec 13 06:12:25 2018 +0300

Prevent deadlock in ginRedoDeletePage()

On standby ginRedoDeletePage() can work concurrently with read-only queries.
Those queries can traverse posting tree in two ways.
1) Using rightlinks by ginStepRight(), which locks the next page before
unlocking its left sibling.
2) Using downlinks by ginFindLeafPage(), which locks at most one page at time.

Original lock order was: page, parent, left sibling. That lock order can
deadlock with ginStepRight(). In order to prevent deadlock this commit changes
lock order to: left sibling, page, parent. Note, that position of parent in
locking order seems insignificant, because we only lock one page at time while
traversing downlinks.

Reported-by: Chen Huajun
Diagnosed-by: Chen Huajun, Peter Geoghegan, Andrey Borodin
Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com
Author: Alexander Korotkov
Backpatch-through: 9.4

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Verite 2018-12-13 16:26:48 Re: BUG #15548: Unaccent does not remove combining diacritical characters
Previous Message chjischj@163.com 2018-12-13 15:12:04 Re:Connections hang indefinitely while taking a gin index's LWLock buffer_content lock