Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Peter Geoghegan <pg(at)bowt(dot)ie>, chenhj <chjischj(at)163(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock
Date: 2019-03-21 18:25:53
Message-ID: CANP8+j+K4whxf7ET7+gO+G-baC3-WxqqH=nV4X2CgfEPA3Yu3g@mail.gmail.com
Lists: pgsql-hackers

On Thu, 13 Dec 2018 at 14:48, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
wrote:

> On Thu, Dec 13, 2018 at 10:46 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2018-12-13 22:40:59 +0300, Alexander Korotkov wrote:
> > > It doesn't matter, because we release all locks on every buffer at one
> > > time. The unlock order can affect which waiter will acquire
> > > the lock next after ginRedoDeletePage(). However, I don't see why one
> > > unlock order is better than another. Thus, I just used the rule of
> > > thumb to not change code when it's not necessary for the bug fix.
> >
> > I think it's right to not change unlock order at the same time as a
> > bugfix here. More generally I think it can often be useful to default
> > to release locks in the inverse order they've been acquired - if there's
> > any likelihood that somebody will acquire them in the same order, that
> > ensures that such a party would only need to wait for a lock once,
> > instead of being woken up for one lock, and then immediately having to
> > wait for the next one.
>
> Good point, thank you!
>
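
To make the release-order argument quoted above concrete, here is a toy model rather than real LWLock code. It assumes one holder that owns every lock, and one waiter that takes the same locks in ascending index order. The name toy_count_waits and the whole model are hypothetical; it only counts how often the waiter has to go to sleep.

```c
#include <stdbool.h>

#define TOY_MAX_LOCKS 8

/*
 * Toy model (hypothetical, not PostgreSQL code): the holder owns locks
 * 0..nlocks-1 and releases them one per step in release_order[].  The
 * waiter acquires locks in ascending order and sleeps whenever the next
 * lock it needs has not been released yet.  Returns the number of times
 * the waiter sleeps, or -1 on bad input.
 */
static int
toy_count_waits(const int *release_order, int nlocks)
{
    bool released[TOY_MAX_LOCKS] = {false};
    int  next = 0;      /* index of the next lock the waiter needs */
    int  waits = 1;     /* the waiter starts out asleep on lock 0 */

    if (nlocks <= 0 || nlocks > TOY_MAX_LOCKS)
        return -1;

    for (int step = 0; step < nlocks; step++)
    {
        int before = next;

        released[release_order[step]] = true;

        /* The waiter grabs every lock it can, in its fixed order. */
        while (next < nlocks && released[next])
            next++;

        /* It made progress but is still short of locks: it sleeps again. */
        if (next > before && next < nlocks)
            waits++;
    }
    return waits;
}
```

With two locks, releasing in acquisition order {0, 1} makes the waiter sleep twice, while the inverse order {1, 0} makes it sleep once, which matches the reasoning in the quoted message.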

It's been pointed out to me that 52ac6cd2d0cd70e01291e0ac4ee6d068b69bc478
introduced a WAL incompatibility that has not been flagged.

In ginRedoDeletePage() we use the struct directly to read the WAL record,
so if a WAL record was written prior to
52ac6cd2d0cd70e01291e0ac4ee6d068b69bc478, yet read by code at
52ac6cd2d0cd70e01291e0ac4ee6d068b69bc478 or later then we will have
problems, since deleteXid will not be set correctly.
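
A minimal sketch of the failure mode, under stated assumptions: the struct names OldDeletePage and NewDeletePage and the field widths are hypothetical stand-ins, not the real definitions. The only thing taken from the description above is that a deleteXid field is read straight out of the record. The length-checking reader is shown for contrast and is not a claim about how this should be fixed in the tree.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical payload layout written by the older code. */
typedef struct OldDeletePage
{
    uint16_t parentOffset;
    uint32_t rightLink;
} OldDeletePage;

/* Hypothetical payload layout expected by the newer code. */
typedef struct NewDeletePage
{
    uint16_t parentOffset;
    uint32_t rightLink;
    uint32_t deleteXid;         /* not present in old records */
} NewDeletePage;

/*
 * Naive reader: treats the payload as the new struct regardless of how
 * many bytes the writer actually produced.  The caller must guarantee
 * sizeof(NewDeletePage) readable bytes; for an old record, the bytes
 * that land in deleteXid belong to whatever follows the payload.
 */
static NewDeletePage
naive_read(const unsigned char *payload)
{
    NewDeletePage rec;

    memcpy(&rec, payload, sizeof(rec));
    return rec;
}

/*
 * Length-aware reader: returns 1 if the record carried deleteXid,
 * 0 if it was an old-format record and deleteXid was defaulted to 0.
 */
static int
checked_read(const unsigned char *payload, size_t len, NewDeletePage *out)
{
    assert(len >= sizeof(OldDeletePage));
    memset(out, 0, sizeof(*out));

    if (len >= sizeof(NewDeletePage))
    {
        memcpy(out, payload, sizeof(NewDeletePage));
        return 1;
    }

    /* Old format: copy the shared prefix only; deleteXid stays 0. */
    memcpy(out, payload, sizeof(OldDeletePage));
    return 0;
}
```

If an OldDeletePage is written at the start of a buffer whose remaining bytes are all 0xAB, naive_read() reports a deleteXid built from those trailing bytes, whereas checked_read() called with len = sizeof(OldDeletePage) returns 0 and leaves deleteXid at 0.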

It seems this should not have been backpatched.

Please give your assessment.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
