Re: Spurious standby query cancellations

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Spurious standby query cancellations
Date: 2015-12-24 02:58:40
Message-ID: CAB7nPqT7H=pep2H95b9bCyiaNQHSB6XGURNmmB0eix5QSUV+Vw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 24, 2015 at 3:33 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Wed, Sep 16, 2015 at 2:44 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>
>> On 14 September 2015 at 12:00, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>
>>>>
>>>> It's now possible to fix this by putting a lock wait on the actual lock
>>>> request, which wasn't available when I first wrote that, hence the crappy
>>>> wait loop. Using the timeout handler would now be the preferred way to solve
>>>> this. We can backpatch that to 9.3, where they were introduced, if needed.
>>>>
>>>> There's an example of how to use lock waits further down on
>>>> ResolveRecoveryConflictWithBufferPin(). Could you have a look at doing it
>>>> that way?
>>>
>>>
>>> It looks like this will take some major surgery. The heavyweight lock
>>> manager doesn't play well with others when it comes to timeouts other than
>>> its own. LockBufferForCleanup is a simple retry loop, but the lock manager
>>> is much more complicated than that.
>>
>>
>> Not sure I understand this objection. I can't see a reason that my
>> proposal wouldn't work.
>
>
> On further thought, neither do I. The attached patch inverts
> ResolveRecoveryConflictWithLock to be called back from the lmgr code so that
> it is like the ResolveRecoveryConflictWithBufferPin code. It does not try to
> cancel the conflicting lock holders from the signal handler; rather, it just
> loops an extra time and cancels the transactions on the next call.
>
> It looks like deadlock detection is adequately handled by the normal
> lmgr code in the backends of the other parties to the deadlock, so I
> didn't add a timeout for deadlock detection purposes.

Patch moved to the next CF (CommitFest) because of a lack of reviews.
Simon is registered as reviewer, hence I guess that the ball is on his
side of the field.
--
Michael
