Re: Spurious standby query cancellations

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Spurious standby query cancellations
Date: 2015-09-14 17:00:30
Message-ID: CAMkU=1w4nWhznTgUr869v56r=nC-mE4MHgRTyZM+P1hA_WNs6g@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 4, 2015 at 3:25 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On 27 August 2015 at 22:55, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
>> In ResolveRecoveryConflictWithLock, there is the comment:
>>
>> /*
>> * If blowing away everybody with conflicting locks doesn't work, after
>> * the first two attempts then we just start blowing everybody away until
>> * it does work.
>>
>>
>> But what it does is something different from that.
>>
>> At least under some conditions, it will wait patiently for all initially
>> conflicting locks to go away. If that doesn't work twice (because new
>> lockers joined while we were waiting for the old ones to go away), then it
>> will wait patiently for all transactions throughout the system to go away
>> even if they don't conflict, perhaps not even realizing that the lock has
>> become free in the meantime.
>>
>> Then when its patience runs out, it kills everything on the system. But
>> it never tried to blow away just the conflicting locks; instead it just
>> tried to wait them out.
>>
>> The fact that trying to wait them out didn't work (twice) doesn't mean
>> that killing them wouldn't have worked.
>>
>> I think that it was intended that num_attempts would get incremented only
>> once WaitExceedsMaxStandbyDelay becomes true, but that is not what happens
>> with the current code.
>>
>> Currently WaitExceedsMaxStandbyDelay only has one caller. I think we
>> should take the sleep code out of that function and move it into the
>> existing call site, and then have ResolveRecoveryConflictWithLock check
>> WaitExceedsMaxStandbyDelay before incrementing num_attempts.
>>
>
> I agree with the problem in the current code, thank you for spotting it.
>
> I also agree that the proposed solution would work-ish, if we are
> suggesting backpatching this.
>

I wasn't specifically thinking of backpatching it, but if it doesn't
operate the way you intended it to, it might make sense to do that.
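
For the record, the reshuffle I had in mind is roughly this (hand-written
sketch, untested; names as in the current standby.c). The function becomes
a pure deadline test, with the sleep/backoff moved out into the wait loop
of its caller:

    static bool
    WaitExceedsMaxStandbyDelay(void)
    {
        TimestampTz ltime = GetStandbyLimitTime();

        /* true only once the standby delay has genuinely run out */
        return (ltime && GetCurrentTimestamp() >= ltime);
    }

and then ResolveRecoveryConflictWithLock would only do

    if (WaitExceedsMaxStandbyDelay())
        num_attempts++;

so an "attempt" is counted only when we actually ran out of patience,
rather than on every trip through the retry loop.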

I stumbled on this while experimentally looking into a different issue: the
ProcArrayLock contention in HeapTupleSatisfiesMVCC that was recently
ameliorated in 8a7d0701814, specifically the variant of it reported
in "[PERFORM] Strange query stalls on replica in 9.3.9". I haven't heard
any complaints from the field about too-frequent query cancellations,
but I also don't have my "ear to the ground" in that area.

>
> It's now possible to fix this by putting a lock wait on the actual lock
> request, which wasn't available when I first wrote that, hence the crappy
> wait loop. Using the timeout handler would now be the preferred way to
> solve this. We can backpatch that to 9.3, where timeout handlers were
> introduced, if needed.
>
> There's an example of how to use lock waits further down
> in ResolveRecoveryConflictWithBufferPin(). Could you have a look at doing
> it that way?
>

It looks like this will take some major surgery. The heavyweight lock
manager doesn't play well with others when it comes to timeouts other than
its own. LockBufferForCleanup is a simple retry loop, but the lock manager
is much more complicated than that.
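
To illustrate the contrast, LockBufferForCleanup is shaped roughly like
this (heavily simplified from bufmgr.c; the buffer-header spinlock, flag
bookkeeping, and error paths are omitted, and PinCountIsOne is just
shorthand here, not a real function):

    for (;;)
    {
        LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
        if (PinCountIsOne(buffer))
            return;             /* ours is the only pin: cleanup lock held */
        LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
        if (InHotStandby)
            ResolveRecoveryConflictWithBufferPin(); /* waits with a timeout */
        else
            ProcWaitForSignal();    /* woken when someone drops a pin */
    }

A timeout can simply fire between iterations. The heavyweight lock
manager's wait path has no such convenient place to come up for air.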

Here are some approaches I can think of:

Approach one, replace LockAcquireExtended's dontWait boolean with a
"waitUntil" timeout value. Queue the lock request the way ordinary
(non-recovery-backend) lock requests are queued, so that new requestors are
not granted new locks which conflict with what we want. If the timeout
expires without the lock being acquired, dequeue ourselves and clean up,
and then return LOCKACQUIRE_NOT_AVAIL. I think something would have to be
done about deadlocks as well, as I don't think the regular deadlock
detector works in the standby startup process.
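
Concretely, something like this (sketch only; the TimestampTz parameter is
my invention, the rest of the signature is the existing one):

    LockAcquireResult
    LockAcquireExtended(const LOCKTAG *locktag,
                        LOCKMODE lockmode,
                        bool sessionLock,
                        TimestampTz waitUntil,  /* was: bool dontWait */
                        bool reportMemoryError);

    /* in ResolveRecoveryConflictWithLock, something like: */
    if (LockAcquireExtended(&locktag, AccessExclusiveLock, true,
                            GetStandbyLimitTime(), false)
        == LOCKACQUIRE_NOT_AVAIL)
    {
        /* deadline expired without the lock: escalate to cancellations */
    }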

Approach two, same as one but when the timeout expires we invoke a callback
to terminate the blocking processes without dequeueing ourselves. That way
no one can sneak in and grab the lock while we are doing the
terminations. This way we are guaranteed to succeed after one round of
terminations, and there is no need to terminate innocent bystanders. I
think this is my preferred solution. (But I haven't had much luck
implementing it beyond the hand-wavy stages.)
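
In hand-wavy form (StandbyLockTimeoutHandler and awaited_locktag are made
up; the point is only that we stay queued while cancelling):

    /* Invoked by the timeout machinery while we are still in the lock's
     * wait queue, so no new requestor can overtake us in the meantime. */
    static void
    StandbyLockTimeoutHandler(void)
    {
        VirtualTransactionId *backends;

        backends = GetLockConflicts(&awaited_locktag, AccessExclusiveLock);
        ResolveRecoveryConflictWithVirtualXIDs(backends,
                                               PROCSIG_RECOVERY_CONFLICT_LOCK);
    }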

Approach three, ask for the lock with dontWait as we do now, but if we
don't get the lock then somehow arrange for us to be signalled when the
lock becomes free, without being queued for it. Then we would have to
retry, with no guarantee the retry would be successful. Functionally this
would be no different from the simple patch I gave, except the polling loop
is much more efficient. You still have to terminate processes if a stream
of new requestors takes the lock on a constantly overlapping basis.
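
That is, roughly (WaitForLockRelease is the piece that doesn't exist
today):

    while (LockAcquireExtended(&locktag, AccessExclusiveLock,
                               true, true /* dontWait */, false)
           == LOCKACQUIRE_NOT_AVAIL)
    {
        if (WaitExceedsMaxStandbyDelay())
            break;      /* give up waiting and start cancelling backends */
        WaitForLockRelease(&locktag);   /* hypothetical: sleep until freed */
    }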

> We probably also need to resurrect my earlier patch to avoid logging
> AccessExclusiveLocks on temp tables.
>

I couldn't find that; can you provide a link?

Another source of frequent AccessExclusiveLocks is vacuum truncation
attempts. If a table has some occupied pages right at the end which are
marked as all-visible, then the forward scan doesn't visit them. Since it
didn't visit them, vacuum assumes they might be empty and truncatable, and
so takes the AccessExclusiveLock, only to immediately see that the last
page is not empty. Perhaps the forward scan could be special-cased to
never skip the final block of a table. That way, if that block is not
empty, the truncation is abandoned before the AccessExclusiveLock is ever
taken.
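
Something along these lines in the visibility-map skipping logic of
lazy_scan_heap (sketch; the variable name is from vacuumlazy.c, the
special case is the new part):

    /* Never skip the table's final block, even if it is all-visible, so
     * that vacuum knows whether truncation is worthwhile before it takes
     * the AccessExclusiveLock. */
    if (blkno == nblocks - 1)
        skipping_all_visible_blocks = false;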

Cheers,

Jeff
