Re: An example of bugs for Hot Standby

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Hiroyuki Yamada <yamada(at)kokolink(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: An example of bugs for Hot Standby
Date: 2009-12-16 14:01:51
Message-ID: 1260972111.634.1440.camel@ebony
Lists: pgsql-hackers

On Wed, 2009-12-16 at 10:33 +0000, Simon Riggs wrote:
> On Tue, 2009-12-15 at 20:25 +0900, Hiroyuki Yamada wrote:
> > Hot Standby node can freeze when startup process calls LockBufferForCleanup().
> > This bug can be reproduced by the following procedure.
>
> Interesting. Looks like this can happen, which is a shame cos I just
> removed the wait checking code after not ever having seen a wait.
>
> Thanks for the report.
>
> Must-fix item for HS.

So this deadlock can happen in two places:

1. When a relation lock waits behind an AccessExclusiveLock and then
Startup runs LockBufferForCleanup()

2. When Startup is a pin count waiter and a lock acquire begins to wait
on a relation lock

So we must put in direct deadlock detection in both places. We can't use
the normal deadlock detector because in case (1) the backend might
already have exceeded deadlock_timeout.

Proposal:

Make Startup wait on a well-known semaphore rather than on its
proc->sem. This means we can skip the spinlock check in
ProcSendSignal().

For (1), if Startup runs LockBufferForCleanup() and can't get the cleanup
lock then it marks itself as waiting. It then checks for any lock
waiters. If there are any, it waits for up to max_standby_delay and then
aborts all current lock waiters, none of whom would ever wake if we
continued waiting.
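
To make the control flow for (1) concrete, here is a rough standalone
model in C. It is only a sketch, not PostgreSQL code: ModelBackend,
count_lock_waiters() and startup_cleanup_wait_check() are hypothetical
names, the delay is assumed to be in milliseconds, and "aborting" a
waiter is modelled as setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a standby backend; not PostgreSQL's PGPROC. */
typedef struct ModelBackend
{
    bool waiting_on_lock;   /* blocked waiting for a relation lock */
    bool cancelled;         /* set once Startup has aborted the wait */
} ModelBackend;

/* Count the backends currently blocked in a lock wait. */
static int
count_lock_waiters(const ModelBackend *backends, size_t n)
{
    int waiters = 0;

    for (size_t i = 0; i < n; i++)
        if (backends[i].waiting_on_lock)
            waiters++;
    return waiters;
}

/*
 * Model of case (1). Called after Startup has failed to get the cleanup
 * lock and has already waited waited_ms. Returns the number of lock
 * waiters aborted; 0 means Startup simply keeps waiting.
 */
static int
startup_cleanup_wait_check(ModelBackend *backends, size_t n,
                           int waited_ms, int max_standby_delay_ms,
                           bool *startup_waiting)
{
    int aborted = 0;

    assert(backends != NULL && startup_waiting != NULL);

    *startup_waiting = true;    /* Startup marks itself as waiting */

    if (count_lock_waiters(backends, n) == 0)
        return 0;               /* no lock waiters: an ordinary pin wait */
    if (waited_ms < max_standby_delay_ms)
        return 0;               /* still inside the allowed delay */

    /* Delay exceeded: abort every current lock waiter. */
    for (size_t i = 0; i < n; i++)
    {
        if (backends[i].waiting_on_lock)
        {
            backends[i].waiting_on_lock = false;
            backends[i].cancelled = true;
            aborted++;
        }
    }
    return aborted;
}
```

In this model, backends that are not in a lock wait are left alone; only
the current lock waiters are aborted once the delay has passed.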

For (2), if a normal backend goes into a lock wait in HS then it checks
whether Startup is waiting and, if so, throws ERROR. This can happen
immediately because if Startup is already waiting then waiting for the
lock would cause deadlock.
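
The check for (2) is small enough to show as a standalone sketch in C.
Again this is only a model: LockWaitDecision and
backend_lock_wait_check() are hypothetical names, and the two booleans
stand in for whatever shared state would record "we are in HS" and
"Startup is a pin count waiter".

```c
#include <stdbool.h>

/* Hypothetical result of the pre-wait check; not a PostgreSQL type. */
typedef enum LockWaitDecision
{
    LOCK_WAIT_OK,               /* safe to go to sleep on the lock */
    LOCK_WAIT_WOULD_DEADLOCK    /* caller should throw ERROR instead */
} LockWaitDecision;

/*
 * Model of case (2). Called by a normal backend just before it starts
 * waiting for a relation lock. No timeout is involved: if Startup is
 * already waiting, sleeping here would complete the deadlock.
 */
static LockWaitDecision
backend_lock_wait_check(bool in_hot_standby, bool startup_is_waiting)
{
    if (in_hot_standby && startup_is_waiting)
        return LOCK_WAIT_WOULD_DEADLOCK;
    return LOCK_WAIT_OK;
}
```

Outside HS the check is a no-op in this model, so normal lock waits are
unaffected.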

--
Simon Riggs www.2ndQuadrant.com
