On Thu, Dec 28, 2000 at 03:54:50PM -0500, Tom Lane wrote:
> I have been digging into the observed failure
> FATAL: Checkpoint lock is busy while data base is shutting down
> on some Alpha machines. It apparently doesn't happen on all Alphas,
> but it's quite reproducible on some of them.
> The bottom line turns out to be that on the Alpha hardware, it is
> possible for TAS() to fail even when the lock is initially zero,
> because that hardware's locking protocol will fail to acquire the
> lock if the ldq_l/stq_c sequence is interrupted. TAS() *must* be
> called in a retry loop on Alphas.
As I understand it, this is normal semantics for the load-locked/
store-conditional primitive. The idea is that the normal case is fast,
but anything that might interfere with absolute correctness causes it
to fail and need to be retried (generally just once).
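Below is a minimal sketch of the retry idiom. It uses C11 `<stdatomic.h>` rather than PostgreSQL's own `TAS()` macro, and the names `try_acquire`, `acquire_with_retry` and `release` are hypothetical. `atomic_compare_exchange_weak_explicit` is the portable analogue of a bare load-locked/store-conditional pair: the C standard allows it to fail spuriously even when the lock word holds the expected 0. A single failure therefore must not be read as "lock is busy". Note that `try_acquire` returns true on success, the opposite of the `TAS()` convention.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical single-shot test-and-set: returns true if this call moved
 * the lock word from 0 to 1. Unlike PostgreSQL's TAS(), true means success.
 * A weak compare-exchange may fail spuriously even when *lock == 0; on
 * LL/SC hardware such as the Alpha this can happen when the
 * load-locked/store-conditional sequence is interrupted. */
static bool try_acquire(atomic_int *lock)
{
    int expected = 0;
    return atomic_compare_exchange_weak_explicit(lock, &expected, 1,
                                                 memory_order_acquire,
                                                 memory_order_relaxed);
}

/* Spin until the lock is obtained; returns how many attempts it took.
 * Callers must never treat one failed try_acquire() as "lock is busy". */
static int acquire_with_retry(atomic_int *lock)
{
    int attempts = 1;
    while (!try_acquire(lock))
        attempts++;            /* spurious failure or real contention */
    return attempts;
}

/* Release with store-release ordering so writes made under the lock
 * are visible to the next holder. */
static void release(atomic_int *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

On most hardware `acquire_with_retry` on a free lock returns 1; on LL/SC machines it may occasionally take a second attempt, which is exactly the case a caller that tests `TAS()` only once would misreport.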
> It also bothers me that xlog.c contains several places where there is a
> potentially infinite wait for a lock. It seems to me that these should
> time out with stuck-spinlock messages. Do you object to such a change?
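One possible shape for the proposed time-out is sketched here under stated assumptions: `acquire_or_report_stuck` is a hypothetical name, and the spin and delay limits are illustrative values chosen so a stuck lock is reported after roughly 0.1 s, not PostgreSQL's actual tuning. Rather than aborting, the sketch reports a stuck spinlock on stderr and returns false, leaving the failure policy to the caller.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Illustrative, hypothetical limits: roughly 100 x 1 ms before giving up. */
#define SPINS_PER_DELAY 100
#define MAX_DELAYS      100
#define DELAY_NSEC      1000000L   /* 1 ms pause between bursts of spinning */

/* Returns true once the lock is held; returns false after reporting a
 * stuck spinlock if it could not be obtained within the limits above. */
static bool acquire_or_report_stuck(atomic_int *lock,
                                    const char *file, int line)
{
    int spins = 0;
    int delays = 0;

    /* atomic_exchange is a strong operation: on LL/SC machines the
     * compiler typically wraps it in its own retry loop, so a nonzero
     * result means the lock really was held when we looked. */
    while (atomic_exchange_explicit(lock, 1, memory_order_acquire) != 0) {
        if (++spins < SPINS_PER_DELAY)
            continue;                      /* keep spinning for a while */
        spins = 0;
        if (++delays > MAX_DELAYS) {
            fprintf(stderr, "stuck spinlock detected at %s:%d\n",
                    file, line);
            return false;
        }
        struct timespec ts = { .tv_sec = 0, .tv_nsec = DELAY_NSEC };
        nanosleep(&ts, NULL);              /* back off instead of burning CPU */
    }
    return true;
}
```

Whether a timed-out wait should return an error or terminate the process is a policy choice; if a spinlock held that long is assumed to indicate a bug, terminating is arguably the safer default.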
A spinlock held for more than a few cycles indicates a bug.
I wonder about the advisability of using spinlocks in user-level code,
which might be swapped out at any time. Normally, spinlocks are taken in
kernel code with great care about interrupts and context switches while
the lock is held. I don't know how one could take the necessary
precautions at user level.