Re: anole: assorted stability problems

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: anole: assorted stability problems
Date: 2015-06-29 02:07:56
Message-ID: CA+TgmoaaeRv=1120hQdTjF++Sd4G2zMA-U2-UKzJMD1vMF+CWg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 28, 2015 at 9:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> That sucks. It was easy to see that the old fallback barrier
>> implementation wasn't re-entrant, but this one should be. And now
>> that I look at it again, doesn't the failure message indicate that's
>> not the problem anyway?
>
>> ! PANIC: stuck spinlock (c00000000d6f4140) detected at lwlock.c:816
>> ! PANIC: stuck spinlock (c00000000d72f6e0) detected at lwlock.c:770
>
> I was assuming that a leaky memory barrier was allowing the spinlock
> state to become inconsistent, or at least to be perceived as inconsistent.
> But I'm not too clear on how the barrier changes you and Andres have been
> making have affected the spinlock code.

For the most part, they haven't. Andres did a bunch of work to add
atomics support, and overhauled the barrier implementation that I
committed to 9.2 along the way. But that had minimal impact on
s_lock.h.

What we did do that touched s_lock.h was attempt to ensure that
SpinLockAcquire() and SpinLockRelease() function as compiler barriers,
so that it should no longer be necessary to litter the code with
"volatile" in every function that uses those. It is possible that
this could be broken on HP-UX. If _Asm_sched_fence() doesn't
constrain the compiler appropriately, that could explain the problems
we're seeing here. But we're not the only ones using that incantation,
so I'm left scratching my head.
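
To make "function as compiler barriers" concrete, here is a minimal
sketch of the idea. It is not the actual s_lock.h code: it assumes a
GCC-compatible compiler (the empty asm with a "memory" clobber and the
__sync builtins), and every demo_* name is hypothetical. On HP-UX the
barrier would be spelled with _Asm_sched_fence() instead, which is
exactly the part under suspicion.

```c
/* Hypothetical illustration only; assumes a GCC-compatible compiler. */

/*
 * Compiler barrier: emits no instructions, but the "memory" clobber tells
 * the compiler it may not cache memory values in registers across this
 * point or move loads/stores over it.
 */
#define demo_compiler_barrier() __asm__ __volatile__("" ::: "memory")

typedef struct
{
    int  lock;      /* 0 = free, 1 = held */
    long counter;   /* protected by lock; note: NOT declared volatile */
} DemoShared;

static inline void
demo_spin_acquire(int *lock)
{
    /* Atomic test-and-set; spin until the previous value was 0. */
    while (__sync_lock_test_and_set(lock, 1))
        ;
    /* Accesses in the critical section must not be hoisted above here. */
    demo_compiler_barrier();
}

static inline void
demo_spin_release(int *lock)
{
    /* Accesses in the critical section must not sink below here. */
    demo_compiler_barrier();
    __sync_lock_release(lock);  /* stores 0 with release semantics */
}

/*
 * Caller code can use a plain pointer. Without the barriers above, the
 * compiler would be free to move the counter update outside the locked
 * region unless the pointer were volatile-qualified.
 */
static long
demo_increment(DemoShared *shared)
{
    long result;

    demo_spin_acquire(&shared->lock);
    result = ++shared->counter;
    demo_spin_release(&shared->lock);
    return result;
}
```

If the barrier primitive fails to constrain the compiler, code like
demo_increment() can still look correct in the source while the
generated code touches the counter outside the lock.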

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
