Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb]

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb]
Date: 2011-12-19 21:25:06
Message-ID: 4EEFABB2.5050100@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 19.12.2011 22:12, Tom Lane wrote:
> Noah Misch<noah(at)leadboat(dot)com> writes:
>> On Mon, Dec 19, 2011 at 05:09:11PM +0200, Heikki Linnakangas wrote:
>>> That is not sufficient on platforms with a weak memory model, like Itanium.
>
>> Other processors may observe the lock as held after its release, but there's no
>> correctness problem.
>
> How weak is the memory model, exactly?
>
> A correctness problem would ensue if out-of-order stores are possible,
> ie other processors could observe the lock being freed (and then acquire
> it) before seeing changes to shared variables that had been made while
> holding the lock.
>
> I'm a little dubious that this applies to Itanium, because I don't see
> any memory fence instruction in the TAS macro. If we needed one in
> S_UNLOCK I'd rather expect there to be one in TAS as well.

There's a pretty good manual called "Implementing Spinlocks on Itanium
and PA-RISC" from HP at:
http://h21007.www2.hp.com/portal/download/files/unprot/itanium/spinlocks.pdf.
It says, in section "3.2.1 Try to get a spinlock":

> Note also, that the ‘xchg’ instruction always
> has the ‘acquire’ semantics, so it enforces the correct memory ordering.

But the current implementation seems to be safe, anyway. In section
"3.2.3 Freeing a spinlock", that manual says:

> Note, that a simple assignment statement into a volatile variable
will not work, as we are
> assuming that the +Ovolatile=__unordered compile option is being used.

So +Ovolatile=__unordered would allow the kind of optimization that I
thought was possible, and we would have a problem if we used it. But the
default is more conservative, and a simple assignment does work.

I compiled the attached test program on an HP Itanium box, using the
same flags you get from PostgreSQL's configure on that box. The relevant
assembler output is:

xchg4 r14 = [r15], r14 // M [slocktest.c: 66/3]
//file/line/col slocktest.c/67/3
ld1.acq r16 = [r11] // M [slocktest.c: 67/3]
nop.i 0 // I
//file/line/col slocktest.c/68/3
st1.rel [r11] = r10 ;; // M [slocktest.c: 68/3]
//file/line/col slocktest.c/69/3
st4.rel [r15] = r0 // M [slocktest.c: 69/3]
//file/line/col slocktest.c/70/1

The trick I missed is that the compiler attaches .rel to all the stores
and .acq to all the loads through a volatile pointer. gcc seems to do
the same. So we're safe.

It would be interesting to test whether using +Ovolatile=__unordered
would make a difference to performance. Tacking those memory fence
modifiers to all the volatile loads and stores might be expensive.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
slocktest.c text/x-csrc 1.4 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Noah Misch 2011-12-19 21:53:41 Re: Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb]
Previous Message Heikki Linnakangas 2011-12-19 20:21:22 Re: [PATCH] Use CC atomic builtins if available [was: Re: TAS patch for building on armel/armhf thumb]