Re: Still something fishy in the fastpath lock stuff

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Still something fishy in the fastpath lock stuff
Date: 2014-03-26 08:10:05
Message-ID: 53328B5D.2020706@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/26/2014 06:59 AM, Robert Haas wrote:
> On Tue, Mar 25, 2014 at 11:58 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> I really think we
>>> should change that rule; it leads to ugly code, and bugs.
>>
>> No doubt, but are we prepared to believe that we have working "compiler
>> barriers" other than volatile? (Which, I will remind you, is in the C
>> standard. Unlike "compiler barriers".)
>
> I'm prepared to believe that it would take some time to be certain
> that we've got that right on all compilers we support. The most
> common ones are easily dealt with, I think, but it might be harder to
> verify the behavior on more obscure compilers. But I'd rather deal
> with bullet-proofing SpinLockAcquire() and SpinLockRelease() *in
> general* than deal with bullet-proofing every call site individually,
> which is what we have to do right now.

+1. To support porting to new compilers, we can fall back to e.g calling
a dummy function through a function pointer, if we don't have a proper
implementation.

Aside from the correctness issue: A while ago, while working on the
xloginsert slots stuff, I looked at the assembler that gcc generated for
ReserverXLogInsertLocation(). That function is basically:

SpinLockAcquire()
<modify and read a few variables in shared memory>
SpinLockRelease()
<fairly expensive calculations based on values read>

Gcc decided to move the release of the lock so that it became:

SpinLockAcquire()
<modify and read a few variables in shared memory>
<fairly expensive calculations based on values read>
SpinLockRelease()

I went through some effort while writing the patch to make sure the
spinlock is held for as short duration as possible. But gcc undid that
to some extent :-(.

I tried to put a compiler barrier there and run some short performance
tests, but it didn't make any noticeable difference. So it doesn't seem
matter in practice - maybe the CPU reorders things anyway, or the
calculations are not expensive enough to matter after all.

I just looked at the assembler generated for LWLockAcquire, and a
similar thing is happening there. The code is:

...
641 /* We are done updating shared state of the lock itself. */
642 SpinLockRelease(&lock->mutex);
643
644 TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(l), T_ID(l), mode);
645
646 /* Add lock to list of locks held by this backend */
647 held_lwlocks[num_held_lwlocks++] = l;
...

gcc reordered part of the "held_lwlocks" assignment after releasing the
spinlock:

.L21:
.loc 1 647 0
movslq num_held_lwlocks(%rip), %rax
addq $16, %r12
.LVL23:
.loc 1 652 0
testl %ebx, %ebx
.loc 1 642 0
movb $0, (%r14) <--- SpinLockRelease
.loc 1 652 0
leal -1(%rbx), %r13d
.loc 1 647 0
leal 1(%rax), %edx
movq %r14, held_lwlocks(,%rax,8)
.LVL24:
movl %edx, num_held_lwlocks(%rip)

The held_lwlocks assignment was deliberately put outside the
spinlock-protected section, to minimize the time the spinlock is held,
but the compiler moved some of it back in: the basic block .L21.

A compiler barrier would avoid that kind of reordering. Not using
volatile would also allow the compiler to reorder and optimize the
instructions *within* the spinlock-protected block, which might shave a
few instructions while a spinlock is held. Granted, spinlock-protected
blocks are small so there isn't much room for optimization, but still.

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2014-03-26 08:14:54 Re: inherit support for foreign tables
Previous Message Peter Geoghegan 2014-03-26 07:21:04 "Conditional jump or move depends on uninitialised value(s)" within tsginidx.c