Re: Still something fishy in the fastpath lock stuff

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Still something fishy in the fastpath lock stuff
Date: 2014-03-26 04:59:51
Message-ID: CA+Tgmobdj1yAYxhuPP3Uwsot1qSTc1dXQWAhvqxY-J2WiFk2+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 25, 2014 at 11:58 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Well, it's possible that the issue could be related to compiler
>> reordering, since it's still the rule that SpinLockAcquire/Release
>> must be memory barriers but need not be compiler barriers, and
>> FastPathStrongRelationLocks is not volatile-ized.
>
> Yeah, I was just about to complain about that. What made you think
> you could get away with ignoring that coding rule?

Um, nothing. It was just an oversight. If the implication here is
that I deliberately ignore PostgreSQL coding rules, then I resent
that. If the implication is that I don't understand them, the fact
that I made explicit reference to the existence of this precise rule
in my previous response would seem to militate against that
interpretation.

>> I really think we
>> should change that rule; it leads to ugly code, and bugs.
>
> No doubt, but are we prepared to believe that we have working "compiler
> barriers" other than volatile? (Which, I will remind you, is in the C
> standard. Unlike "compiler barriers".)

I'm prepared to believe that it would take some time to be certain
that we've got that right on all compilers we support. The most
common ones are easily dealt with, I think, but it might be harder to
verify the behavior on more obscure compilers. But I'd rather deal
with bullet-proofing SpinLockAcquire() and SpinLockRelease() *in
general* than deal with bullet-proofing every call site individually,
which is what we have to do right now.

Keep in mind that we don't require the use of volatile for critical
sections involving LWLockAcquire() and LWLockRelease(), so we're
implicitly assume that those functions DO act as compiler barriers. I
think we're mostly just hoping that the compiler isn't smart enough to
make deductions about the behavior of a function in another
translation unit, or that if it is the combination of
SpinLockAcquire() + SpinLockRelease() adds up to a compiler barrier
even if they aren't each individually a barrier. Whatever the reason,
our coding rule is currently that you've got to volatile-ize all
references within sections protected by a spinlock, but not sections
protected by an lwlock. Without intending to defend the fact that I
failed to do it correctly in this case, I don't find it surprising
that people flub that occasionally: this is not the first time someone
has made such an error and it doubtless won't be the last. One of the
things I did to Andres's replication slot patch when reviewing it was
fix numerous errors of this type. I don't think he'll be the last
person to get it wrong, either.

> I looked at the asm code around line 1240, and it seems all right, but
> if this is a locking problem then the actual failure was probably in some
> preceding spinlocked segment. I haven't the patience to look at each one
> of these segments, and it's sort of irrelevant anyway, because whether
> this particular machine did reorder the asm code or not, the fact remains
> that *this C code is broken*. The fact that you don't like the existing
> coding rules concerning spinlocks is not license to write unsafe code.

Tom, I never said otherwise. Flaming me is neither necessary nor
productive. It seems to me that the simplest thing to do might be
just this:

-static FastPathStrongRelationLockData *FastPathStrongRelationLocks;
+static volatile FastPathStrongRelationLockData *FastPathStrongRelationLocks;

Do you see any problem with that approach?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2014-03-26 05:07:08 Re: Why MarkBufferDirtyHint doesn't increment shared_blks_dirtied
Previous Message Amit Kapila 2014-03-26 03:31:44 Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation