Re: Atomic operations within spinlocks

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Atomic operations within spinlocks
Date: 2020-06-09 06:08:47
Message-ID: 20200609060847.vd2c67ih4bz6vyww@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020-06-05 17:19:26 -0700, Andres Freund wrote:
> Hi,
>
> On 2020-06-04 19:33:02 -0700, Andres Freund wrote:
> > But it looks like that code is currently buggy (and looks like it always
> > has been), because we don't look at the nested argument when
> > initializing the semaphore. So we currently allocate too many
> > semaphores, without benefiting from them :(.
>
> I wrote a patch for this, and when I got around to to testing it, I
> found that our tests currently don't pass when using both
> --disable-spinlocks and --disable-atomics. Turns out to not be related
> to the issue above, but the global barrier support added in 13.
>
> That *reads* two 64 bit atomics in a signal handler. Which is normally
> fine, but not at all cool when atomics (or just 64 bit atomics) are
> backed by spinlocks. Because we can "self interrupt" while already
> holding the spinlock.
>
> It looks to me that that's a danger whenever 64bit atomics are backed by
> spinlocks, not just when both --disable-spinlocks and --disable-atomics
> are used. But I suspect that it's really hard to hit the tiny window of
> danger when those options aren't used. While we have buildfarm animals
> testing each of those separately, we don't have one that tests both
> together...
>
> I'm not really sure what to do about that issue. The easisest thing
> would probably be to change the barrier generation to 32bit (which
> doesn't have to use locks for reads in any situation). I tested doing
> that, and it fixes the hangs for me.
>
>
> Randomly noticed while looking at the code:
> uint64 flagbit = UINT64CONST(1) << (uint64) type;
>
> that shouldn't be 64bit, right?

Attached is a series of patches addressing these issues, of varying
quality:

1) This fixes the above mentioned issue in the global barrier code by
using 32bit atomics. That might be fine, or it might not. I just
included it here because otherwise the tests cannot be run fully.

2) Fixes spinlock emulation when more than INT_MAX spinlocks are
initialized in the lifetime of a single backend

3) Add spinlock tests to normal regression tests.
- Currently as part of test_atomic_ops. Probably not worth having a
separate SQL function?
- Currently contains a test for 1) that's run when the spinlock
emulation is used. Probably too slow to actually indclude? Takes 15s
on my computer... OTOH, it's just with --disable-spinlocks...
- Could probably remove the current spinlock tests after this. The
only thing they additionally test is a stuck spinlock. Since
they're not run currently, they don't seem worth much?

4) Fix the potential for deadlocks when using atomics while holding a
spinlock, add tests for that.

Any comments?

Greetings,

Andres Freund

Attachment Content-Type Size
v1-0001-WORKAROUND-Avoid-spinlock-use-in-signal-handler-b.patch text/x-diff 7.2 KB
v1-0002-spinlock-emulation-Fix-bug-when-more-than-INT_MAX.patch text/x-diff 1.0 KB
v1-0003-Add-basic-spinlock-tests-to-regression-tests.patch text/x-diff 3.9 KB
v1-0004-Fix-deadlock-danger-when-atomic-ops-are-done-unde.patch text/x-diff 7.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-06-09 06:09:21 Re: BUG #16481: Stored Procedure Triggered by Logical Replication is Unable to use Notification Events
Previous Message Tom Lane 2020-06-09 06:03:09 Re: Add -Wold-style-definition to CFLAGS?