Re: the s_lock_stuck on perform_spin_delay

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <tmunro(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Subject: Re: the s_lock_stuck on perform_spin_delay
Date: 2024-02-08 13:56:24
Message-ID: 87eddnhvf3.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi,

> There are some similarities between this and
> https://www.postgresql.org/message-id/20240207184050.rkvpuudq7huijmkb%40awork3.anarazel.de
> as described in that email.

Thanks for this information.

>
>
> Hm, I think this might make this code a bit more expensive. It's cheaper, both
> in the number of instructions and their cost, to set variables on the stack
> than in global memory - and it's already performance critical code. I think
> we need to optimize the code so that we only do init_local_spin_delay() once
> we are actually spinning, rather than also on uncontended locks.

A great lession learnt, thanks for highlighting this!
>
>
>
>> @@ -5432,20 +5431,29 @@ LockBufHdr(BufferDesc *desc)
>> static uint32
>> WaitBufHdrUnlocked(BufferDesc *buf)
>> {
>> - SpinDelayStatus delayStatus;
>> uint32 buf_state;
>>
>> - init_local_spin_delay(&delayStatus);
>> + /*
>> + * Suppose the buf will not be locked for a long time, setup a spin on
>> + * this.
>> + */
>> + init_local_spin_delay();
>
> I don't know what this comment really means.

Hmm, copy-paste error. Removed in v10.

>
>
>> +#ifdef USE_ASSERT_CHECKING
>> +void
>> +VerifyNoSpinLocksHeld(bool check_in_panic)
>> +{
>> + if (!check_in_panic && spinStatus.in_panic)
>> + return;
>
> Why do we need this?

We disallow errstart when a spin lock is held then there are two
speical cases need to be handled.

a). quickdie signal handler. The reason is explained with the below
comments.

/*
* It is possible that getting here when holding a spin lock already.
* However current function needs some actions like elog which are
* disallowed when holding a spin lock by spinlock misuse detection
* system. So tell that system to treat this specially.
*/
spinStatus.in_panic = true;

b). VerifyNoSpinLocksHeld function.

if (spinStatus.func != NULL)
{
/*
* Now we have held a spin lock and then errstart is disallow,
* to avoid the endless recursive call of VerifyNoSpinLocksHeld
* because of the VerifyNoSpinLocksHeld checks in errstart,
* set spinStatus.in_panic to true to break the cycle.
*/
spinStatus.in_panic = true;
elog(PANIC, "A spin lock has been held at %s:%d",
spinStatus.func, spinStatus.line);
}

>
>
>> diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
>> index aa06e49da2..c3fe75a41d 100644
>> --- a/src/include/storage/s_lock.h
>> +++ b/src/include/storage/s_lock.h
>> @@ -652,7 +652,10 @@ tas(volatile slock_t *lock)
>> */
>> #if !defined(S_UNLOCK)
>> #define S_UNLOCK(lock) \
>> - do { __asm__ __volatile__("" : : : "memory"); *(lock) = 0; } while (0)
>> + do { __asm__ __volatile__("" : : : "memory"); \
>> + ResetSpinLockStatus(); \
>> + *(lock) = 0; \
>> +} while (0)
>> #endif
>
> That seems like the wrong place. There are other definitions of S_UNLOCK(), so
> we clearly can't do this here.

True, in the v10, the spinlock_finish_release() is called in
SpinLockRelease. The side effect is if user calls S_UNLOCK directly,
there would be something wrong because lack of call
spinlock_finish_release, I found this usage in regress.c (I changed it
to SpinLockRelease). but I think it is OK because:

1) in s_lock.h, there is clear comment say:

* NOTE: none of the macros in this file are intended to be called directly.
* Call them through the hardware-independent macros in spin.h.

2). If someone breaks the above rule, the issue can be found easily in
assert build.

3). It has no impact on release build.

>>
>> -#define init_local_spin_delay(status) init_spin_delay(status, __FILE__, __LINE__, __func__)
>> -extern void perform_spin_delay(SpinDelayStatus *status);
>> -extern void finish_spin_delay(SpinDelayStatus *status);
>> +#define init_local_spin_delay() init_spin_delay( __FILE__, __LINE__, __func__)
>> +extern void perform_spin_delay(void);
>> +extern void finish_spin_delay(void);
>
> As an API this doesn't quite make sense to me. For one, right now an
> uncontended SpinLockAcquire afaict will not trigger this mechanism, as we
> never call init_spin_delay().

Another great lesssion learnt, thanks for this as well!

>
>
> Maybe we could have

I moved on with the below suggestion with some small modification.

>
> - spinlock_prepare_acquire() - about to acquire a spinlock
> empty in optimized builds, asserts that no other spinlock is held
> etc
>
> This would get called in SpinLockAcquire(), LockBufHdr() etc.

"asserts that no other spinlock" has much more user cases than
SpinLockAcquire / LockBufHdr, I think sharing the same function will be
OK which is VerifyNoSpinLocksHeld function for now.

> - spinlock_finish_acquire() - have acquired spinlock
> empty in optimized builds, in assert builds sets variable indicating we're
> in spinlock
>
> This would get called in SpinLockRelease() etc.

I think you mean "SpinLockAcquire" here.

Matthias suggested "we need to 'arm' the checks just before we lock the
spin lock, and 'disarm' the checks just after we unlock the spin lock"
at [1], I'm kind of persuaded by that. so I used
spinlock_prepare_acquire to set variable indicating we're in
spinlock. which one do you prefer now?

>
> - spinlock_finish_release() - not holding the lock anymore
>
> This would get called by SpinLockRelease(), UnlockBufHdr()
>
>
> - spinlock_prepare_spin() - about to spin waiting for a spinlock
> like the current init_spin_delay()
>
> This would get called in s_lock(), LockBufHdr() etc.
>
>
> - spinlock_finish_spin() - completed waiting for a spinlock
> like the current finish_spin_delay()
>
> This would get called in s_lock(), LockBufHdr() etc.

All done in v10, for consistent purpose, I also renamed
perform_spin_delay to spinlock_perform_delay.

I have got much more than what I expected before in this review process,
thank you very much about that!

[1]
https://www.postgresql.org/message-id/CAEze2WggP-2Dhocmdhp-LxBzic%3DMXRgGA_tmv1G_9n-PDt2MQg%40mail.gmail.com

--
Best Regards
Andy Fan

Attachment Content-Type Size
v10-0001-Detect-misuse-of-spin-lock-automatically.patch text/x-diff 18.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dagfinn Ilmari Mannsåker 2024-02-08 13:58:36 Re: 2024-02-08 release announcement draft
Previous Message Ilia Evdokimov 2024-02-08 13:54:58 Re: pg_stat_advisor extension