Re: the s_lock_stuck on perform_spin_delay

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <tmunro(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: the s_lock_stuck on perform_spin_delay
Date: 2024-01-25 07:24:17
Message-ID: 87msstvper.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi,

> Hi,
>
> On 2024-01-22 15:18:35 +0800, Andy Fan wrote:
>> I used sigismember(&BlockSig, SIGQUIT) to detect if a process is doing a
>> quickdie, however this is bad not only because it doesn't work on
>> Windows, but also it has too poor performance even it impacts on
>> USE_ASSERT_CHECKING build only. In v8, I introduced a new global
>> variable quickDieInProgress to handle this.
>
> For reasons you already noted, using sigismember() isn't viable. But I am not
> convinced by quickDieInProgress either. I think we could just reset the state
> for the spinlock check in the code handling PANIC, perhaps via a helper
> function in spin.c.

Handled with the action for your suggestion #3.

>> +void
>> +VerifyNoSpinLocksHeld(void)
>> +{
>> +#ifdef USE_ASSERT_CHECKING
>> + if (last_spin_lock_file != NULL)
>> + elog(PANIC, "A spin lock has been held at %s:%d",
>> + last_spin_lock_file, last_spin_lock_lineno);
>> +#endif
>> +}
>
> I think the #ifdef for this needs to be in the header, not here. Otherwise we
> add a pointless external function call to a bunch of performance critical
> code.
>
Done.

>> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
>> index 7d601bef6d..c600a113cf 100644
>> --- a/src/backend/storage/buffer/bufmgr.c
>> +++ b/src/backend/storage/buffer/bufmgr.c
>> @@ -5409,6 +5409,7 @@ LockBufHdr(BufferDesc *desc)
>>
>> init_local_spin_delay(&delayStatus);
>>
>> + START_SPIN_LOCK();
>> while (true)
>> {
>> /* set BM_LOCKED flag */
>
> Seems pretty odd that we now need init_local_spin_delay() and
> START_SPIN_LOCK(). Note that init_local_spin_delay() also wraps handling of
> __FILE__, __LINE__ etc, so it seems we're duplicating state here.

Thanks for catching this! Based on the feedbacks so far, it is not OK to
acquire another spin lock when holding one already. So I refactor the
code like this:

/*
- * Support for spin delay which is useful in various places where
- * spinlock-like procedures take place.
+ * Support for spin delay and spin misuse detection purpose.
+ *
+ * spin delay which is useful in various places where spinlock-like
+ * procedures take place.
+ *
+ * spin misuse is based on global spinStatus to know if a spin lock
+ * is held when a heavy operation is taking.
*/
typedef struct
{
@@ -846,22 +854,40 @@ typedef struct
const char *file;
int line;
const char *func;
-} SpinDelayStatus;
+ bool in_panic; /* works for spin lock misuse purpose. */
+} SpinLockStatus;

+extern PGDLLIMPORT SpinLockStatus spinStatus;

Now all the init_local_spin_delay, perform_spin_delay, finish_spin_delay
access the same global variable spinStatus. and just 2 extra functions
added (previously have 3). they are:

extern void VerifyNoSpinLocksHeld(bool check_in_panic);
extern void ResetSpinLockStatus(void);

The panic check stuff is still added into spinLockStatus.

v9 attached.

--
Best Regards
Andy Fan

Attachment Content-Type Size
v9-0001-Detect-misuse-of-spin-lock-automatically.patch text/x-diff 16.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2024-01-25 07:36:41 Re: the s_lock_stuck on perform_spin_delay
Previous Message Richard Guo 2024-01-25 07:21:26 Re: Trivial revise for the check of parameterized partial paths