Re: the s_lock_stuck on perform_spin_delay

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <tmunro(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: the s_lock_stuck on perform_spin_delay
Date: 2024-01-21 23:26:04
Message-ID: 87wms2w9hn.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi,

Andy Fan <zhihuifan1213(at)163(dot)com> writes:

> Hi,
>
> v6 attached which addressed all the items Robert suggested except the
> following 2 open items. They are handled differently.
>
>>
>> Here is the summary of the open-items, it would be great that Andres and
>> Matthias have a look at this when they have time.
>>
>>>> The LockBufHdr also used init_local_spin_delay / perform_spin_delay
>>>> infrastruce and then it has the same issue like ${subject}, it is pretty
>>>> like the code in s_lock; Based on my current knowledge, I think we
>>>> should add the check there.
>>>
>>> I'd like to hear from Andres, if possible. @Andres: Should these
>>> sanity checks apply only to spin locks per se, or also to buffer
>>> header locks?
>
> v6 is splitted into 2 commits, one for normal SpinLock and one for
> LockBufHdr lock.
>
> commit 6276d2f66b0760053e3fdfe259971be3abba3c63
> Author: yizhi.fzh <yizhi(dot)fzh(at)alibaba-inc(dot)com>
> Date: Fri Jan 19 13:52:07 2024 +0800
>
> Detect more misuse of spin lock automatically
>
> Spin lock are intended for *very* short-term locks, but it is possible
> to be misused in many cases. e.g. Acquiring another LWLocks or regular
> locks, memory allocation, errstart when holding a spin lock. this patch
> would detect such misuse automatically in a USE_ASSERT_CHECKING build.
>
> CHECK_FOR_INTERRUPTS should be avoided as well when holding a spin lock.
> Depends on what signals are left to handle, PG may raise error/fatal
> which would cause the code jump to some other places which is hardly to
> release the spin lock anyway.
>
> commit 590a0c6f767f62f6c83289d55de99973bc7da417 (HEAD -> s_stuck_v3)
> Author: yizhi.fzh <yizhi(dot)fzh(at)alibaba-inc(dot)com>
> Date: Fri Jan 19 13:57:46 2024 +0800
>
> Treat (un)LockBufHdr as a SpinLock.
>
> The LockBufHdr also used init_local_spin_delay / perform_spin_delay
> infrastructure and so it is also possible that PANIC the system
> when it can't be acquired in a short time, and its code is pretty
> similar with s_lock. so treat it same as SPIN lock when regarding to
> misuse of spinlock detection.
>
>>>> they were put into spin.h in v1 but later move to miscadmin.h at [1].
>>>> [1]
>>>> https://www.postgresql.org/message-id/CAEze2WggP-2Dhocmdhp-LxBzic%3DMXRgGA_tmv1G_9n-PDt2MQg%40mail.gmail.com
>>>
>>> I'm not entirely sure what the right thing to do is here, and the
>>> answer may depend on the previous question. But I disagree with
>>> Matthias -- I don't think miscadmin.h can be the right answer
>>> regardless.
>
> I put it into spin.h this time in commit 1, and include the extern
> function VerifyNoSpinLocksHeld in spin.c into miscadmin.h like what we
> did for ProcessInterrupts. This will easy the miscadmin dependency. the
> changes for '#include xxx' looks better than before.

I found a speical case about checking it in errstart. So commit 3 in v7
is added. I'm not intent to commit 1 and commit 3 should be 2 sperate
commits, but making them 2 will be easy for discussion.

commit 757c67c1d4895ce6a523bcf5217af8eb2351e2a1 (HEAD -> s_stuck_v3)
Author: yizhi.fzh <yizhi(dot)fzh(at)alibaba-inc(dot)com>
Date: Mon Jan 22 07:14:29 2024 +0800

Bypass SpinLock checking in SIGQUIT signal hander

When a process receives a SIGQUIT signal, it indicates the system has a
crash time. It's possible that the process is just holding a Spin
lock. By our current checking, this process will PANIC with a misuse of
spinlock which is pretty prone to misunderstanding. so we need to bypass
the spin lock holding checking in this case. It is safe since the
overall system will be restarted.

--
Best Regards
Andy Fan

Attachment Content-Type Size
v7-0001-Detect-more-misuse-of-spin-lock-automatically.patch text/x-diff 7.9 KB
v7-0002-Treat-un-LockBufHdr-as-a-SpinLock.patch text/x-diff 1.4 KB
v7-0003-Bypass-SpinLock-checking-in-SIGQUIT-signal-hander.patch text/x-diff 1.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2024-01-21 23:30:22 Re: Evaluate arguments of correlated SubPlans in the referencing ExprState
Previous Message Nathan Bossart 2024-01-21 22:13:20 Re: introduce dynamic shared memory registry