Re: the s_lock_stuck on perform_spin_delay

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <tmunro(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: the s_lock_stuck on perform_spin_delay
Date: 2024-01-11 01:55:18
Message-ID: 87h6jkmucz.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi Matthias,

Thanks for the review!

Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> writes:

> On Wed, 10 Jan 2024 at 02:44, Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
>> Hi,
>>
>> I want to know if Andres or you have plan
>> to do some code review. I don't expect this would happen very soon, just
>> want to make sure this will not happen that both of you think the other
>> one will do, but actually none of them does it in fact. a commit fest
>> [1] has been added for this.
>
>
>> +++ b/src/backend/storage/buffer/bufmgr.c
>> @@ -5419,6 +5419,7 @@ LockBufHdr(BufferDesc *desc)
>> perform_spin_delay(&delayStatus);
>> }
>> finish_spin_delay(&delayStatus);
>> + START_SPIN_LOCK();
>> return old_buf_state | BM_LOCKED;
>> }
>
> I think that we need to 'arm' the checks just before we lock the spin
> lock, and 'disarm' the checks just after we unlock the spin lock,
> rather than after and before, respectively. That way, we won't have a
> chance of false negatives: with your current patch it is possible that
> an interrupt fires between the acquisition of the lock and the code in
> START_SPIN_LOCK() marking the thread as holding a spin lock, which
> would cause any check in that signal handler to incorrectly read that
> we don't hold any spin locks.

That's a good idea. fixed in v2.

>
>> +++ b/src/backend/storage/lmgr/lock.c
>> @@ -776,6 +776,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
>> bool found_conflict;
>> bool log_lock = false;
>>
>> + Assert(SpinLockCount == 0);
>> +
>
> I'm not 100% sure on the policy of this, but theoretically you could
> use LockAquireExtended(dontWait=true) while holding a spin lock, as
> that would not have an unknown duration. Then again, this function
> also does elog/ereport, which would cause issues, still, so this code
> may be the better option.

I thought this statement as "keeping the current patch as it is" since
"not waiting" doesn't means the a few dozen in this case. please
correct me if anything wrong.

>
>> + elog(PANIC, "stuck spinlock detected at %s, %s:%d after waiting for %u ms",
>> + func, file, line, delay_ms);
>
> pg_usleep doesn't actually guarantee that we'll wait for exactly that
> duration; depending on signals received while spinning and/or OS
> scheduling decisions it may be off by orders of magnitude.

True, but I did this for two reasons. a). the other soluation needs call
'time' syscall twice, I didn't want to pay this run-time effort. b). the
possiblity of geting a signals during pg_usleep should be low and
even that happens, because the message is just for human, we don't need
a absolutely accurate number. what do you think?

>
>> +++ b/src/common/scram-common.c
>
> This is unrelated to the main patchset

Fixed in v2.

>
>> +++ b/src/include/storage/spin.h
>
> Minor: I think these changes could better be included in miscadmin, or
> at least the definition for SpinLockCount should be moved there: The
> spin lock system itself shouldn't be needed in places where we need to
> make sure that we don't hold any spinlocks, and miscadmin.h already
> holds things related to "System interrupt and critical section
> handling", which seems quite related.

fixed in v2.

--
Best Regards
Andy Fan

Attachment Content-Type Size
v2-0001-Detect-more-misuse-of-spin-lock-automatically.patch text/x-diff 10.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2024-01-11 01:58:17 Re: Synchronizing slots from primary to standby
Previous Message Sutou Kouhei 2024-01-11 01:24:45 Re: Make COPY format extendable: Extract COPY TO format implementations