Re: the s_lock_stuck on perform_spin_delay

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <tmunro(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: the s_lock_stuck on perform_spin_delay
Date: 2024-01-04 14:24:50
Message-ID: 871qaxp3ly.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi Matthias and Robert,

Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> writes:

> On Thu, 4 Jan 2024 at 08:09, Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
>>
>> My question is if someone doesn't obey the rule by mistake (everyone
>> can make mistake), shall we PANIC on a production environment? IMO I
>> think it can be a WARNING on a production environment and be a stuck
>> when 'ifdef USE_ASSERT_CHECKING'.
>> [...]
>> I think a experienced engineer like Thomas can make this mistake and the
>> patch was reviewed by 3 peoples, the bug is still there. It is not easy
>> to say just don't do it.
>>
>> the attached code show the prototype in my mind. Any feedback is welcome.
>
> While I understand your point and could maybe agree with the change
> itself (a crash isn't great),

It's great that both of you agree that the crash is not great.

> I don't think it's an appropriate fix
> for the problem of holding a spinlock while waiting for a LwLock, as
> spin.h specifically mentions the following (and you quoted the same):
>
> """
> Keep in mind the coding rule that spinlocks must not be held for more
> than a few instructions.
> """

Yes, I agree that the known [LW]LockAcquire after holding a Spin lock
should be fixed at the first chance rather than pander to it with my
previous patch. My previous patch just take care of the *unknown*
cases (and I cced thomas in the hope that he knows the bug). I also
agree that the special case about [LW]LockAcquire should be detected
more effective as you suggested below. So v2 comes and commit 2 is for
this suggestion.

>
> I suspect that we'd be better off with stronger protections against
> waiting for LwLocks while we hold any spin lock. More specifically,
> I'm thinking about something like tracking how many spin locks we
> hold, and Assert()-ing that we don't hold any such locks when we start
> to wait for an LwLock or run CHECK_FOR_INTERRUPTS-related code (with
> potential manual contextual overrides in specific areas of code where
> specific care has been taken to make it safe to hold spin locks while
> doing those operations - although I consider their existence unlikely
> I can't rule them out as I've yet to go through all lock-touching
> code). This would probably work in a similar manner as
> START_CRIT_SECTION/END_CRIT_SECTION.
>
> Kind regards,
>
> Matthias van de Meent
> Neon (https://neon.tech)

--
Best Regards
Andy Fan

Attachment Content-Type Size
v2-0001-Don-t-call-s_lock_stuck-in-production-environment.patch text/x-diff 1.7 KB
v2-0002-Disallow-LW-LockAcquire-when-holding-a-spin-lock-.patch text/x-diff 3.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-01-04 14:55:01 Re: index prefetching
Previous Message Bertrand Drouvot 2024-01-04 13:54:44 Re: Synchronizing slots from primary to standby