Re: the s_lock_stuck on perform_spin_delay

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <tmunro(at)postgresql(dot)org>
Subject: Re: the s_lock_stuck on perform_spin_delay
Date: 2024-01-04 13:35:53
Message-ID: CA+Tgmob=RA_y4shbtodwRRhC9uwDSWnBHY=3EvnJ_txtAWoD9A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 4, 2024 at 2:09 AM Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
> My question is: if someone breaks the rule by mistake (everyone
> can make mistakes), should we PANIC in a production environment? IMO
> it could be a WARNING in a production environment and be treated as
> stuck (s_lock_stuck) only under 'ifdef USE_ASSERT_CHECKING'.
>
> People may think a spin lock would consume too much CPU, but that is
> not true in the scenario under discussion, since perform_spin_delay
> has pg_usleep in it, with MAX_DELAY_USEC at 1 second and
> MIN_DELAY_USEC at 0.001s.
>
> I noticed this issue because of the patch "Cache relation sizes?"
> from Thomas Munro [1], where the latest patch [2] still has the
> following code.
> + sr = smgr_alloc_sr(); <-- HERE a spin lock is held
> +
> + /* Upgrade to exclusive lock so we can create a mapping. */
> + LWLockAcquire(mapping_lock, LW_EXCLUSIVE); <-- HERE a complex operation is needed; it may take a long time.

I'm not sure that the approach this patch takes is correct in detail,
but I kind of agree with you about the overall point. I mean, the idea
of the PANIC is to avoid having the system just sit there in a state
from which it will never recover ... but it can also have the effect
of killing a system that wasn't really dead. I'm not sure what the
best thing to do here is, but it's worth talking about, IMHO.
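
To make the trade-off concrete, here is a minimal sketch in C of a bounded
spin delay with a selectable response once the wait limit is reached. It is
not the PostgreSQL source: only MIN_DELAY_USEC and MAX_DELAY_USEC come from
the quoted message, while MAX_SLEEPS, StuckPolicy, the plain doubling of the
delay, and every function name are hypothetical, and the sleep is simulated
by adding to a counter instead of calling pg_usleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MIN_DELAY_USEC 1000L     /* 0.001 s, from the quoted message */
#define MAX_DELAY_USEC 1000000L  /* 1 s, from the quoted message */
#define MAX_SLEEPS     1000      /* hypothetical limit before "stuck" */

/* Hypothetical policy switch: warn and keep waiting, or give up. */
typedef enum { STUCK_WARN, STUCK_PANIC } StuckPolicy;

typedef struct SpinDelay
{
    long cur_delay;   /* next sleep length in usec; 0 means not started */
    int  sleeps;      /* sleeps taken since the last reset */
    long total_usec;  /* simulated time spent sleeping */
} SpinDelay;

/*
 * Take one simulated sleep. Returns true, without sleeping, once
 * MAX_SLEEPS sleeps have already been taken (the "stuck" condition).
 */
bool spin_delay_step(SpinDelay *d)
{
    assert(d != NULL);
    if (d->sleeps >= MAX_SLEEPS)
        return true;
    if (d->cur_delay == 0)
        d->cur_delay = MIN_DELAY_USEC;
    d->total_usec += d->cur_delay;   /* stands in for a real sleep */
    d->sleeps++;
    d->cur_delay *= 2;               /* assumed growth rule */
    if (d->cur_delay > MAX_DELAY_USEC)
        d->cur_delay = MIN_DELAY_USEC;   /* wrap back to the minimum */
    return false;
}

/*
 * Decide what to do when the limit is hit. Returns true if the caller
 * should keep waiting (WARN), false if it should give up (PANIC).
 */
bool on_stuck(StuckPolicy policy, SpinDelay *d)
{
    assert(d != NULL);
    if (policy == STUCK_PANIC)
        return false;
    fprintf(stderr, "WARNING: spinlock wait exceeded %d sleeps\n", MAX_SLEEPS);
    d->sleeps = 0;   /* restart the count and continue waiting */
    return true;
}

/* Total simulated sleep time before the stuck condition is reported. */
long usec_until_stuck(void)
{
    SpinDelay d = {0, 0, 0};
    while (!spin_delay_step(&d))
        ;
    return d.total_usec;
}
```

Under these assumptions a waiter sleeps for roughly 102 seconds in total
before the limit is reached. With STUCK_WARN it logs and keeps waiting, which
avoids killing a system that was merely slow; with STUCK_PANIC the caller is
told to give up, which avoids sitting forever behind a lock that will never
be released.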

--
Robert Haas
EDB: http://www.enterprisedb.com
