Re: [PATCH] LWLock self-deadlock detection

From: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: ashutosh(dot)bapat(at)enterprisedb(dot)com, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] LWLock self-deadlock detection
Date: 2020-11-26 02:19:00
Message-ID: CAGRY4nw7fT+=shjvFLVFFUfDv=q1eZ0iim6rjwjSqdNF2ENp+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 25, 2020 at 9:23 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
wrote:

> On Wed, Nov 25, 2020 at 11:47 AM Craig Ringer
> <craig(dot)ringer(at)enterprisedb(dot)com> wrote:
> >> I am also seeing a pattern
> >> Assert(!LWLockHeldByMe());
> >> LWLockAcquire()
> >>
> >> at some places. Should we change LWLockAcquire to do
> >> Assert(!LWLockHeldByMe()) always to detect such occurrences?
> >
> >
> > I'm inclined not to, at least not without benchmarking it, because
> that'd do the check before we attempt the fast-path. cassert builds are
> still supposed to perform decently and be suitable for day to day
> development and I'd rather not risk a slowdown.
> >
> > I'd prefer to make the lock self deadlock check run for production
> builds, not just cassert builds. It'd print a normal LOG (with backtrace if
> supported) then Assert(). So on an assert build we'd get a crash and core,
> and on a non-assert build we'd carry on and self-deadlock anyway.
> >
> > That's probably the safest thing to do. We can't expect to safely ERROR
> out of the middle of an LWLockAcquire(), not without introducing a new and
> really hard to test code path that'll also be surprising to callers. We
> probably don't want to PANIC the whole server for non-assert builds since
> it might enter a panic-loop. So it's probably better to self-deadlock. We
> could HINT that a -m immediate shutdown will be necessary to recover though.
>
> I agree that it will be helpful to print something in the logs
> indicating the reason for this hang in case the hang happens in a
> production build. In your patch you have used ereport(PANIC, ) which
> may simply be replaced by an Assert() in an assert enabled build.

I'd either select between PANIC (assert build) and LOG (non assert build),
or always LOG then put an Assert() afterwards. It's strongly desirable to
get the log output before any crash because it's a pain to get the LWLock
info from a core.

> We
> already have Assert(!LWLockHeldByMe()) so that should be safe. It will
> be good to have -m immediate hint in LOG message. But it might just be
> better to kill -9 that process to get rid of it. That will cause the
> server to restart and not just shutdown.
>

Sure. Won't work on Windows though. And I am hesitant about telling users
to "kill -9" anything.

pg_ctl -m immediate restart then.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tsunakawa.takay@fujitsu.com 2020-11-26 02:42:09 RE: [POC] Fast COPY FROM command for the table with foreign partitions
Previous Message Hou, Zhijie 2020-11-26 02:17:21 RE: Parallel Inserts in CREATE TABLE AS