Re: [PATCH] LWLock self-deadlock detection

From: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] LWLock self-deadlock detection
Date: 2020-11-25 06:16:53
Message-ID: CAGRY4nyb6iOhPMphZq=9sTUFep35f2JqF=G35mUoJ_3zdoPx1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 24, 2020 at 10:11 PM Ashutosh Bapat <
ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:

> This looks useful. LWLockCheckSelfDeadlock() could use LWLockHeldByMe
> variant instead of copying that code with possibly a change in that
> function to return the required information.
>

Yes, possibly so. I was reluctant to mess with the rest of the code too
much.

I am also seeing a pattern
> Assert(!LWLockHeldByMe());
> LWLockAcquire()
>
> at some places. Should we change LWLockAcquire to do
> Assert(!LWLockHeldByMe()) always to detect such occurrences?

I'm inclined not to, at least not without benchmarking it, because that'd
do the check before we attempt the fast-path. cassert builds are still
supposed to perform decently and be suitable for day to day development and
I'd rather not risk a slowdown.

I'd prefer to make the lock self deadlock check run for production builds,
not just cassert builds. It'd print a normal LOG (with backtrace if
supported) then Assert(). So on an assert build we'd get a crash and core,
and on a non-assert build we'd carry on and self-deadlock anyway.

That's probably the safest thing to do. We can't expect to safely ERROR out
of the middle of an LWLockAcquire(), not without introducing a new and
really hard to test code path that'll also be surprising to callers. We
probably don't want to PANIC the whole server for non-assert builds since
it might enter a panic-loop. So it's probably better to self-deadlock. We
could HINT that a -m immediate shutdown will be necessary to recover though.

I don't think it makes sense to introduce much complexity for a feature
that's mainly there to help developers and to catch corner-case bugs.

(FWIW a variant of this patch has been in 2ndQPostgres for some time. It
helped catch an extension bug just the other day.)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2020-11-25 06:19:56 Re: [Proposal] Global temporary tables
Previous Message Amit Kapila 2020-11-25 05:45:38 Re: Remove cache_plan argument comments to ri_PlanCheck