Re: Add LWLock blocker(s) information

From: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Add LWLock blocker(s) information
Date: 2020-11-19 10:33:49
Message-ID: CAGRY4nz=SEs3qc1R6xD3max7sg3kS-L81eJk2aLUWSQAeAFJTA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 18, 2020 at 5:25 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> On 11/08/2020 03:41, Andres Freund wrote:
> > On 2020-08-10 18:27:17 -0400, Robert Haas wrote:
> >> On Tue, Jun 2, 2020 at 8:25 AM Drouvot, Bertrand
> >> <bdrouvot(at)amazon(dot)com> wrote:
> >>> the patch adds into the LWLock struct:
> >>>
> >>> last_holding_pid: last pid owner of the lock
> >>> last_mode: last holding mode of the last pid owner of the lock
> >>> nholders: number of holders (could be >1 in case of LW_SHARED)
> >>
> >> There's been significant work done over the years to get the size of
> >> an LWLock down; I'm not very enthusiastic about making it bigger
> >> again. See for example commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1
> >> which embeds one of the LWLocks associated with a BufferDesc into the
> >> structure to reduce the number of cache lines associated with common
> >> buffer operations. I'm not sure whether this patch would increase the
> >> space usage of a BufferDesc to more than one cache line again, but at
> >> the very least it would make it a lot tighter, since it looks like it
> >> adds 12 bytes to the size of each one.
> >
> > +many. If anything I would like to make them *smaller*. We should strive
> > to make locking more and more granular, and that requires the space
> > overhead to be small. I'm unhappy enough about the tranche being in
> > there, and requiring padding etc.
> >
> > I spent a *LOT* of sweat getting where we are, I'd be unhappy to regress
> > on size or efficiency.
>
> That seems to be the consensus, so I'm marking this as Returned with
> Feedback in the commitfest.
>

For what it's worth, I think that things like this are where we can really
benefit from external tracing and observation tools.

Instead of tracking the information persistently in the LWLock struct, we
can emit TRACE_POSTGRESQL_LWLOCK_BLOCKED_ON(...) in a context where we have
the information available to us, then forget all about it. We don't spend
anything unless someone's collecting the info.
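
To sketch the idea (the probe name and argument list below are just
placeholders, nothing like this exists in probes.d today): we'd declare
something along the lines of

    probe lwlock__blocked__on(const char *, void *, LWLockMode);

in src/backend/utils/probes.d, and fire it at the point in
LWLockAcquire() where the backend decides to sleep, where everything
we'd want to report is already in scope:

    /* about to queue ourselves and sleep; say what we're blocked on */
    TRACE_POSTGRESQL_LWLOCK_BLOCKED_ON(T_NAME(lock), lock, mode);

The LWLock struct doesn't grow by a single byte, and nothing needs to
be stored anywhere once the probe has fired.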

If someone wants to track LWLock blocking relationships during debugging
and performance work, they can use systemtap, dtrace, bpftrace, or a
similar tool to observe the LWLock tracepoints and generate stats on LWLock
blocking frequencies/durations. Doing so with systemtap should be rather
simple.

I actually had a look at this before. I found that the tracepoints in
the LWLock code right now don't supply enough information in their
arguments, so you have to fall back to DWARF debuginfo-based probes,
which is a
pain. The tranche name alone doesn't let you identify which lock within a
tranche is the current target.

I've attached a patch that adds the actual LWLock* to each tracepoint in
the LWLock subsystem. That permits different locks to be tracked when
handling tracepoint events within a single process.
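
Roughly, the shape of the change (the exact argument lists are in the
attached patch, which also passes the tranche ID) is that calls like

    TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);

become

    TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode, lock);

so a tracing tool can key its state on the lock pointer rather than on
the tranche name alone.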

Another patch adds tracepoints that were missing from LWLockUpdateVar
and LWLockWaitForVar, and a third removes a stray
TRACE_POSTGRESQL_LWLOCK_ACQUIRE() call from LWLockWaitForVar(), which
should not have been there since the lock is not actually acquired by
LWLockWaitForVar().

I'd hoped to add some sort of "index within the tranche" to tracepoints,
but it looks like it's not feasible. It turns out to be basically
impossible to get a stable identifier for an individual LWLock that is
valid across different backends. An LWLock inside a DSM segment might
have a different address in different backends depending on where the
DSM segment got mapped. The LWLock subsystem doesn't keep track of them
and doesn't have any way to map an LWLock pointer to any sort of
cross-process-valid
identifier. So that's a bit of a pain when tracing. To provide something
stable I think it'd be necessary to add some kind of counter tracked
per-tranche and set by LWLockInitialize in the LWLock struct itself, which
we sure don't want to do. If this ever becomes important for some reason we
can probably look up whether the address is within a DSM segment or static
shmem and compute some kind of relative address to report. For now you
can't tell specific locks within a tranche apart when tracing, except
for the individual LWLocks and named tranches.
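
To illustrate the kind of thing I mean: if you already had hold of the
dsm_segment the lock lives in (which is exactly what you can't get from
a bare LWLock pointer today), a cross-backend-stable identifier would
just be a handle plus an offset, something like

    /* sketch only: where "seg" comes from is exactly the hard part */
    dsm_handle  handle = dsm_segment_handle(seg);
    Size        offset = (char *) lock - (char *) dsm_segment_address(seg);

The mapped address differs between backends, but (handle, offset) does
not. The missing piece is the reverse mapping from an LWLock pointer to
its containing segment.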

By the way, the LWLock tracepoints currently evaluate T_NAME(lock),
which calls GetLWTrancheName(), on every tracepoint hit so long as Pg
is built with --enable-dtrace, even when nothing is actually tracing
them. We might want
to consider guarding them in systemtap tracepoint semaphore tests so they
just become a predicted-away branch when not active. Doing so requires a
small change to how we compile probes.d and the related makefile, but
shouldn't be too hard. I haven't done that in this patch set.
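
The guarded form would look something like

    if (TRACE_POSTGRESQL_LWLOCK_WAIT_START_ENABLED())
        TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);

where the _ENABLED() macros are what dtrace/systemtap generate for
probes built with semaphore support (that's the probes.d/makefile
change mentioned above). When no tracer has attached, the semaphore is
zero and T_NAME(lock) / GetLWTrancheName() never get evaluated.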

Attachment Content-Type Size
v1-0002-Pass-the-target-LWLock-and-tranche-ID-to-LWLock-t.patch text/x-patch 7.2 KB
v1-0001-Remove-bogus-lwlock__acquire-tracepoint-from-LWLo.patch text/x-patch 1002 bytes
v1-0003-Add-to-the-tracepoints-in-LWLock-routines.patch text/x-patch 5.7 KB
