Re: Add LWLock blocker(s) information

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add LWLock blocker(s) information
Date: 2020-08-11 00:41:36
Message-ID: 20200811004136.6aydiib3ew4rcimj@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-08-10 18:27:17 -0400, Robert Haas wrote:
> On Tue, Jun 2, 2020 at 8:25 AM Drouvot, Bertrand <bdrouvot(at)amazon(dot)com> wrote:
> > the patch adds into the LWLock struct:
> >
> > last_holding_pid: last pid owner of the lock
> > last_mode: last holding mode of the last pid owner of the lock
> > nholders: number of holders (could be >1 in case of LW_SHARED)
>
> There's been significant work done over the years to get the size of
> an LWLock down; I'm not very enthusiastic about making it bigger
> again. See for example commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1
> which embeds one of the LWLocks associated with a BufferDesc into the
> structure to reduce the number of cache lines associated with common
> buffer operations. I'm not sure whether this patch would increase the
> space usage of a BufferDesc to more than one cache line again, but at
> the very least it would make it a lot tighter, since it looks like it
> adds 12 bytes to the size of each one.

+many. If anything I would like to make them *smaller*. We should strive
to make locking more and more granular, and that requires the space
overhead to be small. I'm unhappy enough about the tranche being in
there, and the padding it requires, etc.

I spent a *LOT* of sweat getting to where we are; I'd be unhappy to
regress on size or efficiency.

> It's also a little hard to believe that this doesn't hurt performance
> on workloads with a lot of LWLock contention, although maybe not; it
> doesn't seem crazy expensive, just possibly enough to matter.

Yea.

> I thought a little bit about what this might buy as compared with just
> sampling wait events. That by itself is enough to tell you which
> LWLocks are heavily contended. It doesn't tell you what they are
> contending against, so this would be superior in that regard. However,
> I wonder how much of a problem that actually is. Typically, LWLocks
> aren't being taken for long periods, so all the things that are
> accessing the lock spend some time waiting (which you will see via
> wait events in pg_stat_activity) and some time holding the lock
> (making you see other things in pg_stat_activity). It's possible to
> have cases where this isn't true; e.g. a relatively small number of
> backends committing transactions could be slowing down a much larger
> number of backends taking snapshots, and you'd mostly only see the
> latter waiting for ProcArrayLock. However, those kinds of cases don't
> seem super-common or super-difficult to figure out.

Most of the cases where this kind of information really is interesting
seem to benefit a lot from having stack information available. That
obviously has overhead, so we don't want to pay that cost all the
time. The script at
https://postgr.es/m/20170622210845.d2hsbqv6rxu2tiye%40alap3.anarazel.de
can give you results like e.g.
https://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg

Greetings,

Andres Freund
