Re: [PATCH] Identify LWLocks in tracepoints

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Identify LWLocks in tracepoints
Date: 2021-04-13 20:46:25
Message-ID: 20210413204625.aybkqnuhzcny3mdb@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-04-13 14:25:23 -0400, Robert Haas wrote:
> On Mon, Apr 12, 2021 at 11:06 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> You could identify every lock by a tranche ID + an array offset + a
> "tranche instance ID". But where would you store the tranche instance
> ID to make it readily accessible, other than in the lock itself?
> Andres wasn't thrilled about using even 2 bytes to identify the
> LWLock, so he'll probably like having more bytes in there for that
> kind of thing even less.

I still don't like the two bytes, fwiw ;). Especially because right
now they're effectively 4 bytes due to padding.
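
For reference, the definition as it stands (LOCK_DEBUG fields
omitted) - the uint16 tranche is followed by a 4-byte-aligned atomic,
so padding rounds it up:

typedef struct LWLock
{
    uint16      tranche;        /* tranche ID: 2 bytes + 2 bytes padding */
    pg_atomic_uint32 state;     /* state of exclusive/nonexclusive lockers */
    proclist_head waiters;      /* list of waiting PGPROCs */
} LWLock;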

I'd like to move the LWLock->waiters list out of the lwlock itself -
at most TotalProcs LWLocks can be waited for at any one time, so we
don't need millions of empty proclist_heads. That way we could also
remove the proclist indirection - which shows up a fair bit in
contended workloads.

And if we had a separate "lwlocks being waited for" structure, we could
also add more information to it if we wanted to...
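
Roughly what I have in mind, as a sketch (names invented, and
hand-waving over how slots get allocated):

/* A backend waits on at most one lwlock at a time, so at most
 * TotalProcs locks can have waiters - one slot per backend suffices. */
typedef struct LWLockWaitSlot
{
    proclist_head waiters;      /* PGPROCs sleeping on this lock */
    /* room for more instrumentation data here, if we want it */
} LWLockWaitSlot;

/* in shared memory: TotalProcs + 1 slots, index 0 meaning "no waiters" */
static LWLockWaitSlot *LWLockWaitSlots;

/* A lock that gains its first waiter claims a free slot and records the
 * slot index in its state word; the last waiter to leave releases it. */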

The difficulty of course is having space to indicate which of these
"waiting for" lists are being used - there's not enough space in ->state
right now to represent that. Two possible approaches:

- We could make it work if we restricted MAX_BACKENDS to 2^14 - but
while I personally think that's a sane upper limit, I already had a
hard time getting consensus to lower the limit to 2^18-1.

- Use a 64-bit integer. Then we can easily fit MAX_BACKENDS lockers, as
well as an offset to one of MAX_BACKENDS "wait lists", into the LWLock.
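
For the second option the arithmetic works out comfortably; a
hypothetical layout (bit positions invented for illustration):

/* 64-bit LWLock state: the shared-locker count and the wait-slot index
 * need 18 bits each (MAX_BACKENDS being 2^18-1), leaving 28 bits for
 * the exclusive bit, flags, and potentially the tranche. */
#define LW_SHARED_MASK      UINT64CONST(0x3FFFF)    /* bits 0-17 */
#define LW_WAITSLOT_SHIFT   18
#define LW_WAITSLOT_MASK    (UINT64CONST(0x3FFFF) << LW_WAITSLOT_SHIFT)
#define LW_VAL_EXCLUSIVE    (UINT64CONST(1) << 36)
#define LW_FLAG_HAS_WAITERS (UINT64CONST(1) << 37)
#define LW_FLAG_RELEASE_OK  (UINT64CONST(1) << 38)
/* bits 39..63 free, e.g. for stashing the tranche id */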

It's not so much that I want to lower the overall memory usage (although
it doesn't hurt). It's more about being able to fit more data into one
cacheline together with the lwlock. E.g. being able to fit more into
BufferDesc would be very useful.

A secondary benefit of such an approach would be that it makes it a
lot easier to add efficient adaptive spinning on contended locks. I did
experiment with that, and there's some considerable potential for
performance benefits there. But for it to scale well we need something
similar to "mcs locks", to avoid causing too much contention. And that
pretty much requires some separate space to store wait information
anyway.
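
To illustrate the mcs idea - a textbook sketch in C11 atomics, not
PostgreSQL code: each waiter spins on its own queue node instead of on
the shared lock word, which is what keeps the cacheline traffic bounded
under contention:

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node
{
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
} mcs_node;

typedef struct
{
    _Atomic(mcs_node *) tail;   /* last waiter in the queue, or NULL */
} mcs_lock;

static void
mcs_acquire(mcs_lock *lock, mcs_node *me)
{
    mcs_node   *pred;

    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);

    /* enqueue; if there was a predecessor, it will hand the lock over */
    pred = atomic_exchange_explicit(&lock->tail, me, memory_order_acq_rel);
    if (pred != NULL)
    {
        atomic_store_explicit(&pred->next, me, memory_order_release);
        /* spin on our own node, not on the shared lock word */
        while (atomic_load_explicit(&me->locked, memory_order_acquire))
            ;
    }
}

static void
mcs_release(mcs_lock *lock, mcs_node *me)
{
    mcs_node   *succ = atomic_load_explicit(&me->next, memory_order_acquire);

    if (succ == NULL)
    {
        mcs_node   *expected = me;

        /* no known successor: try to swing the tail back to empty */
        if (atomic_compare_exchange_strong_explicit(&lock->tail, &expected,
                                                    NULL,
                                                    memory_order_acq_rel,
                                                    memory_order_acquire))
            return;
        /* somebody is mid-enqueue; wait for them to link themselves */
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ;
    }
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}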

With an 8-byte state we could probably also stash the tranche inside
that...

> On a broader level, I agree that being able to find out what the
> system is doing is really important. But I'm also not entirely
> convinced that having really fine-grained information here to
> distinguish between one lock and another is the way to get there.
> Personally, I've never run into a problem where I really needed to
> know anything more than the tranche name.

I think it's quite useful for relatively simple things like analyzing
the total amount of time spent in individual locks, without incurring
much overhead when not doing so (you need to identify individual locks
for that, otherwise your end - start time is going to be meaningless).
And, slightly more advanced, for analyzing what the stack was when the
lock was released - which then allows you to see what work you're
blocked on, something I otherwise found hard to figure out.

I found that that's mostly quite doable with dynamic probes though.
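
For the timing case all the tracepoint has to provide is a stable key
to hash on - in sketch form (the extra trailing argument, the LWLock
pointer itself, being roughly what $subject is about):

/* in LWLockAcquire() / LWLockRelease(): a probe script records a
 * timestamp keyed on the pointer at acquire, and subtracts at release */
TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), mode, lock);
...
TRACE_POSTGRESQL_LWLOCK_RELEASE(T_NAME(lock), lock);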

> Like, I've seen problems for example we can see that there's a lot of
> contention on SubtransSLRULock, or there's problems with
> WALInsertLock. But I can't really see why I'd need to know which
> WALInsertLock was experiencing contention.

Well, but you might want to know what the task blocking you was
doing. What to optimize might differ depending on whether the other
task is e.g. a log switch (which acquires all insert locks) or WAL
writes by VACUUM.

> If we were speaking of buffer content locks, I suppose I can imagine
> wanting more details, but it's not really the buffer number I'd want
> to know. I'd want to know the database OID, the relfilenode, the fork
> number, and the block number. You can argue that we should just expose
> the buffer number and let the user sort out the rest with
> dtrace/systemtap magic, but that makes it useless in practice to an
> awful lot of people, including me.

I have wondered if we ought to put some utilities for that in contrib or
such. It's a lot easier to address something new with a decent starting
point...

Greetings,

Andres Freund
