Re: [PATCH] Identify LWLocks in tracepoints

From: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Identify LWLocks in tracepoints
Date: 2021-04-14 02:41:44
Message-ID: CAGRY4nxRGeH31d977+9LbbA+tRoSdU-EGn0mfQ6kMkQmZ3prUg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 14 Apr 2021 at 02:25, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> So before the commit in question --
> 3761fe3c20bb040b15f0e8da58d824631da00caa -- T_ID() used to compute an
> offset for a lock within the tranche that was supposed to uniquely
> identify the lock. However, the whole idea of an array per tranche
> turns out to be broken by design.

Yeah, I understand that.

I'd really love it if a committer could add an explanatory comment or
two in the area though. I'm happy to draft a comment patch if anyone
agrees my suggestion is sensible. The key things I needed to know when
studying the code were:

* A LWLock* is always part of a tranche, but locks within a given
tranche are not necessarily co-located in memory, cross referenced or
associated in any way.
* A LWLock tranche does not track how many LWLocks are in the tranche
or where they are in memory. It only groups up LWLocks into categories
and maps the tranche ID to a name.
* Not all LWLocks are part of the main LWLock array; others can be
embedded in shmem structs elsewhere, including in DSM segments.
* LWLocks in DSM segments may not have the same address between
different backends, because the DSM base address can vary, so a
LWLock* cannot be reliably compared between backends unless you know
it's in the main LWLock array or in static shmem.

Having that info in lwlock.c near the tranche management code or the
tranche and main lwlock arrays would've been very handy.

> You could try to identify locks by pointer addresses, but that's got
> security hazards and the addreses aren't portable across all the
> backends involved in the parallel query because of how DSM works, so
> it's not really that helpful in terms of matching stuff up.

What I'm doing now is identifying them by LWLock* across backends. I
keep track of DSM segment mappings in each backend inside the trace
script and I relocate LWLock* pointers known to be inside DSM segments
relative to a dummy base address so they're equal across backends.

It's a real pain, but it works. The main downside is that the trace
script has to observe the DSM attach; if it's started once a backend
already has the DSM segment attached, it has no idea the LWLock is in
a DSM segment or how to remap it. But that's not a serious issue.

> On a broader level, I agree that being able to find out what the
> system is doing is really important. But I'm also not entirely
> convinced that having really fine-grained information here to
> distinguish between one lock and another is the way to get there.

At the start of this thread I would've disagreed with you.

But yeah, you and Andres are right, because the costs outweigh the
benefits here.

I'm actually inclined to revise the patch I sent in order to *remove*
the LWLock name from the tracepoint argument. At least for the
fast-path tracepoints on start-of-acquire and end-of-acquire. I think
it's probably OK to report it in the lock wait tracepoints, which is
where it's most important to have anyway. So the tracepoint will
always report the LWLock* and tranche ID, but it won't report the
tranche name for the fast-path. I'll add trace events for tranche ID
registration, so trace tools can either remember the tranche ID->name
mappings from there, or capture them from lock wait events and
remember them.

Reasonable? That way we retain the most important trace functionality,
but we reduce the overheads.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2021-04-14 02:45:16 Re: [PATCH] Identify LWLocks in tracepoints
Previous Message Japin Li 2021-04-14 02:38:13 Re: Truncate in synchronous logical replication failed