Re: [PATCH] Identify LWLocks in tracepoints

From: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Robert Haas <robert(dot)haas(at)enterprisedb(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: [PATCH] Identify LWLocks in tracepoints
Date: 2021-04-12 06:31:32
Message-ID: CAGRY4ny7NGwvq1Ntb_dTaxD0i5OHz1nA2RbezxU9Suu8R+4RaQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, 22 Mar 2021 at 16:38, Peter Eisentraut <
peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:

>
> First, a problem: 0002 doesn't build on macOS, because uint64 has been
> used in the probe definitions. That needs to be handled like the other
> nonnative types in that file.
>

Will fix.

> All the probe changes and additions should be accompanied by
> documentation changes.
>

Agreed, will fix.

> The probes used to have an argument to identify the lock, which was
> removed by 3761fe3c20bb040b15f0e8da58d824631da00caa.

Huh. That's exactly the functionality I was looking for. Damn. I understand
why Robert removed it, but its removal makes it much harder to identify an
LWLock since it might fall in a DSM segment that could be mapped at
different base addresses in different backends.

Robert's patch didn't replace the offset within the tranche with anything else
to identify the lock. An LWLock* is imperfect due to ASLR and DSM, but it's
better than nothing. In theory we could even remap them in trace tools if
we had tracepoints on DSM attach and detach that showed their size and base
address too.

CC'ing Andres, as he expressed interest in being able to globally identify
LWLocks too.

> The 0001 patch is
> essentially trying to reinstate that, which seems sensible. Perhaps we
> should also use the argument order that used to be there. It used to be
>
> probe lwlock__acquire(const char *, int, LWLockMode);
>
> and now it would be
>
> probe lwlock__acquire(const char *, LWLockMode, LWLock*, int);
>
> Also, do we need both the tranche name and the tranche id?

Reasons to have the name:

* There is no easy way to look up the tranche name by ID from outside the
backend
* A tranche ID by itself is pretty much meaningless especially for dynamic
tranches
* Any existing scripts will rely on the tranche name

So the tranche name is really required to generate useful output for any
dynamic tranches, or simple and readable output from things like perf.

Reasons to have the tranche ID:

* The tranche name is not guaranteed to have the same address for a given
value across backends in the presence of ASLR, even for built-in tranches.
So tools need to read tranche names as user-space strings, which is much
more expensive than consuming an int argument from the trace args. Storing
and reporting maps of events by tranche name (string) in tools is also more
expensive than having a tranche id.
* When the trace tool or script wants to filter for only one particular
tranche, particularly a built-in tranche whose tranche ID is known, having
the ID is much more useful and efficient.
* If we can avoid computing the tranche name, emitting just the tranche ID
would be much faster.

It's annoying that we have to pay the cost of computing the tranche name
though. It never used to matter, but now that T_NAME() expands to
GetLWTrancheName() calls as of 29c3e2dd5a6, it's going to cost a little more
on such a hot path. I might run a quick comparison to see how much it
matters.

I could add TRACE_POSTGRESQL_<<tracepointname>>_ENABLED() guards since we
do in fact build with SDT semaphore support. That adds a branch for each
tracepoint, but they're already marshalling arguments and making a function
call that does lots more than a single branch, so that seems pretty
sensible. The main downside of using _ENABLED() USDT semaphore guards is
that not all tools are guaranteed to understand or support them. So an
older perf, for example, might simply fail to fire events on guarded
probes. That seems OK to me; the onus should be on the probe tool to pay
any costs, not on PostgreSQL. Despite that, I don't want to mark the
_ENABLED() guards unlikely(), since that'd increase the observer effect
where probing LWLocks changes their timing and behaviour. Branch prediction
should do a very good job in such cases without being forced.

I wonder a little about the possible cache costs of the _ENABLED() macros
though. Their data is in a separate ELF segment and separate .o, with no
locality to the traced code. It might be worth checking that before
proceeding; I guess it's even possible that the GetLWTrancheName() calls
could be cheaper. Will see if I can run some checks and report back.

BTW, if you want some of the details on how userspace SDTs work,
https://leezhenghui.github.io/linux/2019/03/05/exploring-usdt-on-linux.html
is interesting and useful. It helps explain uprobes, ftrace, bcc, etc.

> Or maybe we
> don't need the name, or can record it differently, which might also
> address your other concern that it's too expensive to compute. In any
> case, I think an argument order like
>
> probe lwlock__acquire(const char *, int, LWLock*, LWLockMode);
>
> would make more sense.
>

OK.

> In 0004, you add a probe to record the application_name setting? Would
> there be any value in making that a generic probe that can record any
> GUC change?
>

Yes, there would, but I didn't want to do that in the same patch, and a
named probe on application_name is useful separately from having probes on
all GUCs.

There's value in having a probe with an easily targeted name that probes
the application_name since it's of obvious interest and utility to probing
and tracing tools. A probe specifically on application_name means a probing
script doesn't have to fire an event for every GUC, copy the GUC name
string, strcmp() it to see if it's the GUC of interest, etc. So specific
probes on "major" GUCs like this are IMO very useful.
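For comparison, the two kinds of probe might look like this in probes.d. These are hypothetical sketches, not the patch's actual definitions; the point is that the generic probe forces a script to copy and strcmp() the GUC name on every change, while the specific probe can be targeted by name alone:

```d
provider postgresql {
    /* generic: fires for every GUC change; args are name, old value, new value */
    probe guc__change(const char *, const char *, const char *);

    /* specific: fires only when application_name changes */
    probe application__name__change(const char *);
};
```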

(It'd be possible to instead generate probes for each GUC at compile time
using the preprocessor and the DTRACE_ macros. But as noted above, that
doesn't currently work properly in the same compilation unit into which the
dtrace-generated probes.h is included. I think it's probably nicer to have
specific probes for GUCs of high interest, plus generic probes that capture
all GUCs, anyway.)

There are a TON of probes I want to add, and I have a tree full of them
waiting to submit progressively. Yes, ability to probe all GUCs is in
there. So is detail on walsender, reorder buffer, and snapshot builder
activity. Heavyweight lock SDTs. A probe that identifies the backend type
at startup. SDT probe events emitted for every wait event. Probes in elog.c
to let tools observe error unwinding, capture error messages, etc. (Those
can also be used with systemtap guru-mode scripts to do things like turn a
particular elog(DEBUG) into a PANIC at runtime for diagnostic purposes.)
Probes in shm_mq to observe message passing and blocking. A probe that
fires whenever debug_query_string changes. Lots. But I can't submit them
all at once, especially without some supporting use cases and scripts that
other people can use so they can understand why these probes are useful.

So I figured I'd start here...
