Re: Sampling-based timing for EXPLAIN ANALYZE

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Geier <geidav(dot)pg(at)gmail(dot)com>
Cc: Jelte Fennema <me(at)jeltef(dot)nl>, Lukas Fittl <lukas(at)fittl(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Sampling-based timing for EXPLAIN ANALYZE
Date: 2023-01-15 20:22:12
Message-ID: 20230115202212.oh7qu7oaeyr5mzsz@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-01-13 09:11:06 +0100, David Geier wrote:
> Mostly I'm wondering if the sampling based approach gains us enough to be
> worth it, once the patch to use RDTSC hopefully landed (see [1]).

Well, I'm not sure we have a path forward on it. There's portability and
accuracy concerns. But more importantly:

> I believe that with the RDTSC patch the overhead of TIMING ON is lower than
> the overhead of using ANALYZE with TIMING OFF in the first place. Hence, to
> be really useful, it would be great if we could on top of TIMING SAMPLING
> also lower the overhead of ANALYZE itself further (e.g. by using a fast path
> for the default EXPLAIN (ANALYZE, TIMING ON / SAMPLING)). Currently,
> InstrStartNode() and InstrStopNode() have a ton of branches and without all
> the typically deactivated code the implementation would be very small and
> could be placed in an inlinable function.

Yes, I think SAMPLING could get rid of most of the instrumentation overhead -
at the cost of a bit less detail in the explain, of course. Which could make
it a lot more feasible to enable something like auto_explain.log_timing in
busy workloads.

For the sampling mode we don't really need something like
InstrStart/StopNode. We just need a pointer to node currently executing - not
free to set, but still a heck of a lot cheaper than InstrStopNode(), even
without ->need_timer etc. Then the timer just needs to do
instr->sampled_total += (now - last_sample)
last_sample = now

I've been thinking that we should consider making more of the instrumentation
code work like that. The amount of work we're doing in InstrStart/StopNode()
has steadily risen. When buffer usage and WAL usage are enabled, we're
executing over 1000 instructions! And each single Instrumentation node is ~450
bytes, to a good degree due to having 2 BufUsage and 2 WalUsage structs
embedded.

If we instead have InstrStartNode() set up a global pointer to the
Instrumentation node, we can make the instrumentation code modify both the
"global" counters (pgBufferUsage, pgWalUsage) and, if set,
current_instr->{pgBufferUsage, pgWalUsage}. That'll require some larger
changes - right now nodes "automatically" include the IO/WAL incurred in child
nodes, but that's just a small bit of additional summin-up to be done during
EXPLAIN.

Separately, I think we should consider re-ordering Instrumentation so that
bufusage_start, walusage_start are after the much more commonly used
elements. We're forcing ntuples, nloops, .. onto separate cachelines, even
though they're accounted for unconditionally.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-01-15 20:36:31 Re: logrep stuck with 'ERROR: int2vector has too many elements'
Previous Message Tom Lane 2023-01-15 20:17:16 Re: logrep stuck with 'ERROR: int2vector has too many elements'