Quick Links

Re: Sampling-based timing for EXPLAIN ANALYZE

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc:	David Geier <geidav(dot)pg(at)gmail(dot)com>, Jelte Fennema <me(at)jeltef(dot)nl>, Lukas Fittl <lukas(at)fittl(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Sampling-based timing for EXPLAIN ANALYZE
Date:	2023-01-17 18:46:25
Message-ID:	20230117184625.ojwn5yznsmn3tlm7@awork3.anarazel.de
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On 2023-01-17 19:00:02 +0100, Tomas Vondra wrote:
> On 1/17/23 18:02, Andres Freund wrote:
> > On 2023-01-17 15:52:07 +0100, Tomas Vondra wrote:
> >> That also does not have issues with timestamp "rounding" - considering
> >> e.g. sample rate 1000Hz, that's 1ms between samples. And it's quite
> >> possible the node completes within 1ms, in which case
> >>
> >> (now - last_sample)
> >>
> >> ends up being 0 (assuming I correctly understand the code).
> >
> > That part should be counting in nanoseconds, I think? Unless I misunderstand
> > something?
> >
>
> The higher precision does not help, because both values come from the
> *sampled* timestamp (i.e. the one updated from the signal handler). So
> if the node happens to execute between two signals, the values are going
> to be the same, and the difference is 0.

In that case there simply wasn't any sample for the node, and a non-timestamp
based sample counter wouldn't do anything different?

If you're worried about the case where a timer does fire during execution of
the node, but exactly once, that should provide a difference between the last
sampled timestamp and the current time. It'll attribute a bit too much to the
in-progress nodes, but well, that's sampling for you.

I think a "hybrid" explain mode might be worth thinking about. Use the
"current" sampling method for the first execution of a node, and for the first
few milliseconds of a query (or perhaps the first few timestamp
acquisitions). That provides an accurate explain analyze for short queries,
without a significant slowdown. Then switch to sampling, which provides decent
attribution for a bit longer running queries.

> >>> I've been thinking that we should consider making more of the instrumentation
> >>> code work like that. The amount of work we're doing in InstrStart/StopNode()
> >>> has steadily risen. When buffer usage and WAL usage are enabled, we're
> >>> executing over 1000 instructions! And each single Instrumentation node is ~450
> >>> bytes, to a good degree due to having 2 BufUsage and 2 WalUsage structs
> >>> embedded.
> >>>
> >>> If we instead have InstrStartNode() set up a global pointer to the
> >>> Instrumentation node, we can make the instrumentation code modify both the
> >>> "global" counters (pgBufferUsage, pgWalUsage) and, if set,
> >>> current_instr->{pgBufferUsage, pgWalUsage}. That'll require some larger
> >>> changes - right now nodes "automatically" include the IO/WAL incurred in child
> >>> nodes, but that's just a small bit of additional summin-up to be done during
> >>> EXPLAIN.
> >>>
> >>
> >> That's certainly one way to implement that. I wonder if we could make
> >> that work without the global pointer, but I can't think of any.
> >
> > I don't see a realistic way at least. We could pass down an
> > "InstrumentationContext" through everything that needs to do IO and WAL. But
> > that seems infeasible at this point.

> Why infeasible?

Primarily the scale of the change. We'd have to pass down the context into all
table/index AM functions. And into a lot of bufmgr.c, xlog.c functions,
which'd require their callers to have access to the context. That's hundreds
if not thousands places.

Adding that many function parameters might turn out to be noticable runtime
wise, due to increased register movement. I think for a number of the places
where we currently don't, we ought to use by-reference struct for the
not-always-used parameters, that then also could contain this context.

Greetings,

Andres Freund

In response to

Re: Sampling-based timing for EXPLAIN ANALYZE at 2023-01-17 18:00:02 from Tomas Vondra

Responses

Re: Sampling-based timing for EXPLAIN ANALYZE at 2023-01-17 19:51:39 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2023-01-17 18:50:53	Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Previous Message	Nathan Bossart	2023-01-17 18:42:30	Re: almost-super-user problems that we haven't fixed yet