Re: Sampling-based timing for EXPLAIN ANALYZE

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: David Geier <geidav(dot)pg(at)gmail(dot)com>, Jelte Fennema <me(at)jeltef(dot)nl>, Lukas Fittl <lukas(at)fittl(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Sampling-based timing for EXPLAIN ANALYZE
Date: 2023-01-17 19:51:39
Message-ID: 9b29c49f-70a4-ee20-b565-6a4cfa0d3c2f@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/17/23 19:46, Andres Freund wrote:
> Hi,
>
> On 2023-01-17 19:00:02 +0100, Tomas Vondra wrote:
>> On 1/17/23 18:02, Andres Freund wrote:
>>> On 2023-01-17 15:52:07 +0100, Tomas Vondra wrote:
>>>> That also does not have issues with timestamp "rounding" - considering
>>>> e.g. sample rate 1000Hz, that's 1ms between samples. And it's quite
>>>> possible the node completes within 1ms, in which case
>>>>
>>>> (now - last_sample)
>>>>
>>>> ends up being 0 (assuming I correctly understand the code).
>>>
>>> That part should be counting in nanoseconds, I think? Unless I misunderstand
>>> something?
>>>
>>
>> The higher precision does not help, because both values come from the
>> *sampled* timestamp (i.e. the one updated from the signal handler). So
>> if the node happens to execute between two signals, the values are going
>> to be the same, and the difference is 0.
>
> In that case there simply wasn't any sample for the node, and a non-timestamp
> based sample counter wouldn't do anything different?
>

Yeah, you're right.

> If you're worried about the case where a timer does fire during execution of
> the node, but exactly once, that should provide a difference between the last
> sampled timestamp and the current time. It'll attribute a bit too much to the
> in-progress nodes, but well, that's sampling for you.
>
>
> I think a "hybrid" explain mode might be worth thinking about. Use the
> "current" sampling method for the first execution of a node, and for the first
> few milliseconds of a query (or perhaps the first few timestamp
> acquisitions). That provides an accurate explain analyze for short queries,
> without a significant slowdown. Then switch to sampling, which provides decent
> attribution for a bit longer running queries.
>

Yeah, this is essentially the sampling I imagined when I first read the
subject of this thread. It samples which node executions to measure (and
then measures those accurately), while these patches sample timestamps.

>
>
>>>>> I've been thinking that we should consider making more of the instrumentation
>>>>> code work like that. The amount of work we're doing in InstrStart/StopNode()
>>>>> has steadily risen. When buffer usage and WAL usage are enabled, we're
>>>>> executing over 1000 instructions! And each single Instrumentation node is ~450
>>>>> bytes, to a good degree due to having 2 BufUsage and 2 WalUsage structs
>>>>> embedded.
>>>>>
>>>>> If we instead have InstrStartNode() set up a global pointer to the
>>>>> Instrumentation node, we can make the instrumentation code modify both the
>>>>> "global" counters (pgBufferUsage, pgWalUsage) and, if set,
>>>>> current_instr->{pgBufferUsage, pgWalUsage}. That'll require some larger
>>>>> changes - right now nodes "automatically" include the IO/WAL incurred in child
>>>>> nodes, but that's just a small bit of additional summin-up to be done during
>>>>> EXPLAIN.
>>>>>
>>>>
>>>> That's certainly one way to implement that. I wonder if we could make
>>>> that work without the global pointer, but I can't think of any.
>>>
>>> I don't see a realistic way at least. We could pass down an
>>> "InstrumentationContext" through everything that needs to do IO and WAL. But
>>> that seems infeasible at this point.
>
>> Why infeasible?
>
> Primarily the scale of the change. We'd have to pass down the context into all
> table/index AM functions. And into a lot of bufmgr.c, xlog.c functions,
> which'd require their callers to have access to the context. That's hundreds
> if not thousands places.
>
> Adding that many function parameters might turn out to be noticable runtime
> wise, due to increased register movement. I think for a number of the places
> where we currently don't, we ought to use by-reference struct for the
> not-always-used parameters, that then also could contain this context.
>

OK, I haven't realized we'd have to pass it to that many places.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-01-17 19:56:42 Re: CI and test improvements
Previous Message Tomas Vondra 2023-01-17 19:44:36 Re: Progress report of CREATE INDEX for nested partitioned tables