Re: Sampling-based timing for EXPLAIN ANALYZE

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: David Geier <geidav(dot)pg(at)gmail(dot)com>, Jelte Fennema <me(at)jeltef(dot)nl>, Lukas Fittl <lukas(at)fittl(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Sampling-based timing for EXPLAIN ANALYZE
Date: 2023-01-17 18:00:02
Message-ID: 837d40f7-45e2-c8b3-981f-44dc5f0f3e62@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/17/23 18:02, Andres Freund wrote:
> Hi,
>
> On 2023-01-17 15:52:07 +0100, Tomas Vondra wrote:
>> I don't understand why we would even use timestamps, in this case? AFAIK
>> "sampling profilers" simply increment a counter for the executing node,
>> and then approximate the time as proportional to the count.
>
> The timer interrupt distances aren't all that evenly spaced, particularly
> under load, and are easily distorted by having to wait for IO, an lwlock ...
>

OK, so the difference is that these events (I/O, lwlocks) may block
signals, and after signals get unblocked we only get a single event for
each signal. Yeah, the timestamp handles that case better.

>
>> That also does not have issues with timestamp "rounding" - considering
>> e.g. sample rate 1000Hz, that's 1ms between samples. And it's quite
>> possible the node completes within 1ms, in which case
>>
>> (now - last_sample)
>>
>> ends up being 0 (assuming I correctly understand the code).
>
> That part should be counting in nanoseconds, I think? Unless I misunderstand
> something?
>

The higher precision does not help, because both values come from the
*sampled* timestamp (i.e. the one updated from the signal handler). So
if the node happens to execute between two signals, the values are going
to be the same, and the difference is 0.

Perhaps for many executions it works out, because some executions will
cross the boundary, and the average will converge to the right value.

> We already compute the timestamp inside timeout.c, but don't yet pass that to
> timeout handlers. I think there's others re-computing timestamps.
>
>
>> And I don't think there's any particularly good way to correct this.
>>
>> It seems ExplainNode() attempts to do some correction, but I doubt
>> that's very reliable, as these fast nodes will have sampled_total=0, so
>> no matter what you multiply this with, it'll still be 0.
>
> That's just the scaling to the "actual time" that you're talking about above,
> no?
>

Maybe, not sure.

>
>>> I've been thinking that we should consider making more of the instrumentation
>>> code work like that. The amount of work we're doing in InstrStart/StopNode()
>>> has steadily risen. When buffer usage and WAL usage are enabled, we're
>>> executing over 1000 instructions! And each single Instrumentation node is ~450
>>> bytes, to a good degree due to having 2 BufUsage and 2 WalUsage structs
>>> embedded.
>>>
>>> If we instead have InstrStartNode() set up a global pointer to the
>>> Instrumentation node, we can make the instrumentation code modify both the
>>> "global" counters (pgBufferUsage, pgWalUsage) and, if set,
>>> current_instr->{pgBufferUsage, pgWalUsage}. That'll require some larger
>>> changes - right now nodes "automatically" include the IO/WAL incurred in child
>>> nodes, but that's just a small bit of additional summin-up to be done during
>>> EXPLAIN.
>>>
>>
>> That's certainly one way to implement that. I wonder if we could make
>> that work without the global pointer, but I can't think of any.
>
> I don't see a realistic way at least. We could pass down an
> "InstrumentationContext" through everything that needs to do IO and WAL. But
> that seems infeasible at this point.
>

Why infeasible?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-01-17 18:02:51 Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Previous Message Justin Pryzby 2023-01-17 17:35:09 Re: CI and test improvements