Re: Sampling-based timing for EXPLAIN ANALYZE

From: Jelte Fennema <me(at)jeltef(dot)nl>
To: Lukas Fittl <lukas(at)fittl(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Sampling-based timing for EXPLAIN ANALYZE
Date: 2023-01-06 09:19:16
Message-ID: CAGECzQTg1vYYo8G2NaNpTC_B4FuO9PZq5a6WyVDiY0gT5i0Tkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Nice addition! And the code looks pretty straight forward.

The current patch triggers warnings:
https://cirrus-ci.com/task/6016013976731648 Looks like you need to add void
as the argument.

Do you have some performance comparison between TIMING ON and TIMING
SAMPLING?

In InstrStartSampling there's logic to increase/decrease the frequency of
an already existing timer. It's not clear to me when this can occur. I'd
expect sampling frequency to remain constant throughout an explain plan. If
it's indeed needed, I think a code comment would be useful to explain why
this edge case is necessary.

On Fri, 6 Jan 2023 at 09:41, Lukas Fittl <lukas(at)fittl(dot)com> wrote:

> Hi,
>
> Since EXPLAIN ANALYZE with TIMING ON still carries noticeable overhead on
> modern hardware (despite time sources being faster), I'd like to propose a
> new setting EXPLAIN ANALYZE, called "TIMING SAMPLING", as compared to
> TIMING ON.
>
> This new timing mode uses a timer on a fixed recurring frequency (e.g. 100
> or 1000 Hz) to gather a sampled timestamp on a predefined schedule, instead
> of getting the time on-demand when InstrStartNode/InstrStopNode is called.
> To implement the timer, we can use the existing timeout infrastructure,
> which is backed by a wall clock timer (ITIMER_REAL).
>
> Conceptually this is inspired by how sampling profilers work (e.g.
> "perf"), but it ties into the existing per-plan node instrumentation done
> by EXPLAIN ANALYZE, and simply provides a lower accuracy version of the
> total time for each plan node.
>
> In EXPLAIN output this is marked as "sampled time", and scaled to the
> total wall clock time (to adjust for the sampling undercounting):
>
> =# EXPLAIN (ANALYZE, BUFFERS, TIMING SAMPLING, SAMPLEFREQ 100) SELECT ...;
> QUERY PLAN
>
>
> -------------------------------------------------------------------------------------------------------------------------------------
> HashAggregate (cost=201747.90..201748.00 rows=10 width=12) (actual
> sampled time=5490.974 rows=9 loops=1)
> ...
> -> Hash Join (cost=0.23..199247.90 rows=499999 width=4) (actual
> sampled time=3738.619 rows=9000000 loops=1)
> ...
> -> Seq Scan on large (cost=0.00..144247.79 rows=9999979
> width=4) (actual sampled time=1004.671 rows=10000001 loops=1)
> ...
> -> Hash (cost=0.10..0.10 rows=10 width=4) (actual sampled
> time=0.000 rows=10 loops=1)
> ...
> Execution Time: 5491.475 ms
> ---
>
> In simple query tests like this on my local machine, this shows a
> consistent benefit over TIMING ON (and behaves close to ANALYZE with TIMING
> OFF), whilst providing a "good enough" accuracy to identify which part of
> the query was problematic.
>
> Attached is a prototype patch for early feedback on the concept, with
> tests and documentation to come in a follow-up. Since the January
> commitfest is still marked as open I'll register it there, but note that my
> assumption is this is *not* Postgres 16 material.
>
> As an open item, note that in the patch the requested sampling frequency
> is not yet passed to parallel workers (it always defaults to 1000 Hz when
> sampling is enabled). Also, note the timing frequency is limited to a
> maximum of 1000 Hz (1ms) due to current limitations of the timeout
> infrastructure.
>
> With thanks to Andres Freund for help on refining the idea, collaborating
> on early code and finding the approach to hook into the timeout API.
>
> Thanks,
> Lukas
>
> --
> Lukas Fittl
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Rijkers 2023-01-06 09:24:06 convey privileges -> confer privileges
Previous Message Drouvot, Bertrand 2023-01-06 08:44:25 Re: Generate pg_stat_get_xact*() functions with Macros