Re: Sampling-based timing for EXPLAIN ANALYZE

From: Lukas Fittl <lukas(at)fittl(dot)com>
To: Jelte Fennema <me(at)jeltef(dot)nl>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Sampling-based timing for EXPLAIN ANALYZE
Date: 2023-01-17 10:50:40
Message-ID: CAP53Pkw97c7q-8e6BBab4aXS=1ROMtKpnHEeK0RDnPx7yiirog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 6, 2023 at 1:19 AM Jelte Fennema <me(at)jeltef(dot)nl> wrote:

> Nice addition! And the code looks pretty straight forward.
>

Thanks for reviewing!

The current patch triggers warnings:
> https://cirrus-ci.com/task/6016013976731648 Looks like you need to add
> void as the argument.
>

Fixed in v2 attached. This also adds a simple regression test, as well as
fixes the parallel working handling.

Do you have some performance comparison between TIMING ON and TIMING
> SAMPLING?
>

Here are some benchmarks of auto_explain overhead on my ARM-based M1
Macbook for the following query run with pgbench on a scale factor 100 data
set:

SELECT COUNT(*) FROM pgbench_branches JOIN pgbench_accounts USING (bid)
JOIN pgbench_tellers USING (bid) WHERE bid = 42;

(the motivation is to use a query that is more complex than the standard
pgbench select-only test query)

avg latency (best of 3), -T 300, -c 4, -s 100, shared_buffers 2GB, fsync
off, max_parallel_workers_per_gather 0:

master, log_timing = off: 871 ms (878 / 877 / 871)
patch, log_timing = off: 869 ms (882 / 880 / 869)
patch, log_timing = on: 890 ms (917 / 930 / 890)
patch, log_timing = sampling, samplefreq = 1000: 869 ms (887 / 869 / 894)

Additionally, here is Andres' benchmark from [1], with the sampling option
added:

% psql -Xc 'DROP TABLE IF EXISTS t; CREATE TABLE t AS SELECT * FROM
generate_series(1, 100000) g(i);' postgres && pgbench -n -r -t 100 -f
<(echo -e "SELECT COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING OFF) SELECT
COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING SAMPLING) SELECT COUNT(*) FROM
t;EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*) FROM t;") postgres |grep '^ '
DROP TABLE
SELECT 100000
3.507 0 SELECT COUNT(*) FROM t;
3.476 0 EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*)
FROM t;
3.576 0 EXPLAIN (ANALYZE, TIMING SAMPLING) SELECT
COUNT(*) FROM t;
5.096 0 EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*)
FROM t;

My pg_test_timing data for reference:

% pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 23.65 ns
Histogram of timing durations:
< us % of total count
1 97.64472 123876325
2 2.35421 2986658
4 0.00022 277
8 0.00016 202
16 0.00064 815
32 0.00005 64

In InstrStartSampling there's logic to increase/decrease the frequency of
> an already existing timer. It's not clear to me when this can occur. I'd
> expect sampling frequency to remain constant throughout an explain plan. If
> it's indeed needed, I think a code comment would be useful to explain why
> this edge case is necessary.
>

Clarified in a code comment in v2. This is needed for handling nested
statements which could have different sampling frequencies for each nesting
level, i.e. a function might want to sample it's queries at a higher
frequency than its caller.

Thanks,
Lukas

[1] https://postgr.es/m/20230116213913.4oseovlzvc2674z7%40awork3.anarazel.de

--
Lukas Fittl

Attachment Content-Type Size
v2-0001-Add-TIMING-SAMPLING-option-for-EXPLAIN-ANALYZE.patch application/octet-stream 37.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Takamichi Osumi (Fujitsu) 2023-01-17 11:00:11 RE: Time delayed LR (WAS Re: logical replication restrictions)
Previous Message Drouvot, Bertrand 2023-01-17 10:48:19 Helper functions for wait_for_catchup() in Cluster.pm