Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Lukas Fittl <lukas(at)fittl(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date: 2026-02-23 15:24:57
Message-ID: 41528b05-62be-4a5a-abd8-2af2dd84a1be@gmail.com
Lists: pgsql-hackers

Hi Lukas,

Thanks for taking care of incorporating the latest patch feedback.

On 13.02.2026 05:11, Lukas Fittl wrote:
> On Thu, Feb 12, 2026 at 4:41 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2026-02-12 08:05:27 -0800, Lukas Fittl wrote:
> (1) changing the pg_ticks_to_ns logic to have an explicit
> "ticks_per_ns_scaled == 0" early check and return at the start, and
> setting ticks_per_ns_scaled to 0 when clock_gettime() gets used. This
> is similar to what David already suggested in an earlier email.
> (2) using uint64 for the ticks_per_ns_scaled/max_ticks_no_overflow
> variables - this appears to help GCC generate a bit shift reliably,
> instead of an idiv instruction.
>
> That appears to eliminate the regression in my testing. Attached an
> updated v7, which also has some slightly improved commit messages.
>
> Additional comparisons with the test case you had back at the start of
> this thread, with system clock source on my test VM:
>
> master:
>
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1888.891 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 23.53 ns
>
> v6 (0002 + pg_test_timing prev/cur change):
>
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1897.095 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 25.52 ns
>
> v7 (0002):
>
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1897.148 ms (best of 3)
> Average loop time including overhead: 23.14 ns

Shouldn't that result be better than master because you optimized the
loop overhead in v7-0002? That's at least what I've measured, see test
results below.
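
For readers following along, here is a minimal sketch of the conversion described in points (1) and (2) of the quoted text. It is not the code from the patch: the names ticks_per_ns_scaled and max_ticks_no_overflow come from the quoted mail, while the sketch_ prefix, the shift width, the kHz-based setup function and the overflow handling are assumptions made for illustration.

```c
#include <stdint.h>

/* Assumed fixed-point shift; the real patch may use a different value. */
#define SKETCH_TICKS_SHIFT 14

/*
 * Both variables are unsigned 64-bit, as in point (2) of the quoted mail.
 * A value of 0 means "ticks are already nanoseconds", which is the state
 * assumed here for the clock_gettime() source, as in point (1).
 */
static uint64_t ticks_per_ns_scaled = 0;
static uint64_t max_ticks_no_overflow = 0;

/* Hypothetical setup helper: pass 0 for the system clock, else the TSC kHz. */
void
sketch_set_tsc_frequency_khz(uint64_t khz)
{
	if (khz == 0)
	{
		ticks_per_ns_scaled = 0;
		max_ticks_no_overflow = 0;
		return;
	}

	/* ns per tick is 1e6 / khz; keep it as a scaled integer multiplier */
	ticks_per_ns_scaled = (UINT64_C(1000000) << SKETCH_TICKS_SHIFT) / khz;
	if (ticks_per_ns_scaled != 0)
		max_ticks_no_overflow = UINT64_MAX / ticks_per_ns_scaled;
}

uint64_t
sketch_ticks_to_ns(uint64_t ticks)
{
	uint64_t	ns = 0;

	/* Early check and return: no conversion needed for the system clock. */
	if (ticks_per_ns_scaled == 0)
		return ticks;

	/* Rare slow path: split off whole chunks so the multiply cannot wrap. */
	if (ticks > max_ticks_no_overflow)
	{
		uint64_t	chunks = ticks / max_ticks_no_overflow;

		ns = ((max_ticks_no_overflow * ticks_per_ns_scaled)
			  >> SKETCH_TICKS_SHIFT) * chunks;
		ticks -= chunks * max_ticks_no_overflow;
	}

	/* Common path: one multiply and one logical right shift. */
	return ns + ((ticks * ticks_per_ns_scaled) >> SKETCH_TICKS_SHIFT);
}
```

With an assumed 2 GHz TSC (sketch_set_tsc_frequency_khz(2000000)) the multiplier is exactly 8192, so 1000 ticks convert to 500 ns. Frequencies that do not divide evenly lose a little precision to integer rounding.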

> And when looking at the TSC time source with the full patch set on the same VM:
>
> v6:
>
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1477.672 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 11.79 ns
>
> v7:
>
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1476.326 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 11.78 ns
>
> Thanks,
> Lukas
>
> [0]: https://godbolt.org/z/EvK1M66n5
>
> --
> Lukas Fittl

The code didn't compile on Windows because Visual C++ doesn't define
__x86_64__ (it defines _M_X64 instead). I've changed the guard to

#if defined(__x86_64__) || defined(_M_X64)

I've also changed #include <x86intrin.h> to <immintrin.h>.
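
As a small illustration of the guard above, the following sketch folds the two compiler-specific macros into one project-level flag. SKETCH_HAVE_X86_64 and sketch_is_x86_64() are hypothetical names that do not appear in the patch, and the sketch deliberately leaves out the intrinsics header so that it builds on any target.

```c
/*
 * GCC and Clang define __x86_64__ on 64-bit x86 targets; Visual C++
 * defines _M_X64 instead. Checking both covers all three compilers.
 */
#if defined(__x86_64__) || defined(_M_X64)
#define SKETCH_HAVE_X86_64 1
#else
#define SKETCH_HAVE_X86_64 0
#endif

/* Returns 1 when compiled for x86-64, 0 on any other architecture. */
int
sketch_is_x86_64(void)
{
	return SKETCH_HAVE_X86_64;
}
```

Code that uses the TSC would then be wrapped in #if SKETCH_HAVE_X86_64, with the clock_gettime() or QueryPerformanceCounter path as the fallback elsewhere.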

I've tested v8 of the patch (= v7 plus aforementioned changes) on
Windows. I'm reporting the best of 3 runs.

lotsarows test with parallelism disabled:

master: 2781 ms
v7: 2776 ms (timing_clock_source = 'system')
v7: 2091 ms (timing_clock_source = 'tsc')

pg_test_timing:

master: 27.04 ns
v7: 16.59 ns (QueryPerformanceCounter)
v7: 13.69 ns (RDTSCP)
v7: 9.42 ns (RDTSC)
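
To put the numbers above in relative terms, the arithmetic can be sketched as follows. The helper name sketch_percent_reduction is made up for this example, and the inputs are simply the best-of-3 figures listed above. The TSC run of the lotsarows test comes out roughly 25% faster than master, and the RDTSC loop time in pg_test_timing is roughly 65% lower than on master.

```c
#include <stdio.h>

/* Relative reduction of candidate versus baseline, in percent. */
double
sketch_percent_reduction(double baseline, double candidate)
{
	return (baseline - candidate) / baseline * 100.0;
}

/* Prints the reductions for the Windows figures reported in this mail. */
void
sketch_print_reductions(void)
{
	printf("lotsarows, tsc vs master:    %.1f%%\n",
		   sketch_percent_reduction(2781.0, 2091.0));
	printf("pg_test_timing, QPC:         %.1f%%\n",
		   sketch_percent_reduction(27.04, 16.59));
	printf("pg_test_timing, RDTSCP:      %.1f%%\n",
		   sketch_percent_reduction(27.04, 13.69));
	printf("pg_test_timing, RDTSC:       %.1f%%\n",
		   sketch_percent_reduction(27.04, 9.42));
}
```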

v8 of the patch is attached to this mail.

--
David Geier

Attachment                                                       Content-Type  Size
v8-0004-pg_test_timing-Also-test-RDTSC-RDTSCP-timing-and-.patch  text/x-patch  6.1 KB
v8-0003-Timing-Use-Time-Stamp-Counter-TSC-on-x86-64-for-f.patch  text/x-patch  24.9 KB
v8-0002-Timing-Streamline-ticks-to-nanosecond-conversion-.patch  text/x-patch  13.3 KB
v8-0001-Check-for-HAVE__CPUIDEX-and-HAVE__GET_CPUID_COUNT.patch  text/x-patch  6.7 KB
