| From: | David Geier <geidav(dot)pg(at)gmail(dot)com> |
|---|---|
| To: | Lukas Fittl <lukas(at)fittl(dot)com>, Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? |
| Date: | 2026-02-23 15:24:57 |
| Message-ID: | 41528b05-62be-4a5a-abd8-2af2dd84a1be@gmail.com |
| Lists: | pgsql-hackers |
Hi Lukas,
Thanks for taking care of incorporating the latest patch feedback.
On 13.02.2026 05:11, Lukas Fittl wrote:
> On Thu, Feb 12, 2026 at 4:41 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2026-02-12 08:05:27 -0800, Lukas Fittl wrote:
> (1) changing the pg_ticks_to_ns logic to have an explicit
> "ticks_per_ns_scaled == 0" early check and return at the start, and
> setting ticks_per_ns_scaled to 0 when clock_gettime() gets used. This
> is similar to what David already suggested in an earlier email.
> (2) using uint64 for the ticks_per_ns_scaled/max_ticks_no_overflow
> variables - this appears to help GCC generate a bit shift reliably,
> instead of an idiv instruction.
>
> That appears to eliminate the regression in my testing. Attached an
> updated v7, which also has some slightly improved commit messages.
>
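For readers following along, points (1) and (2) can be sketched roughly as below. This is a minimal illustration and not the code from the patch: the shift value, the setup helper set_tick_frequency_khz() and the function name ticks_to_ns() are assumptions made for the example. Only the variable names ticks_per_ns_scaled and max_ticks_no_overflow come from the discussion above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point precision; the patch may use a different shift. */
#define TICKS_TO_NS_SHIFT 14

/*
 * Scaled multiplier for converting ticks to nanoseconds. 0 means the time
 * source already reports nanoseconds (the clock_gettime() case). Both
 * variables are uint64_t, as in point (2).
 */
static uint64_t ticks_per_ns_scaled = 0;
static uint64_t max_ticks_no_overflow = 0;

/* Hypothetical setup helper: derive the multiplier from a frequency in kHz. */
static void
set_tick_frequency_khz(uint64_t khz)
{
    assert(khz > 0);
    ticks_per_ns_scaled = (UINT64_C(1000000) << TICKS_TO_NS_SHIFT) / khz;
    assert(ticks_per_ns_scaled > 0);
    max_ticks_no_overflow = UINT64_MAX / ticks_per_ns_scaled;
}

static uint64_t
ticks_to_ns(uint64_t ticks)
{
    uint64_t ns = 0;

    /* Point (1): early return when ticks are already nanoseconds. */
    if (ticks_per_ns_scaled == 0)
        return ticks;

    /* Rare case: convert in chunks so the multiplication cannot overflow. */
    if (ticks > max_ticks_no_overflow)
    {
        uint64_t chunks = ticks / max_ticks_no_overflow;

        ns = chunks * ((max_ticks_no_overflow * ticks_per_ns_scaled)
                       >> TICKS_TO_NS_SHIFT);
        ticks -= chunks * max_ticks_no_overflow;
    }

    /* Common case: one multiplication and one shift, no division. */
    return ns + ((ticks * ticks_per_ns_scaled) >> TICKS_TO_NS_SHIFT);
}
```

With a 2 GHz counter, set_tick_frequency_khz(2000000) gives a multiplier of 8192, so 2000 ticks convert to 1000 ns. The chunked path truncates once per chunk, so very large inputs can come out a few nanoseconds low.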
> Additional comparisons with the test case you had back at the start of
> this thread, with system clock source on my test VM:
>
> master:
>
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1888.891 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 23.53 ns
>
> v6 (0002 + pg_test_timing prev/cur change):
>
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1897.095 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 25.52 ns
>
> v7 (0002):
>
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1897.148 ms (best of 3)
> Average loop time including overhead: 23.14 ns
Shouldn't that result be better than master, given that you optimized the
loop overhead in v7-0002? That's at least what I measured; see the test
results below.
> And when looking at the TSC time source with the full patch set on the same VM:
>
> v6:
>
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1477.672 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 11.79 ns
>
> v7:
>
> EXPLAIN (ANALYZE, TIMING ON) SELECT count(*) FROM lotsarows;
> Time: 1476.326 ms (best of 3)
> pg_test_timing / Average loop time including overhead: 11.78 ns
>
> Thanks,
> Lukas
>
> [0]: https://godbolt.org/z/EvK1M66n5
>
> --
> Lukas Fittl
The code didn't compile on Windows because Visual C++ does not define
__x86_64__; it defines _M_X64 instead. I've changed the check to
#if defined(__x86_64__) || defined(_M_X64)
I've also changed #include <x86intrin.h> to #include <immintrin.h>.
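A minimal sketch of how such a guard can be used is shown below. It is an illustration only, not the patch code: the macro name HAVE_X86_64_TSC and the function read_ticks() are hypothetical, and the fallback via timespec_get() is only there to keep the example compilable on other architectures. Unlike the patch, the sketch picks <intrin.h> for Visual C++ and <x86intrin.h> elsewhere, since those headers are known to declare __rdtsc() on the respective compilers.

```c
#include <stdint.h>
#include <time.h>

/* GCC and Clang define __x86_64__; Visual C++ defines _M_X64 instead. */
#if defined(__x86_64__) || defined(_M_X64)
#define HAVE_X86_64_TSC 1       /* hypothetical macro name */
#if defined(_MSC_VER)
#include <intrin.h>             /* declares __rdtsc() for Visual C++ */
#else
#include <x86intrin.h>          /* declares __rdtsc() for GCC and Clang */
#endif
#endif

/* Hypothetical helper: read a raw tick value from the fastest source. */
static inline uint64_t
read_ticks(void)
{
#ifdef HAVE_X86_64_TSC
    return __rdtsc();
#else
    /* Portable fallback so the sketch builds on non-x86-64 targets. */
    struct timespec ts;

    timespec_get(&ts, TIME_UTC);
    return (uint64_t) ts.tv_sec * UINT64_C(1000000000) + (uint64_t) ts.tv_nsec;
#endif
}
```

Calling read_ticks() twice and subtracting the results gives an elapsed tick count, which still has to be converted to nanoseconds when the TSC branch is used.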
I've tested v8 of the patch (= v7 plus aforementioned changes) on
Windows. I'm reporting the best of 3 runs.
lotsarows test with parallelism disabled:
master: 2781 ms
v7: 2776 ms (timing_clock_source = 'system')
v7: 2091 ms (timing_clock_source = 'tsc')
pg_test_timing:
master: 27.04 ns
v7: 16.59 ns (QueryPerformanceCounter)
v7: 13.69 ns (RDTSCP)
v7: 9.42 ns (RDTSC)
v8 of the patch is attached to this mail.
--
David Geier
| Attachment | Content-Type | Size |
|---|---|---|
| v8-0004-pg_test_timing-Also-test-RDTSC-RDTSCP-timing-and-.patch | text/x-patch | 6.1 KB |
| v8-0003-Timing-Use-Time-Stamp-Counter-TSC-on-x86-64-for-f.patch | text/x-patch | 24.9 KB |
| v8-0002-Timing-Streamline-ticks-to-nanosecond-conversion-.patch | text/x-patch | 13.3 KB |
| v8-0001-Check-for-HAVE__CPUIDEX-and-HAVE__GET_CPUID_COUNT.patch | text/x-patch | 6.7 KB |