Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Lukas Fittl <lukas(at)fittl(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date: 2025-09-01 10:36:24
Message-ID: bf04c66b-f062-4a7a-bf2f-7c1d7e262ad6@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Lukas!

On 01.03.2025 08:45, Lukas Fittl wrote:
> On Sun, Jun 2, 2024 at 1:08 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> At some point this patch switched from rdtsc to rdtscp, which imo largely
>> negates the point of it. What lead to that?
>
>
> From what I can gather, it appears this was an oversight when David first
> reapplied the work on the instr_time changes that were committed.

Yes, that was by accident.

>
> I've come back to this and rebased this, as well as:

Thanks for moving this forward.

> - Added support for VMs running under KVM/VMware Hypervisors
>
> On that last item, this does indeed make a difference on VMs, contrary to
> the code comment in earlier versions (and I've not seen any odd behaviors
> again, FWIW):

How can we be sure we've actually covered all hypervisors? How much
coverage do we have in the build farm? Are we good if passes on all
build animals?

>
> On a c5.xlarge (Skylake-SP or Cascade Lake) on AWS, with the same test as
> done initially in this thread:
>
> SELECT COUNT(*) FROM lotsarows;
> Time: 974.423 ms
>
> EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM lotsarows;
> Time: 1336.196 ms (00:01.336)
>
> Without patch:
> EXPLAIN (ANALYZE) SELECT COUNT(*) FROM lotsarows;
> Time: 2165.069 ms (00:02.165)
>
> Per loop time including overhead: 22.15 ns
>
> With patch:
> EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*) FROM lotsarows;
> Time: 1654.289 ms (00:01.654)
>
> Per loop time including overhead: 9.81 ns
>
> I'm registering this again in the current commitfest to help reviews.
>
> Open questions I have:
> - Could we rely on checking whether the TSC timesource is invariant (via
> CPUID), instead of relying on Linux choosing it as a clocksource?

Why do you want to do that? Are you concerned that Linux might pick a
different clock source even though invariant TSC is available?

We could code our own check but looking at the Linux kernel code, this
is a bit more involved if we want to do it completely right. They check
e.g. if the TSC is also synchronized across different CPUs, which is not
the case if they're on different chassis (see unsynchronized_tsc() ->
apic_is_clustered_box()).

I think it's safer to start with relying on the kernel. Some research
suggests that the TSC is the preferred clock source if available.

> - For the Hypervisor CPUID checks I had to rely on __cpuidex which is only
> available on newer GCC versions (
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95973) how do we best check
> for its presence? (compiler version, or rather configure check?) -- note
> this is also the reason the patch fails the clang compiler warning check in
> CI, despite clang having support in recent versions (
> https://reviews.llvm.org/D121653)

What about instead using #if !__has_builtin(_cpuidex) ... #endif to
define the built-in ourselves as a function in case it doesn't exist?

--
David Geier

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo Nagata 2025-09-01 11:01:48 Re: PG 18 release notes draft committed
Previous Message Florents Tselai 2025-09-01 10:35:12 Re: split func.sgml to separated individual sgml files