Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Lukas Fittl <lukas(at)fittl(dot)com>
Cc: David Geier <geidav(dot)pg(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Hannu Krosing <hannuk(at)google(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date: 2026-02-02 09:22:37
Message-ID: CAKZiRmzF50+drGgm6F-K1dQnuT=Khob0Q_dfZdv0-1iq4TVa4Q@mail.gmail.com
Lists: pgsql-hackers

Hi Lukas,

On Sun, Feb 1, 2026 at 4:16 AM Lukas Fittl <lukas(at)fittl(dot)com> wrote:
>
> On Sat, Jan 31, 2026 at 12:11 PM Lukas Fittl <lukas(at)fittl(dot)com> wrote:
> > I've reworked the patch a bit more, see attached v4
>
> And of course, I took the wrong branch when running "git format-patch"
> - apologies.
>
> See attached v5.

> +#define CPUID_HYPERVISOR_VMWARE(words) (words[1] == 0x61774d56 && words[2] == 0x4d566572 && words[3] == 0x65726177) /* VMwareVMware */
> +#define CPUID_HYPERVISOR_KVM(words) (words[1] == 0x4b4d564b && words[2] == 0x564b4d56 && words[3] == 0x0000004d) /* KVMKVMKVM */
> +
> +static bool
> +get_tsc_frequency_khz()
[..]
> + /*
> + * Check if we have a KVM or VMware Hypervisor passing down TSC frequency
> + * to us in a guest VM
> + *
> + * Note that accessing the 0x40000000 leaf for Hypervisor info requires
> + * use of __cpuidex to set ECX to 0. The similar __get_cpuid_count
> + * function does not work as expected since it contains a check for
> + * __get_cpuid_max, which has been observed to be lower than the special
> + * Hypervisor leaf.
> + */
> +#if defined(HAVE__CPUIDEX)
> + __cpuidex((int32 *) r, 0x40000000, 0);
> + if (r[0] >= 0x40000010 && (CPUID_HYPERVISOR_VMWARE(r) || CPUID_HYPERVISOR_KVM(r)))
> + {
> + __cpuidex((int32 *) r, 0x40000010, 0);
> + if (r[0] > 0)
> + {
> + tsc_freq = r[0];
> + return true;
> + }
> + }
> +#endif
> +
> + return false;
> +}

While trying to understand this code, I was wondering how it could be
made smaller or less dependent on such low-level intrinsics. The only
thing that came to mind was launching systemd-detect-virt(1) via
fork+execve; after all, we already have USE_SYSTEMD (for sd_notify(3),
used in backend/postmaster/postmaster.c).
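
For illustration only, the fork+execve idea could look roughly like the
sketch below. The helper name run_and_read_line() is hypothetical and not
part of the patch; it runs a command, captures the first line of its
stdout, and returns the exit status. Error handling (EINTR, long output)
is deliberately minimal.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with arguments argv, copy the first line
 * of its stdout into buf, and return its exit status (-1 on failure).
 */
static int
run_and_read_line(char *const argv[], char *buf, size_t buflen)
{
    int     fds[2];
    int     status;
    pid_t   pid;
    ssize_t n;
    size_t  len = 0;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0)
    {
        /* Child: redirect stdout into the pipe, then exec the command. */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);             /* exec failed */
    }

    /* Parent: read until EOF or until the buffer is full. */
    close(fds[1]);
    while (len < buflen - 1 &&
           (n = read(fds[0], buf + len, buflen - 1 - len)) > 0)
        len += (size_t) n;
    close(fds[0]);

    buf[len] = '\0';
    buf[strcspn(buf, "\n")] = '\0';     /* keep only the first line */

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A caller would pass something like {"systemd-detect-virt", "--vm", NULL}
and compare the captured line against "kvm" or "vmware". As far as I
recall from its man page, the tool prints "none" and exits non-zero when
nothing is detected, so a non-zero status is not necessarily an error.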

Sadly, this path for checking VM types seems like opening a can of
worms: they have accumulated a lot of code to cover various other
products (see e.g. detect_vm()), and that function is not exported.

Another way would probably be to query their D-Bus API; something like
the command below seems to work:

    busctl get-property org.freedesktop.systemd1 \
        /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager \
        Virtualization

(that seems to correspond to sd_bus_get_property_string(3)).

It's not that I'm recommending any of those (libsystemd is linked to us
most of the time?) or that I'm a fan of D-Bus (I'm not). I just thought
it might be less code to use it for autodetecting the VM type, but
apparently not (?). Still, their detect_vm_cpuid(), with its vm_table[]
and memcmp(), seems to be a more elegant way of writing this.
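
To make the table-plus-memcmp() idea concrete, here is a minimal sketch.
It is not copied from systemd and not part of the patch: hv_kind,
hv_table[] and match_hypervisor() are hypothetical names. It assumes
words[] holds EAX, EBX, ECX, EDX from leaf 0x40000000, as in the quoted
code, and rebuilds the 12-byte vendor string from words[1..3].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
typedef enum
{
    HV_NONE,
    HV_KVM,
    HV_VMWARE
} hv_kind;

#define HV_SIG_LEN 12

static const struct
{
    const char  sig[HV_SIG_LEN + 1];    /* 12 signature bytes plus NUL */
    hv_kind     kind;
} hv_table[] = {
    {"KVMKVMKVM\0\0\0", HV_KVM},
    {"VMwareVMware", HV_VMWARE},
};

/*
 * words[0..3] are EAX, EBX, ECX, EDX of CPUID leaf 0x40000000.
 * EBX, ECX and EDX carry the vendor signature, lowest byte first.
 */
static hv_kind
match_hypervisor(const uint32_t words[4])
{
    unsigned char sig[HV_SIG_LEN];

    assert(words != NULL);

    /* Unpack with shifts so the result does not depend on host endianness. */
    for (int i = 0; i < HV_SIG_LEN; i++)
        sig[i] = (unsigned char) (words[1 + i / 4] >> (8 * (i % 4)));

    for (size_t i = 0; i < sizeof(hv_table) / sizeof(hv_table[0]); i++)
    {
        if (memcmp(sig, hv_table[i].sig, HV_SIG_LEN) == 0)
            return hv_table[i].kind;
    }
    return HV_NONE;
}
```

With this shape, supporting another hypervisor means adding one table row
with its signature string instead of another macro with three hex
constants.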

BTW, -1 to fast_clock_source, +1 to clock_source or maybe
explain_clock_source(?)

Also, it would be nice if the patch provided some way of reporting which
clock source was actually used in the case of FAST_CLOCK_SOURCE_AUTO,
something like huge_pages_status or an elog(DEBUG).
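
A rough sketch of what such reporting could be built on, under stated
assumptions: the enum and function names below are hypothetical (the
patch's identifiers may differ), and the rule "use TSC only when it is not
disabled and a usable frequency was found" is my assumption about the
intended behavior, not something taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the patch's identifiers. */
typedef enum
{
    CLOCK_SOURCE_AUTO,
    CLOCK_SOURCE_SYSTEM,
    CLOCK_SOURCE_TSC
} clock_source_setting;

/*
 * Map the configured setting to the source actually in use.
 * Assumption: TSC is chosen only if the setting allows it and a usable
 * TSC frequency was detected; otherwise fall back to the system clock.
 */
static clock_source_setting
effective_clock_source(clock_source_setting setting, bool tsc_usable)
{
    if (setting != CLOCK_SOURCE_SYSTEM && tsc_usable)
        return CLOCK_SOURCE_TSC;
    return CLOCK_SOURCE_SYSTEM;
}

/* String that a read-only status parameter or a debug message could show. */
static const char *
clock_source_status(clock_source_setting effective)
{
    return effective == CLOCK_SOURCE_TSC ? "tsc" : "system";
}
```

The resolved string could then be exposed the same way huge_pages_status
reports the outcome of huge_pages = try, or simply logged once at DEBUG
level when the source is chosen.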

-J.

[1] - https://github.com/systemd/systemd/blob/e831a44b07ebf48992967e366cfc1bcee2683f3d/src/detect-virt/detect-virt.c#L186
[2] - https://github.com/systemd/systemd/blob/e831a44b07ebf48992967e366cfc1bcee2683f3d/src/basic/virt.c#L450
