Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Lukas Fittl <lukas(at)fittl(dot)com>
Cc: John Naylor <johncnaylorls(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, David Geier <geidav(dot)pg(at)gmail(dot)com>
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date: 2026-03-06 19:22:27
Message-ID: caqgbsn6mbkmtqnencdaim7udutxrqoc6j6uraoof3ebovjycp@h2doabacklal
Lists: pgsql-hackers

Hi,

On 2026-03-03 10:22:42 -0800, Lukas Fittl wrote:
> > But if we read files anyway, wouldn't just using
> > /sys/devices/system/cpu/cpu0/cpufreq/base_frequency
> > work?
>
> I tested this just now on an Azure VM (Standard D2s v3), and it's
> close, but unfortunately CPU frequency doesn't match the TSC frequency
> (cpuinfo_max_freq is 2800000, scaling_cur_freq is 2496279, and TSC
> frequency via MSR is 2793438 -- note that I didn't have base_frequency
> on this VM). My understanding is that the TSC clock is virtualized in
> HyperV and does not directly match the CPU frequency.

:(

It seems quite ridiculous that there's no cpuid to get the frequency of both
virtualized and "real" tsc.

> I'm also happy to take this out again - maybe we can get the
> HyperV/Azure Linux folks to improve the Kernel side here to pass down
> the TSC frequency without needing the MSR, and just not support it for
> now.

Yeah, this doesn't seem worth it; I don't think it would get used this way.

> An alternate idea could be to allow overriding the TSC frequency via a
> GUC - then one could use the root user (or a setuid program) to get
> the TSC frequency on Azure/HyperV via the MSR and pass it to Postgres
> at start. But not sure that's worth the trouble, since it won't help
> with environments that don't have a reliable TSC (e.g. Virtualbox, I
> think).

I don't think manually specifying it makes sense either.

But maybe we should just do the stupid thing and figure out the multiplier as
such:

ns_to_cycles = tsc_via_rdtsc / to_ns(clock_gettime(CLOCK_BOOTTIME))

In some quick experiments that ends up with a very good estimate. There would
have to be an awfully long gap between the rdtsc and the clock_gettime() call
for the computed frequency to be meaningfully inaccurate.
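
As a rough sketch of the computation above (not code from the patch; the
function names are made up for illustration), the estimate could look like the
following. It assumes an x86 CPU, Linux's CLOCK_BOOTTIME, and a TSC that has
counted at a constant rate since boot, and it returns 0.0 where any of these is
unavailable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Convert a (non-negative) timespec to nanoseconds. */
static inline uint64_t
timespec_to_ns(struct timespec ts)
{
    assert(ts.tv_sec >= 0 && ts.tv_nsec >= 0);
    return (uint64_t) ts.tv_sec * UINT64_C(1000000000) + (uint64_t) ts.tv_nsec;
}

/*
 * Hypothetical helper: TSC ticks per nanosecond, given a TSC reading and the
 * time since boot (in ns) sampled at nearly the same moment. Returns 0.0 if
 * the boot time is zero and no ratio can be formed.
 */
static inline double
estimate_cycles_per_ns(uint64_t tsc, uint64_t boot_ns)
{
    if (boot_ns == 0)
        return 0.0;
    return (double) tsc / (double) boot_ns;
}

/*
 * Hypothetical helper: sample both clocks back to back and form the ratio.
 * Returns 0.0 on platforms without rdtsc or CLOCK_BOOTTIME, or on error.
 */
static inline double
sample_cycles_per_ns(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(CLOCK_BOOTTIME)
    struct timespec ts;
    uint64_t    tsc;

    if (clock_gettime(CLOCK_BOOTTIME, &ts) != 0)
        return 0.0;
    tsc = __rdtsc();            /* read right after the clock sample */
    return estimate_cycles_per_ns(tsc, timespec_to_ns(ts));
#else
    return 0.0;
#endif
}
```

Because both values are measured from boot, a delay of a few microseconds
between the two samples is tiny relative to an uptime of even a few seconds,
which is why the gap would have to be very long to skew the result.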

I was worried for a moment that there would be issues with the tsc counter
overflowing after a long uptime, but that doesn't seem a real issue if I did
the math right (at a 10GHz tsc freq the time to overflow would be ~58 years).
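
The overflow arithmetic can be checked with a few lines of C. This is only an
illustrative sketch: the function name is hypothetical, and it assumes a 64-bit
counter that starts at zero and a year of 365.25 days.

```c
#include <assert.h>
#include <stdint.h>

/* Years until a 64-bit counter ticking at freq_hz wraps around. */
static inline double
tsc_years_to_overflow(double freq_hz)
{
    const double seconds_per_year = 365.25 * 24 * 60 * 60;

    assert(freq_hz > 0);
    return ((double) UINT64_MAX / freq_hz) / seconds_per_year;
}
```

Calling tsc_years_to_overflow(10e9) gives a value between 58 and 59 years,
in line with the estimate above; lower TSC frequencies take proportionally
longer to wrap.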

Greetings,

Andres Freund
