| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | Lukas Fittl <lukas(at)fittl(dot)com> |
| Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, David Geier <geidav(dot)pg(at)gmail(dot)com>, andrew(at)dunslane(dot)net, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com> |
| Subject: | Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? |
| Date: | 2026-04-09 16:02:28 |
| Message-ID: | npplam5gf5c6yrsaqjfdkdhz57czqrmbqlo2d7yud7uj76iozz@vuxdfa5hujzv |
| Lists: | pgsql-hackers |
Hi,
On 2026-04-08 21:36:48 -0700, Lukas Fittl wrote:
> On Wed, Apr 8, 2026 at 12:44 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > > I half wonder if this could be a case of a Hypervisor, but not KVM or
> > > VMware, and so we fall through to the regular CPUID information (which
> > > AFAIR is different from how Linux itself handles that case, where
> > > it'll always do calibration in such cases). I think the solution might
> > > be to use the TSC calibration always on other hypervisors.
> >
> > Plausible.
> >
>
> And that is indeed the problem. Looks like we can't trust CPUID
> 0x15/0x16 when we're under a Hypervisor and it's not KVM or VMware.
Why you'd report a TSC frequency but populate it with a distinct frequency
from the actual tsc is beyond me, but oh well, we gotta deal.
Pushed the fix.
> FWIW, earlier deprecated EC2 instance types that used Xen (e.g.
> m4.xlarge) just report "TSC unusable" on Windows, presumably because
> it's not an invariant TSC, but I haven't dug into it since it seems
> fine to automatically use the system clock source in that case.
Yea, that's not worth investigating.
> This was without the early return, i.e. Virtualbox doesn't pass
> through CPUID even if the host has it on Intel CPUs.
Seems good.
> > What do you think about making pg_test_timing warn and return 1 if there is a
> > tsc clocksource but the calibrated frequency differs by more than, idk, 10%?
> > I'm worried that there might be other problems like this lurking and we
> > wouldn't know about them unless the issue is of a similar magnitude.
>
> Yeah, that seems like a good idea. If I understand correctly you're
> thinking we could tell the user to switch to
> timing_clock_source=system in that case? (i.e. this is only a
> pg_test_timing notice, not something "smarter" in the backend itself)
I'd even just say "investigate your system and/or report a bug to postgres" :)
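For illustration, the check discussed above could be sketched roughly as follows. This is only a sketch, not code from the patch: the function name tsc_frequency_mismatch and the TSC_MISMATCH_TOLERANCE constant are made up here, and the 10% threshold is just the ballpark figure floated above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed tolerance: the "idk, 10%" figure from the discussion. */
#define TSC_MISMATCH_TOLERANCE 0.10

/*
 * Hypothetical helper: returns true if the TSC frequency in use deviates
 * from the calibrated frequency by more than the tolerance.  A value of 0
 * means "unknown", in which case there is nothing to compare.
 */
static bool
tsc_frequency_mismatch(uint32_t in_use_khz, uint32_t calibrated_khz)
{
	double		diff;

	if (in_use_khz == 0 || calibrated_khz == 0)
		return false;

	/* absolute difference, computed in floating point to avoid wraparound */
	diff = (double) in_use_khz - (double) calibrated_khz;
	if (diff < 0)
		diff = -diff;

	return diff > TSC_MISMATCH_TOLERANCE * (double) calibrated_khz;
}
```

A caller in pg_test_timing could then print a warning and return 1 when this reports a mismatch, leaving it to the user to investigate.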
> Attached 0001 fixes the issue for me on my test instance, and
> presumably will fix drongo as well.
>
> 0002 is the updated version of emitting the additional debug info. I
> think this is certainly less critical to have in 19 now, but could
> still be useful if there are any future oddities.
I think we should do something, probably together with the test enhancement I
described, because otherwise we won't actually find potential breakage before
it hits production environments.
> @@ -161,10 +165,13 @@ static uint32 x86_hypervisor_tsc_frequency_khz(void);
> * 0 indicates the frequency information was not accessible via CPUID.
> */
> uint32
> -x86_tsc_frequency_khz(void)
> +x86_tsc_frequency_khz(char *source, size_t source_len)
> {
> unsigned int reg[4] = {0};
>
> + if (source)
> + strlcpy(source, "x86", source_len);
> +
> /*
> * If we're inside a virtual machine, try to fetch the TSC frequency from
> * the Hypervisor itself using specialized CPUID registers.
> @@ -173,7 +180,11 @@ x86_tsc_frequency_khz(void)
> * a virtual machine, as it has been observed to be wildly incorrect.
> */
> if (x86_feature_available(PG_HYPERVISOR))
> + {
> + if (source)
> + strlcat(source, ", hypervisor, cpuid 0x40000010", source_len);
> return x86_hypervisor_tsc_frequency_khz();
> + }
>
> /*
> * On modern Intel CPUs, the TSC is implemented by invariant timekeeping
Any reason you didn't include the hypervisor like in the prior version? Just
simplicity?
I think this actually ends up getting overwritten if
x86_hypervisor_tsc_frequency_khz() then "fails" to detect a frequency. Feels
like it'd be good to continue reporting that it's in a hypervisor, because
hypervisors can set tsc frequency multipliers and stuff.
What do you think about the attached incremental patch?
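To make the "keep reporting the hypervisor" idea concrete, here is a minimal standalone sketch. It is not the attached patch: describe_tsc_frequency and append_source are hypothetical names, the probe results are passed in as parameters instead of being read via CPUID, the CPUID 0x15/0x16 path is left out entirely, and snprintf is used in place of strlcpy/strlcat only so that the sketch builds without PostgreSQL's port library.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append to a NUL-terminated buffer, truncating if full. */
static void
append_source(char *buf, size_t buflen, const char *part)
{
	size_t		used;

	if (buf == NULL || buflen == 0)
		return;
	used = strlen(buf);
	if (used < buflen - 1)
		snprintf(buf + used, buflen - used, "%s", part);
}

/*
 * Hypothetical stand-in for the frequency lookup.  hypervisor_khz == 0
 * models the hypervisor CPUID leaf not yielding a frequency.
 */
static uint32_t
describe_tsc_frequency(bool in_hypervisor, uint32_t hypervisor_khz,
					   uint32_t calibrated_khz,
					   char *source, size_t source_len)
{
	if (source != NULL && source_len > 0)
		snprintf(source, source_len, "%s", "x86");

	if (in_hypervisor)
	{
		append_source(source, source_len, ", hypervisor, cpuid 0x40000010");
		if (hypervisor_khz != 0)
			return hypervisor_khz;
		/* lookup failed: fall through, but keep the hypervisor tag */
	}

	/* append rather than overwrite, so earlier steps stay visible */
	append_source(source, source_len, ", calibration");
	return calibrated_khz;
}
```

The point of appending is that a failed hypervisor lookup still leaves "hypervisor" in the reported source instead of being replaced by whatever fallback ends up being used.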
If I e.g. intentionally force the hypervisor path to be taken, on a non-VM, I
get:
TSC frequency source: x86, hypervisor, cpuid 0x40000010, calibration
TSC frequency in use: 2497902 kHz
TSC frequency from calibration: 2497902 kHz
TSC clock source will be used by default, unless timing_clock_source is set to 'system'.
And if rdtscp is not available:
TSC frequency source: x86, no rdtscp
TSC frequency in use: 0 kHz
TSC frequency from calibration: 2500040 kHz
TSC clock source is not usable. Likely unable to determine TSC frequency. Are you running in an unsupported virtualized environment?
It's not perfect, but seems like it might be good enough?
Note to future self: Need to consider updating the sgml docs example. Probably
just fudge it, to avoid having to update the numbers too.
Greetings,
Andres Freund
| Attachment | Content-Type | Size |
|---|---|---|
| v26a-0001-instrumentation-Show-additional-TSC-clock-sourc.patch | text/x-diff | 8.6 KB |
| v26a-0002-fixup-instrumentation-Show-additional-TSC-clock.patch | text/x-diff | 4.7 KB |