Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

From: Lukas Fittl <lukas(at)fittl(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, David Geier <geidav(dot)pg(at)gmail(dot)com>, andrew(at)dunslane(dot)net, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date: 2026-04-10 07:12:00
Message-ID: CAP53PkwR8gEteMDTK0=hGx5YmLMUhW3aFXAergr_VWgmBFFBig@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 9, 2026 at 9:02 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> On 2026-04-08 21:36:48 -0700, Lukas Fittl wrote:
> >
> > And that is indeed the problem. Looks like we can't trust CPUID
> > 0x15/0x16 when we're under a Hypervisor and it's not KVM or VMware.
>
> Why you'd report a TSC frequency but populate it with a distinct frequency
> from the actual tsc is beyond me, but oh well, we gotta deal.
>
> Pushed the fix.

Thanks!

> > > What do you think about making pg_test_timing warn and return 1 if there is a
> > > tsc clocksource but the calibrated frequency differs by more than, idk, 10%?
> > > I'm worried that there might be other problems like this lurking and we
> > > wouldn't know about them unless the issue is of a similar magnitude.
> >
> > Yeah, that seems like a good idea. If I understand correctly you're
> > thinking we could tell the user to switch to
> > timing_clock_source=system in that case? (i.e. this is only a
> > pg_test_timing notice, not something "smarter" in the backend itself)
>
> I'd even just say "investigate your system and/or report a bug to postgres" :)
>

Sure, seems reasonable. I went ahead and added that in the attached
v27 (squashed with your other change).

Here is an example of how that looks (tested without the fix in place):

---

TSC frequency source: x86, hypervisor, cpuid 0x15
TSC frequency in use: 7 kHz
TSC frequency from calibration: 2500260 kHz
WARNING: Calibrated TSC frequency differs by 35717900.0% from the TSC
frequency in use
HINT: Consider setting timing_clock_source to 'system'. Report bugs to
<pgsql-bugs(at)lists(dot)postgresql(dot)org>.

TSC clock source will be used by default, unless timing_clock_source
is set to 'system'.

---

I also added the extra newline before the "will be used by default"
message, because I felt it's too much information bunched together
otherwise.
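
For reference, the percentage in the WARNING above is consistent with
taking the absolute difference between the calibrated frequency and the
frequency in use, relative to the frequency in use. Below is a minimal
sketch of such a check. It is not the code from the attached patch: the
function names and the TSC_FREQ_WARN_PERCENT constant are hypothetical,
and the 10% threshold is just the figure floated upthread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold, taken from the "idk, 10%" suggestion upthread. */
#define TSC_FREQ_WARN_PERCENT 10.0

/*
 * Relative difference, in percent, between the calibrated TSC frequency
 * and the TSC frequency in use. Returns -1.0 when no frequency is in use
 * (0 kHz), since that case is reported as "not usable" rather than as a
 * mismatch.
 */
static double
tsc_freq_diff_percent(uint32_t in_use_khz, uint32_t calibrated_khz)
{
	double		in_use = (double) in_use_khz;
	double		diff = (double) calibrated_khz - in_use;

	if (in_use_khz == 0)
		return -1.0;
	if (diff < 0)
		diff = -diff;
	return diff / in_use * 100.0;
}

/* True if the two frequencies differ by more than the threshold. */
static bool
tsc_freq_needs_warning(uint32_t in_use_khz, uint32_t calibrated_khz)
{
	return tsc_freq_diff_percent(in_use_khz, calibrated_khz) >
		TSC_FREQ_WARN_PERCENT;
}
```

With the values from the example, tsc_freq_diff_percent(7, 2500260)
gives 35717900.0, which matches the number printed in the WARNING line.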

> > Attached 0001 fixes the issue for me on my test instance, and
> > presumably will fix drongo as well.
> >
> > 0002 is the updated version of emitting the additional debug info. I
> > think this is certainly less critical to have in 19 now, but could
> > still be useful if there are any future oddities.
>
> I think we should do something, probably together with the test enhancement I
> described, because otherwise we won't actually find potential breakage before
> it hits production environments.

Ack, makes sense to me.

> Any reason you didn't include the hypervisor like in the prior version? Just
> simplicity?
>
> I think this actually ends up getting overwritten if
> x86_hypervisor_tsc_frequency_khz() then "fails" to detect a frequency. Feels
> like it'd be good to continue reporting that it's in a hypervisor, because
> hypervisors can set tsc frequency multipliers and stuff.
>

Agreed that seems reasonable.

>
> What do you think about the attached incremental patch?
>
> If I e.g. intentionally force the hypervisor path being taken, on a non-VM, I
> get:
> TSC frequency source: x86, hypervisor, cpuid 0x40000010, calibration
> TSC frequency in use: 2497902 kHz
> TSC frequency from calibration: 2497902 kHz
> TSC clock source will be used by default, unless timing_clock_source is set to 'system'.
>
> And if rdtscp is not available:
> TSC frequency source: x86, no rdtscp
> TSC frequency in use: 0 kHz
> TSC frequency from calibration: 2500040 kHz
> TSC clock source is not usable. Likely unable to determine TSC frequency. Are you running in an unsupported virtualized environment?
>
> It's not perfect, but seems like it might be good enough?

Yeah, I think that looks good. On an m4.xlarge instance (Linux / xen)
with its very slow clock I get the following:

---

System clock source: clock_gettime (CLOCK_MONOTONIC)
Average loop time including overhead: 570.09 ns
Histogram of timing durations:
...
TSC frequency source: x86, not invariant
TSC frequency in use: 0 kHz
TSC frequency from calibration: 2299714 kHz

TSC clock source is not usable. Likely unable to determine TSC
frequency. Are you running in an unsupported virtualized environment?

---

FWIW, Linux has current_clocksource "xen" instead of "tsc" on that instance.
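
In case it helps anyone reproduce this, the kernel's active clocksource
can be read from sysfs. A small sketch follows, assuming the standard
Linux sysfs location; the helper name read_clocksource is made up for
illustration and nothing like it is part of the patch.

```c
#include <stdio.h>
#include <string.h>

/* Standard Linux sysfs location of the active clocksource. */
#define CURRENT_CLOCKSOURCE_PATH \
	"/sys/devices/system/clocksource/clocksource0/current_clocksource"

/*
 * Read the first line of "path" into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int
read_clocksource(const char *path, char *buf, size_t buflen)
{
	FILE	   *f = fopen(path, "r");

	if (f == NULL)
		return -1;
	if (fgets(buf, (int) buflen, f) == NULL)
	{
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Calling read_clocksource(CURRENT_CLOCKSOURCE_PATH, buf, sizeof(buf)) on
the instance above would be expected to fill buf with "xen"; on non-Linux
systems the file does not exist and the helper returns -1.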

I assume we're okay with not reporting "hypervisor" in the source
string in the early failure case? If we wanted to, it'd make the diff
a bit larger since we'd need an extra hypervisor feature check.

> Note to future self: Need to consider updating the sgml docs example. Probably
> just fudge it, to avoid having to update the numbers too.

Yeah, I wouldn't update the numbers in the docs. I've added an example
of the new output in the attached.

Thanks,
Lukas

--
Lukas Fittl

Attachment Content-Type Size
v27-0001-pg_test_timing-Show-additional-TSC-clock-source-.patch application/octet-stream 11.7 KB
