| From: | Lukas Fittl <lukas(at)fittl(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, David Geier <geidav(dot)pg(at)gmail(dot)com>, andrew(at)dunslane(dot)net |
| Subject: | Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? |
| Date: | 2026-04-08 19:25:54 |
| Message-ID: | CAP53PkyMmUO=QbZPSc+uqSi+2pVjuEZc4MA45nKtLiZtNYf5NQ@mail.gmail.com |
| Lists: | pgsql-hackers |
On Wed, Apr 8, 2026 at 8:13 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> So that's waiting for 100 seconds.
>
> But the whole test only took 18.88s. So something other than overhead is going
> wrong here.
>
> Oh.
>
> I think it has a tsc clock source returning bogus results. Look at the
> pg_test_timing output.
>
>
> # System clock source: QueryPerformanceCounter
> # Average loop time including overhead: 34.54 ns
> ...
>
> # Clock source: RDTSCP
> # Average loop time including overhead: 8179723.50 ns
> ...
> # Fast clock source: RDTSC
> # Average loop time including overhead: 4196799.05 ns
> ...
>
> # TSC frequency in use: 7 kHz
> # TSC frequency from calibration: 2500044 kHz
> # TSC clock source will be used by default, unless timing_clock_source is set to 'system'.
>
> Sooo, this system claims to have an invariant tsc but the frequency
> we are getting from cpuid is completely out of whack.
Huh. Yeah, I think this is a case of getting a bad TSC frequency from CPUID.
>
> Of course that could be for different reasons. It could be that we have a
> portability issue around cpuids; we could calculate the frequency incorrectly;
> the virtualization technology used might have configured wrong results...
>
>
> I think we might need some sanity checking of the timing results in
> pg_test_timing, so that we can pick up this kind of craziness directly in
> the tests for pg_test_timing, rather than indirectly like here.
>
>
> We probably should do some basic range checking on the cpuid-based frequency
> too; clearly 7 kHz can never be sane.
>
> But I don't want to add that before we have figured out why we're seeing that
> frequency, e.g. whether something in the cpuid infrastructure is broken (cpuidex
> not working right), or whether the vmware logic is wrong, ...
Agreed.
I half wonder if this could be a case of a hypervisor other than KVM or
VMware, so that we fall through to the regular CPUID information (which,
AFAIR, differs from how Linux itself handles that case: it always
calibrates there). I think the solution might be to always use TSC
calibration on other hypervisors.
But let's wait for Andrew to confirm the configuration of the machine /
do a run with the additional information.
>
>
> Maybe we should add a char **source_details argument to
> pg_tsc_calibrate_frequency that pg_test_timing can report?
>
> I wonder if we also should add a pg_timing_clock_source_info() function that
> returns frequency_khz, calibrated_frequency_khz, frequency_source_info or
> such?
See the attached patch, which adds that and shows its output in
pg_test_timing. Here is an example from an AWS instance:
TSC frequency in use: 2899943 kHz
TSC frequency source: x86, hypervisor (kvm), cpuid 0x40000010
TSC frequency from calibration: 2899063 kHz
TSC clock source will be used by default, unless timing_clock_source is set to 'system'.
And from Azure (Hyper-V):
TSC frequency in use: 2791936 kHz
TSC frequency source: x86, calibration
TSC frequency from calibration: 2793379 kHz
TSC clock source will be used by default, unless timing_clock_source is set to 'system'.
Note that, to keep the code a bit simpler, it doesn't mention the
hypervisor when calibration was used. If cpuid 0x15/0x16 get used, it
reports "hypervisor (other)", or "hypervisor (unknown)" if cpuidex
wasn't available.
>
> > Attached a quick idea how we could rework that to avoid it.
> >
> > Thoughts?
>
> Maybe it's worth doing that for 20, but I don't think it's related to
> the problem at hand.
Ack, agreed this is unrelated to the issue.
Thanks,
Lukas
--
Lukas Fittl
| Attachment | Content-Type | Size |
|---|---|---|
| v24-0001-instrumentation-Show-additional-TSC-clock-source.patch | application/octet-stream | 9.5 KB |