Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Lukas Fittl <lukas(at)fittl(dot)com>
Cc: John Naylor <johncnaylorls(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, David Geier <geidav(dot)pg(at)gmail(dot)com>
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date: 2026-03-08 16:39:47
Message-ID: opaq3twixq6uubmgclesklstm4cpe2mtmuwgm4pvsgoo33rep7@c3uph7pnlw4p
Lists: pgsql-hackers

Hi,

On 2026-03-06 11:47:10 -0800, Lukas Fittl wrote:
> > But maybe we should just do the stupid thing and figure out the multiplier as
> > such:
> >
> > ns_to_cycles = tsc_via_rdtsc / to_ns(clock_gettime(CLOCK_BOOTTIME))
> >
> > in some quick experiments that ends up with a very good estimate. There would
> > have to be an awful long gap between the rdtsc and clock_gettime() computation
> > for the frequency to be meaningfully inaccurate.
>
> I think as long as the TSC counter and the clock boottime start at the
> same moment, that should work. But I'm not sure if we can rely on that
> to be the case in virtualized environments? I can do some more
> testing.

I did some testing, and unfortunately it's not good enough. There are several
issues:

- The TSC starts counting before the OS does, early enough that the initial
ratio is noticeably off. It's not that bad on a laptop with a quick boot
time, but on a server with slower BIOS initialization (e.g. due to training
of more memory) it's worse.

- If the server is rebooted not through a hard reset (the typical default),
but through something like kexec (which does not go through the BIOS again),
the TSC is not reset.
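To see why the pre-boot offset matters, assume for illustration that the TSC ticks at a constant rate f and had already been running for T_0 seconds when CLOCK_BOOTTIME was at zero. Reading both at boottime t, the naive ratio from the quoted proposal gives:

```latex
\hat{f} = \frac{\mathrm{tsc}(t)}{t} = \frac{f\,(t + T_0)}{t} = f\left(1 + \frac{T_0}{t}\right)
```

So the relative error is T_0/t. With a hypothetical 20 s of firmware initialization, the estimate is about 33% too high one minute after boot and still about 0.02% too high after a day. After a kexec reboot, T_0 also includes the entire previous uptime, so the error can stay large indefinitely.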

> Alternatively, we could consider doing it like the Kernel does it for
> its calibration loop, and wait 1 second of wall time, and then see how
> far the TSC counter has advanced.

Yea, I think we need a calibration loop, unfortunately. But I think it should
be doable to make it a lot quicker than waiting one second. I'm thinking of
something like a loop that measures the elapsed clock cycles and relative time
(using clock_gettime()) since the start, and keeps going until the frequency
estimate predicts the measured time closely. I think that should take a few
tens of milliseconds at most.
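A rough sketch of such a calibration loop, under several assumptions: the TSC is invariant and synchronized across CPUs (process migration between samples is not handled), CLOCK_MONOTONIC serves as the reference clock, and the estimate is checked at doubling checkpoints (1 ms, 2 ms, 4 ms, ...). At each checkpoint the previous estimate predicts the elapsed wall time from the elapsed ticks, and calibration stops once that prediction is within a relative tolerance. The function names, checkpoint schedule, and outlier cutoff are hypothetical choices for illustration, not taken from any existing patch; the aarch64 branch reads the generic timer (cntvct_el0) only so the sketch also compiles there.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
static inline uint64_t read_cycles(void) { return __rdtsc(); }
#elif defined(__aarch64__)
/* Generic-timer virtual count: the closest aarch64 analogue to the TSC. */
static inline uint64_t read_cycles(void)
{
    uint64_t v;
    __asm__ __volatile__("isb\n\tmrs %0, cntvct_el0" : "=r"(v) : : "memory");
    return v;
}
#else
#error "this sketch only knows the x86 and aarch64 counters"
#endif

static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
}

/* Pair a counter value with a clock reading. The clock_gettime() call is
 * bracketed by two counter reads and the midpoint is used; samples with a
 * suspiciously wide bracket (e.g. we were preempted) are retried. */
static void sample(uint64_t *cycles, uint64_t *ns)
{
    for (int attempt = 0; attempt < 10; attempt++)
    {
        uint64_t c1 = read_cycles();
        *ns = now_ns();
        uint64_t c2 = read_cycles();

        *cycles = c1 + (c2 - c1) / 2;
        if (c2 - c1 < 10000)    /* arbitrary "not interrupted" cutoff */
            break;
    }
}

/* Hypothetical helper: estimated counter ticks per second, or 0.0 if no
 * estimate met rel_tol before max_ns of wall time had passed. */
static double calibrate_ticks_per_sec(double rel_tol, uint64_t max_ns)
{
    uint64_t c0, t0;
    double prev_hz = 0.0;

    assert(rel_tol > 0.0);
    sample(&c0, &t0);

    for (uint64_t target = 1000000; target <= max_ns; target *= 2)
    {
        uint64_t c, t;

        while (now_ns() - t0 < target)
            ;                   /* spin until the next checkpoint */
        sample(&c, &t);

        double ticks = (double) (c - c0);
        double elapsed = (double) (t - t0);
        double hz = ticks * 1e9 / elapsed;

        if (prev_hz > 0.0)
        {
            /* How well does the previous estimate predict elapsed time? */
            double err = ticks * 1e9 / prev_hz - elapsed;

            if (err < 0)
                err = -err;
            if (err / elapsed < rel_tol)
                return hz;
        }
        prev_hz = hz;
    }
    return 0.0;                 /* no stable estimate within max_ns */
}
```

Called as calibrate_ticks_per_sec(1e-4, 64000000), the loop spins for at most 64 ms. Because each comparison spans twice the interval of the previous one, the per-sample jitter shrinks relative to the interval, so on a quiet machine it should usually stop at one of the first few checkpoints.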

> FWIW, I ended up getting an x86 machine to be able to test these
> things better, and got myself an AMD CPU.

Dedication...

> Well, turns out that my
> non-virtualized AMD CPU ("AMD Ryzen™ AI Max+ 395") does not provide
> the TSC frequency via CPUID, at all :(

I can repro that on a somewhat older Zen 4 (7840U) laptop CPU.
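To check what CPUID reports on a given machine, here is a minimal sketch. It assumes the relevant leaf is 0x15 (TSC / core crystal clock ratio, as defined in Intel's SDM), where EBX/EAX is the TSC-to-crystal ratio and ECX is the crystal frequency in Hz. Some Intel parts report ECX as 0, in which case the Linux kernel falls back to a per-model crystal frequency; this sketch simply returns 0 in that case, as it does when the leaf is absent. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: TSC frequency in Hz from CPUID leaf 0x15, or 0 if
 * the leaf is unavailable or does not report a complete answer. */
static uint64_t tsc_hz_from_cpuid(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if 0x15 exceeds the maximum basic leaf. */
    if (!__get_cpuid(0x15, &eax, &ebx, &ecx, &edx))
        return 0;

    /* EAX: ratio denominator, EBX: ratio numerator, ECX: crystal Hz. */
    if (eax == 0 || ebx == 0 || ecx == 0)
        return 0;
    return (uint64_t) ecx * ebx / eax;
#else
    return 0;                   /* not x86: no CPUID at all */
#endif
}
```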

> Instead on newer AMD CPUs you can use an MSR to get the TSC frequency,
> see [2]

:(

Greetings,

Andres Freund
