Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Lukas Fittl <lukas(at)fittl(dot)com>
Cc: John Naylor <johncnaylorls(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, David Geier <geidav(dot)pg(at)gmail(dot)com>
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date: 2026-04-07 04:55:45
Message-ID: znzsoci7b3crunqxf66xdylnllijbqij4vafos7yskmrbtqxhs@6hegy3qeu7e7
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2026-04-06 20:41:46 -0700, Lukas Fittl wrote:
> On Mon, Apr 6, 2026 at 5:40 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I wonder if the cpuid tests should be a bit further abstracted into
> > pg_cpu_x86.c.
> >
> > E.g. instead of tsc_detect_frequency() checking for PG_RDTSCP,
> > PG_TSC_INVARIANT, PG_TSC_ADJUST we could have
> >
> > PG_TSC_AVAILABLE /* RDTSCP & INVARIANT */
> > PG_TSC_KNOWN_RELIABLE /* PG_TSC_AVAILABLE && PG_TSC_ADJUST */
> > PG_TSC_FREQUENCY_KNOWN /* x86_tsc_frequency_khz works */
> >
> > and always run all of that during set_x86_features().
>
> I think that could work, but I kept the flags in features closer to
> being direct mappings to CPUID bits since that seemed to be intent of
> how John designed the facility originally.
>
> John, do you have thoughts on this? (I've not changed it for now)

I'm ok either way.

> FWIW, I don't think having PG_TSC_KNOWN_RELIABLE makes sense in any
> case, because that would tie together x86_tsc_frequency_khz and
> set_x86_features, i.e. you'd either have the frequency return function
> modify X86Features later, or always run x86_tsc_frequency_khz when
> setting features (and that'd then require you to put the frequency
> value somewhere, etc.)

I was thinking the latter.

> I've gone ahead and rewritten that whole paragraph for clarity, and
> also split it into two. Feedback welcome:
>
> <para>
> If enabled, the TSC clock source will use specialized CPU instructions
> when measuring time intervals. This lowers timing overhead compared to
> reading the OS system clock, and reduces the measurement error on top
> of the actual runtime, for example with EXPLAIN ANALYZE.
> </para>

> <para>
> On x86-64 CPUs the TSC clock source utilizes the Time-Stamp Counter (TSC)

It's a bit weird that the third use of TSC in these paragraphs introduces
Time-Stamp Counter. I can see how you get there, but ...

Now I wonder if we should rename 'tsc' to 'cpu'...

> of the CPU. The RDTSC instruction is used to read the TSC for EXPLAIN ANALYZE.
> For timings that require higher precision the RDTSCP instruction is used,
> which avoids inaccuracies due to CPU instruction re-ordering. Use of
> RDTSC/RDTSCP is not supported on older x86-64 CPUs or hypervisors that don't
> pass the TSC frequency to guest VMs, and is not advised on systems that

s/guest VMs/virtual machines/?

> utilize an emulated TSC. The TSC clock source is currently not supported on
> other architectures.

The not support bit about hypervisors isn't quite right though? We do even use
it automatically if TSC_ADJUST is set (and the calibration loop succeeds).

> </para>
> <para>
> To help decide which clock source to use you can run the
> <application>pg_test_timing</application>
> utility to check TSC availability, and perform timing measurements.
> </para>

How about a link to to the pg_test_timing page? Hm, I guess that should also
be updated with new output.

I'd also sprinkle a few <acronym> and <command>s around.

Wonder if it's worth adding something like
<indexterm><primary><acronym>RDTSC</acronym></primary></indexterm>
<indexterm>
<primary>Time-Stamp Counter</primary>
<see><acronym>TSC</acronym></see>
</indexterm>
<indexterm><primary><acronym>TSC</acronym></primary></indexterm>

otherwise somebody seeing one of these in logs, pg_test_timing output or
whatever has even less of a chance to figure it out within our docs. They're
not hard to search for terms exactly, so ...

> I've also marked pg_get_ticks(_fast) as pg_attribute_always_inline,
> per an off-list comment from Andres that he observed GCC not fully
> inlining that function in pg_test_timing, presumably due to the
> likely(..) in it.

It's not the likely, I reproduced it even without that. I mouthed off about
compilers on mastodon and was kindly asked to just open a bug report :)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124795

I think discussing which indexterms should be added signals that this is
pretty close.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2026-04-07 04:56:04 Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Previous Message Tom Lane 2026-04-07 04:44:47 Re: Adding REPACK [concurrently]