Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

From: Lukas Fittl <lukas(at)fittl(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: John Naylor <johncnaylorls(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, David Geier <geidav(dot)pg(at)gmail(dot)com>
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date: 2026-04-07 07:04:51
Message-ID: CAP53PkxXprjgJ6em41+m7qM4=Egmxy=NE8A2VT6WdaXbY6gffQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 6, 2026 at 9:55 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > FWIW, I don't think having PG_TSC_KNOWN_RELIABLE makes sense in any
> > case, because that would tie together x86_tsc_frequency_khz and
> > set_x86_features, i.e. you'd either have the frequency return function
> > modify X86Features later, or always run x86_tsc_frequency_khz when
> > setting features (and that'd then require you to put the frequency
> > value somewhere, etc.)
>
> I was thinking the latter.

Just for the archive's sake: my earlier note here had a typo. I meant
the proposed PG_TSC_FREQUENCY_KNOWN, not PG_TSC_KNOWN_RELIABLE. But I
think you got what I meant.

I'd suggest we leave this be for now (i.e. keep the X86Features more
closely tied to CPUID bits), as that could always be refactored later.
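
For readers without the patch at hand, here is a minimal sketch of why
the frequency value has to live somewhere once it is known: every tick
count must be scaled by it to get nanoseconds. It assumes the frequency
is expressed in kHz (as the name x86_tsc_frequency_khz suggests). The
function name sketch_ticks_to_ns is hypothetical and this is not the
patch's actual code.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: convert a TSC tick
 * count to nanoseconds given a frequency in kHz.
 *
 * A frequency of N kHz means N ticks per millisecond, so the tick count
 * is split into whole milliseconds plus a remainder. This avoids the
 * 64-bit overflow that a plain "ticks * 1000000 / khz" would hit for
 * long intervals.
 */
static uint64_t
sketch_ticks_to_ns(uint64_t ticks, uint32_t khz)
{
    uint64_t whole_ms;
    uint64_t rem_ticks;

    if (khz == 0)
        return 0;               /* frequency unknown, nothing to convert */

    whole_ms = ticks / khz;
    rem_ticks = ticks % khz;    /* always smaller than khz, so < 2^32 */

    return whole_ms * UINT64_C(1000000) +
        rem_ticks * UINT64_C(1000000) / khz;
}
```

With an assumed 3 GHz TSC (khz = 3000000), 3000000 ticks come out as
1000000 ns, i.e. one millisecond.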

>
> > I've gone ahead and rewritten that whole paragraph for clarity, and
> > also split it into two. Feedback welcome:
> >
> > <para>
> > If enabled, the TSC clock source will use specialized CPU instructions
> > when measuring time intervals. This lowers timing overhead compared to
> > reading the OS system clock, and reduces the measurement error on top
> > of the actual runtime, for example with EXPLAIN ANALYZE.
> > </para>
>
>
> > <para>
> > On x86-64 CPUs the TSC clock source utilizes the Time-Stamp Counter (TSC)
>
> It's a bit weird that the third use of TSC in these paragraphs introduces
> Time-Stamp Counter. I can see how you get there, but ...

Mhm. I had the same feeling when writing it that the ordering was a bit off.

I'll adjust for now to "TSC clock source, named after the Time-Stamp
Counter on x86-64" in the earlier sentence, and drop it in the later
one.

> Now I wonder if we should rename 'tsc' to 'cpu'...

Yeah, I had proposed making it more generic in an earlier email, but I
don't think there are great names available (and nobody jumped at the
suggestion). I think "cpu" is a bit too unspecific, whereas "tsc"
more clearly references the kind of instruction being used, and is
unique enough that it makes people read on instead of jumping to a
conclusion. I was previously thinking of something like "hwtimer" or
"hwclock", but I don't think those are great.

I think the worst case here is that we need to add a "Note this is
named after the TSC on x86-64, but utilizes the similarly functioning
cntvct_el0 counter register on ARM." or something to the
documentation if/when we expand support to ARM.

>
>
> > of the CPU. The RDTSC instruction is used to read the TSC for EXPLAIN ANALYZE.
> > For timings that require higher precision the RDTSCP instruction is used,
> > which avoids inaccuracies due to CPU instruction re-ordering. Use of
> > RDTSC/RDTSCP is not supported on older x86-64 CPUs or hypervisors that don't
> > pass the TSC frequency to guest VMs, and is not advised on systems that
>
> s/guest VMs/virtual machines/?

Yup.

>
> > utilize an emulated TSC. The TSC clock source is currently not supported on
> > other architectures.
>
> The "not supported" bit about hypervisors isn't quite right though? We do even use
> it automatically if TSC_ADJUST is set (and the calibration loop succeeds).

Good catch, that's no longer true with the calibration in the picture - removed.
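
To make the RDTSC/RDTSCP distinction in the quoted paragraph concrete,
here is a minimal sketch using the compiler intrinsics __rdtsc() and
__rdtscp() from <x86intrin.h> (GCC and Clang). The helper names
sketch_ticks_fast and sketch_ticks_precise are hypothetical, and the
clock_gettime() fallback for other platforms is an assumption made only
to keep the sketch portable. It is not how the patch is structured.

```c
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <x86intrin.h>
#define SKETCH_HAVE_TSC 1
#else
#define SKETCH_HAVE_TSC 0
#endif

#if !SKETCH_HAVE_TSC
/* Fallback for the sketch only: read the OS monotonic clock in ns. */
static inline uint64_t
sketch_os_clock_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * UINT64_C(1000000000) + (uint64_t) ts.tv_nsec;
}
#endif

/*
 * Cheap read: RDTSC does not wait for earlier instructions to finish,
 * so the CPU may reorder it relative to the code being timed.
 */
static inline uint64_t
sketch_ticks_fast(void)
{
#if SKETCH_HAVE_TSC
    return __rdtsc();
#else
    return sketch_os_clock_ns();
#endif
}

/*
 * More precise read: RDTSCP waits until all earlier instructions have
 * executed before reading the counter, at a somewhat higher cost.
 */
static inline uint64_t
sketch_ticks_precise(void)
{
#if SKETCH_HAVE_TSC
    unsigned int aux;           /* receives IA32_TSC_AUX, unused here */

    return __rdtscp(&aux);
#else
    return sketch_os_clock_ns();
#endif
}
```

The values returned on x86-64 are raw ticks, not nanoseconds. Turning
them into a duration needs the TSC frequency, whether reported by the
CPU or found by calibration.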

>
>
> > </para>
> > <para>
> > To help decide which clock source to use you can run the
> > <application>pg_test_timing</application>
> > utility to check TSC availability, and perform timing measurements.
> > </para>
>
> How about a link to the pg_test_timing page? Hm, I guess that should also
> be updated with new output.

Right. I've added this in the 0004 commit. I also copied the
pg_test_timing output from a FreeBSD CI run and put it in as the
example output.

>
> I'd also sprinkle a few <acronym> and <command>s around.

Ack.

>
> Wonder if it's worth adding something like
> <indexterm><primary><acronym>RDTSC</acronym></primary></indexterm>
> <indexterm>
> <primary>Time-Stamp Counter</primary>
> <see><acronym>TSC</acronym></see>
> </indexterm>
> <indexterm><primary><acronym>TSC</acronym></primary></indexterm>
>
> otherwise somebody seeing one of these in logs, pg_test_timing output or
> whatever has even less of a chance to figure it out within our docs. They're
> not hard to search for terms exactly, so ...

Sure, that makes sense. If I followed the idea correctly, you just
wanted those added next to the timing_clock_source indexterm
definition, so I added them there.

>
> > I've also marked pg_get_ticks(_fast) as pg_attribute_always_inline,
> > per an off-list comment from Andres that he observed GCC not fully
> > inlining that function in pg_test_timing, presumably due to the
> > likely(..) in it.
>
> It's not the likely, I reproduced it even without that. I mouthed off about
> compilers on mastodon and was kindly asked to just open a bug report :)
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124795

:-)
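
As an illustration of the inlining point above, here is a minimal sketch
of a tick-reading function with a hinted fast path that is forced
inline. All names (sketch_always_inline, sketch_likely,
sketch_get_ticks and so on) are hypothetical stand-ins for the pattern,
and the counter arithmetic is a placeholder so the sketch runs anywhere.
It is not the code from the patch or from PostgreSQL's headers.

```c
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
/* Force inlining regardless of the compiler's own heuristics. */
#define sketch_always_inline __attribute__((always_inline)) inline
/* Tell the compiler which branch is expected to be taken. */
#define sketch_likely(x) __builtin_expect((x) != 0, 1)
#else
#define sketch_always_inline inline
#define sketch_likely(x) (x)
#endif

static int sketch_use_fast_path = 1;
static uint64_t sketch_counter = 0;

/* Stand-in for the slow path, e.g. a call into the OS clock. */
static uint64_t
sketch_slow_ticks(void)
{
    sketch_counter += 1000;
    return sketch_counter;
}

/*
 * Stand-in for a hot tick-reading function. The fast path is a single
 * increment here; in real code it would be the cheap hardware read.
 */
static sketch_always_inline uint64_t
sketch_get_ticks(void)
{
    if (sketch_likely(sketch_use_fast_path))
        return ++sketch_counter;

    return sketch_slow_ticks();
}
```

Without the attribute, whether such a function is inlined at each call
site is up to the compiler. With it, every caller gets the fast-path
check expanded in place, which matters when the function is called once
or twice per measured interval.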

Attached v22 with documentation updates. I've also marked the ARM
patch "nocfbot" for now, so it's clear we're doing that one later.

Thanks,
Lukas

--
Lukas Fittl

Attachment Content-Type Size
v22-0001-instrumentation-Standardize-ticks-to-nanosecond-.patch application/x-patch 16.8 KB
v22-0002-Allow-retrieving-x86-TSC-frequency-flags-from-CP.patch application/x-patch 7.3 KB
v22-0003-instrumentation-Use-Time-Stamp-Counter-TSC-on-x8.patch application/x-patch 32.3 KB
nocfbot-v22-0005-instrumentation-ARM-support-for-fast-time-measur.patch application/x-patch 8.1 KB
v22-0004-pg_test_timing-Also-test-RDTSC-RDTSCP-timing-and.patch application/x-patch 15.8 KB
