Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

From: Lukas Fittl <lukas(at)fittl(dot)com>
To: Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, John Naylor <johncnaylorls(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, David Geier <geidav(dot)pg(at)gmail(dot)com>
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date: 2026-04-07 08:13:17
Message-ID: CAP53PkyOJ0aGDBRTmg9Gi8ZOoR25BKjBW+WD5-vy8wfi+8pCsg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 7, 2026 at 12:32 AM Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com> wrote:
>
> > It's intentionally uint64, per this comment above it:
> >
> > * Note we utilize unsigned integers even though ticks are stored as a signed
> > * value to encourage compilers to generate better assembly, since we can be
> > * sure these values are not negative.
> >
> > In my earlier Compiler Explorer tests that did actually make a
> > difference for the generated assembly.
>
> Isn't that comment more about ticks_per_ns_scaled?
>
> For max_ticks_no_overflow the only use is with a cast to int64, so I
> didn't expect much assembly difference. Now I actually checked
> locally/godbolt, and I don't see any actual differences. Making
> max_ticks_no_overflow int64 and removing that cast generates exactly
> the same code.
>
> For ticks_per_ns_scaled, gcc 9-10 actually generates +1 mov
> instruction with int64, but that's not present in more recent
> versions.
>
> Recent compiler versions only have an idiv/div and shr/sar difference.
> idiv is slower than div on Intel, so that is a point for keeping
> ticks_per_ns_scaled unsigned.
>
> For Arm I see the same lsr/asr and udiv/sdiv difference.
>
> https://godbolt.org/z/4r5GTbrs3
>
> (the main gcc vs clang difference seems to be clang's 32 bit division
> optimization)
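
For readers following along without the Compiler Explorer link, the signed/unsigned contrast discussed above can be sketched roughly as follows. This is only an illustration: the function names and the shift width (SKETCH_SHIFT) are hypothetical and not taken from the patch. Only the difference in operand types matters, since that is what selects shr vs. sar and div vs. idiv (lsr/asr and udiv/sdiv on Arm).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point shift; the constant used in the patch may differ. */
#define SKETCH_SHIFT 14

/*
 * Unsigned variants: the compiler is free to use a logical shift and an
 * unsigned divide, because the operands cannot be negative.
 */
static inline uint64_t
sketch_scale_unsigned(uint64_t ticks, uint64_t factor)
{
	return (ticks * factor) >> SKETCH_SHIFT;
}

static inline uint64_t
sketch_div_unsigned(uint64_t value, uint64_t divisor)
{
	return value / divisor;
}

/*
 * Signed variants: same arithmetic, but the compiler must use an arithmetic
 * shift and a signed divide. The asserts document that callers only ever
 * pass non-negative values, which the compiler cannot know on its own.
 */
static inline int64_t
sketch_scale_signed(int64_t ticks, int64_t factor)
{
	assert(ticks >= 0 && factor >= 0);
	return (ticks * factor) >> SKETCH_SHIFT;
}

static inline int64_t
sketch_div_signed(int64_t value, int64_t divisor)
{
	assert(value >= 0 && divisor > 0);
	return value / divisor;
}
```

For non-negative inputs both variants return the same results; the only difference is in the instructions the compiler chooses.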

Thanks for re-checking, and I think you're correct in your assessment
that max_ticks_no_overflow could be signed. But I also don't think it
does any harm for it to be unsigned, since we know it will never be
negative, and we're correctly using PG_INT64_MAX when initializing it
(i.e. we use the max that's valid for ticks, which is int64).
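
A minimal sketch of why the cast is harmless, assuming the bound is obtained by dividing the int64 maximum by the scale factor (the exact initialization in the patch may differ). The names below are hypothetical, and INT64_MAX from <stdint.h> stands in for PG_INT64_MAX so that the snippet compiles outside the PostgreSQL tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for max_ticks_no_overflow: the largest tick count
 * whose product with "factor" still fits in int64. Because it is derived
 * from INT64_MAX, the result never exceeds INT64_MAX, even though it is
 * stored in an unsigned variable.
 */
static inline uint64_t
sketch_max_ticks_no_overflow(uint64_t factor)
{
	assert(factor > 0);
	return (uint64_t) INT64_MAX / factor;
}

/*
 * Ticks are signed, so the bound is cast to int64 for the comparison.
 * The cast is value-preserving given how the bound was computed above.
 */
static inline bool
sketch_fast_path_ok(int64_t ticks, uint64_t max_ticks)
{
	return ticks <= (int64_t) max_ticks;
}
```

With a factor of 16384, for example, the bound is well below INT64_MAX, so a small tick count takes the fast path while INT64_MAX itself does not.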

I don't feel strongly about this. I'll let Andres make the call
whether it's worth changing.

Thanks,
Lukas

--
Lukas Fittl
