Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Lukas Fittl <lukas(at)fittl(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, David Geier <geidav(dot)pg(at)gmail(dot)com>, andrew(at)dunslane(dot)net
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date: 2026-04-08 15:13:14
Message-ID: jr4hk2sxhqcfpb67ftz5g4vw33nm67cgf7go3wwmqsafu5aclq@5m67ukuhyszz
Lists: pgsql-hackers

Hi,

Andrew, what's drongo running on? I assume it's some kind of VM? What
virtualization technology?

On 2026-04-07 23:33:52 -0700, Lukas Fittl wrote:
> On Tue, Apr 7, 2026 at 11:09 PM Lukas Fittl <lukas(at)fittl(dot)com> wrote:
> > It looks like this recent failure on buildfarm member drongo might be
> > related to the timing changes - mainly suspecting it because the
> > commits are in the (slightly larger) set that changed, and the error
> > is in a timing related module that uses INSTR_TIME_SET_CURRENT.
> >
> > From https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2026-04-08%2001%3A57%3A00
> >
> > # diff --strip-trailing-cr -U3
> > C:/prog/bf/root/HEAD/pgsql/contrib/tsm_system_time/expected/tsm_system_time.out
> > C:/prog/bf/root/HEAD/pgsql.build/testrun/tsm_system_time/regress/results/tsm_system_time.out
> > # --- C:/prog/bf/root/HEAD/pgsql/contrib/tsm_system_time/expected/tsm_system_time.out
> > 2023-01-23 04:39:00.533642000 +0000
> > # +++ C:/prog/bf/root/HEAD/pgsql.build/testrun/tsm_system_time/regress/results/tsm_system_time.out
> > 2026-04-08 04:03:15.248127800 +0000
> > # @@ -15,7 +15,7 @@
> > # SELECT count(*) FROM test_tablesample TABLESAMPLE system_time (100000);
> > # count
> > # -------
> > # - 31
> > # + 16
> > # (1 row)
> > #
> > # -- bad parameters should get through planning, but not execution:
> > # 1 of 1 tests failed.
> > # The differences that caused some tests to fail can be viewed in the
> > file "C:/prog/bf/root/HEAD/pgsql.build/testrun/tsm_system_time/regress/regression.diffs".
> > # A copy of the test summary that you see above is saved in the file
> > "C:/prog/bf/root/HEAD/pgsql.build/testrun/tsm_system_time/regress/regression.out".
> >
> > Haven't had a chance to dig through it yet, just noting it to start.
> >
>
> I wonder a bit if the problem here could be that
> INSTR_TIME_GET_MILLISEC got slightly more computationally expensive
> with 0022622c93d9 (due to the logic in pg_ticks_to_ns), and that
> module effectively does that in a tight loop. And if I understood
> drongo's configuration correctly, it runs under valgrind.

It doesn't; those are just valgrind options that would take effect if valgrind
were enabled, and they are part of the default config.

I think there must be something different going on.

SELECT count(*) FROM test_tablesample TABLESAMPLE system_time (100000);

> This table sampling method accepts a single floating-point argument that is
> the maximum number of milliseconds to spend reading the table. This gives
> you direct control over how long the query takes, at the price that the size
> of the sample becomes hard to predict. The resulting sample will contain as
> many rows as could be read in the specified time, unless the whole table has
> been read first.

So that's waiting for 100 seconds.

But the whole test only took 18.88s. So something other than overhead is going
wrong here.

Oh.

I think it has a tsc clock source returning bogus results. Look at the
pg_test_timing output.

# System clock source: QueryPerformanceCounter
# Average loop time including overhead: 34.54 ns
...

# Clock source: RDTSCP
# Average loop time including overhead: 8179723.50 ns
...
# Fast clock source: RDTSC
# Average loop time including overhead: 4196799.05 ns
...

# TSC frequency in use: 7 kHz
# TSC frequency from calibration: 2500044 kHz
# TSC clock source will be used by default, unless timing_clock_source is set to 'system'.

Sooo, this system claims to have an invariant TSC, but the frequency
we are getting from cpuid is completely out of whack.

Of course that could be for different reasons. It could be that we have a
portability issue around cpuid; we could be calculating the frequency
incorrectly; the virtualization technology used might be reporting wrong
results...

I think we might need some sanity checking of the timing results in
pg_test_timing, so that we can pick up this kind of craziness directly in
the tests for pg_test_timing, rather than indirectly like here.

We probably should do some basic range checking of the cpuid-based frequency
too; clearly 7 kHz can never be sane.

But I don't want to add that before we have figured out why we're seeing this
frequency, e.g. whether something in the cpuid infrastructure is broken
(cpuidex not working right), or the vmware logic is wrong, ...

Maybe we should add a char **source_details argument to
pg_tsc_calibrate_frequency, whose contents pg_test_timing can report?

I wonder if we also should add a pg_timing_clock_source_info() function that
returns frequency_khz, calibrated_frequency_khz, frequency_source_info or
such?

> Attached a quick idea how we could rework that to avoid it.
>
> Thoughts?

Maybe it's worth doing that for 20, but I don't think it's related to
the problem at hand.

Greetings,

Andres Freund
