| From: | Lukas Fittl <lukas(at)fittl(dot)com> |
|---|---|
| To: | David Geier <geidav(dot)pg(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Hannu Krosing <hannuk(at)google(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc? |
| Date: | 2026-01-31 20:11:33 |
| Message-ID: | CAP53PkyooCeR8YV0BUD_xC7oTZESHz8OdA=tP7pBRHFVQ9xtKg@mail.gmail.com |
| Lists: | pgsql-hackers |
On Sun, Jan 11, 2026 at 11:26 AM David Geier <geidav(dot)pg(at)gmail(dot)com> wrote:
>
> > Based on Robert's suggestion I wanted to add a "fast_clock_source" enum
> > GUC which can have the following values "auto", "rdtsc", "try_rdtsc" and
> > "off". With that, at least no additional checks are needed and
> > performance will remain as previously benchmarked in this thread.
>
> The attached patch set is rebased on latest master and contains a commit
> which adds a "fast_clock_source" GUC that can be "try", "off" and
> "rdtsc" on Linux.
>
> Alternatively, we could call the GUC "clock_source" with "auto",
> "clock_gettime" and "rdtsc". Opinions?
No strong opinion on the GUC name ("fast_clock_source" seems fine?),
but I think "try" is a bit confusing if our logic is more than just
checking if the RDTSC(P) instruction is available, so I'd be in favor
of "auto" as the default value.
> I moved the call to INSTR_TIME_INITIALIZE() from InitPostgres() to
> PostmasterMain(). In InitPostgres() it kept the database in a recovery
> cycle.
I think we can actually avoid having anything in PostmasterMain (or
InitPostgres), and instead rely on the GUC assign mechanism.
I've reworked the patch a bit more, see attached v4, with a couple of
noticeable changes:
Regarding the GUC:
- Use the GUC check mechanism to complain if the RDTSC clock source is
requested but not available
- Use the GUC assign mechanism to set whether we're actually using the
RDTSC clock source
- "auto" now means that we use RDTSC clock source by default if we're
on Linux x86, and the system clocksource is "tsc"
- "rdtsc" now allows using RDTSC on any x86-based Unix-like system (I
see no reason to restrict the BSDs from using RDTSC when setting it
explicitly)
- Allow changing the clock source GUC at any time, without requiring a
restart (it makes testing much easier, and I don't see a good reason
to require a restart, or even restrict this to superuser?)
- Have pg_test_timing emit whether a fast clock source will be used by
default (or whether one needs to change the GUC)
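To make the check/assign split described in the list above concrete, here is a minimal standalone sketch. It does not use the real PostgreSQL GUC hook signatures: the enum, the HostClockInfo struct, the error text, and every function name are hypothetical stand-ins, and the host facts are passed in rather than probed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the GUC's enum values; not the patch's names. */
typedef enum
{
    FAST_CLOCK_SOURCE_AUTO,
    FAST_CLOCK_SOURCE_RDTSC,
    FAST_CLOCK_SOURCE_OFF
} FastClockSource;

/* Facts about the host, assumed to be probed once elsewhere. */
typedef struct
{
    bool is_linux_x86;              /* Linux on x86 */
    bool has_rdtsc_and_rdtscp;      /* both instructions are available */
    bool system_clocksource_is_tsc; /* kernel clocksource is "tsc" */
} HostClockInfo;

/* The only state the assign step changes. */
static bool use_tsc = false;

/* Check step: refuse an explicit "rdtsc" request the host cannot satisfy. */
bool
check_fast_clock_source(FastClockSource newval, const HostClockInfo *host,
                        const char **errmsg)
{
    if (newval == FAST_CLOCK_SOURCE_RDTSC && !host->has_rdtsc_and_rdtscp)
    {
        *errmsg = "RDTSC clock source requested but not available";
        return false;
    }
    return true;
}

/* Assign step: decide whether TSC is actually used for the accepted value. */
void
assign_fast_clock_source(FastClockSource newval, const HostClockInfo *host)
{
    switch (newval)
    {
        case FAST_CLOCK_SOURCE_RDTSC:
            use_tsc = true;     /* explicit request, already validated */
            break;
        case FAST_CLOCK_SOURCE_AUTO:
            use_tsc = host->is_linux_x86 &&
                host->has_rdtsc_and_rdtscp &&
                host->system_clocksource_is_tsc;
            break;
        case FAST_CLOCK_SOURCE_OFF:
            use_tsc = false;
            break;
    }
}

/* Run check, then assign; a rejected value leaves the old state in place. */
bool
set_fast_clock_source(FastClockSource newval, const HostClockInfo *host,
                      const char **errmsg)
{
    if (!check_fast_clock_source(newval, host, errmsg))
        return false;
    assign_fast_clock_source(newval, host);
    return true;
}

bool
fast_clock_source_uses_tsc(void)
{
    return use_tsc;
}
```

In this model, "auto" on a Linux x86 host whose kernel clocksource is not "tsc" leaves TSC unused, while an explicit "rdtsc" on the same host turns it on.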
Additionally:
- If a client program wants to use the fast clock source (like
pg_test_timing does), it first needs to call
pg_initialize_fast_clock_source() -- this replaces the
INSTR_TIME_INITIALIZE calls.
- I've re-introduced a patch (0001) to set HAVE__CPUIDEX on modern
GCC/clang. That's necessary to make this work on VM Hypervisors (per
the patch's commit message)
- I've merged the GUC patch together with the patch that adds the
RDTSC implementation (0002); I don't think it makes sense to review
or commit them separately.
- I've unified the RDTSC and RDTSCP handling, so we require both in
order to use TSC as a time source. Because we have the shared
pg_ticks_to_ns() function that gets used on an instr_time regardless
of fast vs "slow" timing, and the variables used in that function are
affected by the RDTSC availability, we must use TSC consistently - I
don't think we can mix RDTSC for fast and pg_clock_gettime() for slow,
as this patch series has done so far.
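The consistency argument in the last item can be illustrated with a small standalone sketch. It is not the patch's pg_ticks_to_ns(): the names are hypothetical, the 2.9 GHz TSC frequency is an assumed example value, and plain division is used for clarity rather than speed. The point is only that one process-wide conversion factor applies to every stored value, so a nanosecond reading passed through it while TSC is active comes out wrong.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NS_PER_SEC UINT64_C(1000000000)

/*
 * Hypothetical process-wide conversion state. While TSC is not in use,
 * one "tick" is one nanosecond, so the conversion is the identity.
 */
static uint64_t ticks_per_sec = NS_PER_SEC;

void
init_clock_conversion(bool use_tsc, uint64_t tsc_hz)
{
    assert(!use_tsc || tsc_hz > 0);
    ticks_per_sec = use_tsc ? tsc_hz : NS_PER_SEC;
}

/*
 * Convert a tick count to nanoseconds using the single shared factor.
 * Whole seconds and the remainder are converted separately so that the
 * intermediate product cannot overflow 64 bits for realistic frequencies.
 */
uint64_t
ticks_to_ns(uint64_t ticks)
{
    uint64_t secs = ticks / ticks_per_sec;
    uint64_t rem = ticks % ticks_per_sec;

    return secs * NS_PER_SEC + rem * NS_PER_SEC / ticks_per_sec;
}
```

With an assumed 2.9 GHz TSC, 2,900,000 ticks convert to 1,000,000 ns. A 1,000,000 ns value obtained from a clock_gettime-style source and fed through the same function comes out as roughly 345,000 ns, which is the kind of mismatch that mixing the two sources would produce.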
Open questions for me:
- I'm seeing a CI test failure for "Linux - Debian Trixie - Meson"
(times out), but it's not clear if this is a fluke - I'll check if this
recurs on the commitfest patch
- We're doing a lot of work in pg_ticks_to_ns, even when we're not
using RDTSC - and I think that shows in a slightly slower
pg_test_timing measurement compared to master when fast clock source
is off. Can we somehow only do that when we use RDTSC?
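One possible shape for the second question, purely as a sketch and not something the patch does: return early when ticks are already nanoseconds, and only run the fixed-point multiply when TSC is in use. All names and the 20-bit shift are hypothetical, the TSC frequency is assumed to be nonzero and in the GHz range, and whether the extra branch is actually cheaper than the unconditional multiply would need measuring with pg_test_timing.

```c
#include <stdbool.h>
#include <stdint.h>

#define NS_SHIFT 20     /* fractional bits of the fixed-point factor */

static bool     ticks_are_ns = true;    /* true while TSC is not in use */
static uint64_t ns_per_tick_fp = 0;     /* ns per tick, scaled by 2^NS_SHIFT */

/* Switch to TSC ticks; tsc_khz must be nonzero. */
void
use_tsc_ticks(uint32_t tsc_khz)
{
    /* ns per tick = 1e6 / kHz, kept as a fixed-point number */
    ns_per_tick_fp = (UINT64_C(1000000) << NS_SHIFT) / tsc_khz;
    ticks_are_ns = false;
}

/* Switch back to the system clock, where a tick is a nanosecond. */
void
use_ns_ticks(void)
{
    ticks_are_ns = true;
}

uint64_t
ticks_to_ns_sketch(uint64_t ticks)
{
    uint64_t hi;
    uint64_t lo;

    /* Fast path: nothing to convert when TSC is not in use. */
    if (ticks_are_ns)
        return ticks;

    /*
     * Multiply in two 32-bit halves so the intermediate products stay
     * within 64 bits for GHz-range frequencies.
     */
    hi = ticks >> 32;
    lo = ticks & UINT64_C(0xffffffff);
    return ((hi * ns_per_tick_fp) << (32 - NS_SHIFT)) +
        ((lo * ns_per_tick_fp) >> NS_SHIFT);
}
```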
Here is a fresh test run with this patch on an AWS c6i.xlarge, i.e.
Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz / "Ice Lake":
CREATE TABLE test (id int);
INSERT INTO test SELECT * FROM generate_series(0, 1000000);
postgres=# SET fast_clock_source = off;
SET
Time: 0.107 ms
postgres=# EXPLAIN ANALYZE SELECT COUNT(*) FROM test;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=10633.55..10633.56 rows=1 width=8) (actual time=44.117..44.811 rows=1.00 loops=1)
   Buffers: shared hit=846 read=3579
   ->  Gather  (cost=10633.34..10633.55 rows=2 width=8) (actual time=44.060..44.804 rows=3.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=846 read=3579
         ->  Partial Aggregate  (cost=9633.34..9633.35 rows=1 width=8) (actual time=42.129..42.130 rows=1.00 loops=3)
               Buffers: shared hit=846 read=3579
               ->  Parallel Seq Scan on test  (cost=0.00..8591.67 rows=416667 width=0) (actual time=0.086..21.595 rows=333333.67 loops=3)
                     Buffers: shared hit=846 read=3579
 Planning Time: 0.043 ms
 Execution Time: 44.836 ms
(12 rows)
Time: 45.076 ms
postgres=# SET fast_clock_source = rdtsc;
SET
Time: 0.123 ms
postgres=# EXPLAIN ANALYZE SELECT COUNT(*) FROM test;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=10633.55..10633.56 rows=1 width=8) (actual time=32.943..33.912 rows=1.00 loops=1)
   Buffers: shared hit=1128 read=3297
   ->  Gather  (cost=10633.34..10633.55 rows=2 width=8) (actual time=32.868..33.906 rows=3.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=1128 read=3297
         ->  Partial Aggregate  (cost=9633.34..9633.35 rows=1 width=8) (actual time=30.705..30.706 rows=1.00 loops=3)
               Buffers: shared hit=1128 read=3297
               ->  Parallel Seq Scan on test  (cost=0.00..8591.67 rows=416667 width=0) (actual time=0.080..15.223 rows=333333.67 loops=3)
                     Buffers: shared hit=1128 read=3297
 Planning Time: 0.042 ms
 Execution Time: 33.935 ms
(12 rows)
Time: 34.180 ms
postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM test;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=10633.55..10633.56 rows=1 width=8) (actual rows=1.00 loops=1)
   Buffers: shared hit=1410 read=3015
   ->  Gather  (cost=10633.34..10633.55 rows=2 width=8) (actual rows=3.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=1410 read=3015
         ->  Partial Aggregate  (cost=9633.34..9633.35 rows=1 width=8) (actual rows=1.00 loops=3)
               Buffers: shared hit=1410 read=3015
               ->  Parallel Seq Scan on test  (cost=0.00..8591.67 rows=416667 width=0) (actual rows=333333.67 loops=3)
                     Buffers: shared hit=1410 read=3015
 Planning Time: 0.042 ms
 Execution Time: 27.876 ms
(12 rows)
Time: 28.135 ms
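For reading the three runs above side by side, a small sketch of the arithmetic: timing overhead expressed as extra execution time relative to the TIMING OFF run. The helper names are made up for illustration. These are single runs and the shared buffer hit counts differ between them, so the percentages are rough.

```c
#include <stdio.h>

/* Extra execution time relative to the untimed run, as a fraction of it. */
double
timing_overhead(double timed_ms, double untimed_ms)
{
    return (timed_ms - untimed_ms) / untimed_ms;
}

/* Print the overhead for the execution times reported in this email. */
void
print_overhead_summary(void)
{
    const double off_ms = 44.836;       /* fast_clock_source = off */
    const double rdtsc_ms = 33.935;     /* fast_clock_source = rdtsc */
    const double untimed_ms = 27.876;   /* EXPLAIN (ANALYZE, TIMING OFF) */

    printf("off:   %.1f%% overhead\n",
           100.0 * timing_overhead(off_ms, untimed_ms));
    printf("rdtsc: %.1f%% overhead\n",
           100.0 * timing_overhead(rdtsc_ms, untimed_ms));
}
```

On these numbers the overhead is about 61% with the fast clock source off (16.960 ms extra) and about 22% with rdtsc (6.059 ms extra).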
Thanks,
Lukas
--
Lukas Fittl
| Attachment | Content-Type | Size |
|---|---|---|
| v4-0002-Use-time-stamp-counter-to-measure-time-on-Linux-x.patch | application/octet-stream | 20.6 KB |
| v4-0003-pg_test_timing-Also-test-fast-timing-and-report-t.patch | application/octet-stream | 8.2 KB |
| v4-0001-Check-for-HAVE__CPUIDEX-and-HAVE__GET_CPUID_COUNT.patch | application/octet-stream | 6.6 KB |