Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>
Cc: Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Lukas Fittl <lukas(at)fittl(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date: 2023-01-02 13:28:20
Message-ID: 3eaeaa3a-b78e-ef0d-7319-5d713bbc09a0@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I re-based again on master and applied the following changes:

I removed the fallback for obtaining the TSC frequency from /proc/cpu as
suggested by Andres. Worst-case we fall back to clock_gettime().

I added code to obtain the TSC frequency via CPUID when under a
hypervisor. I had to use __cpuid() directly instead of __get_cpuid(),
because __get_cpuid() returns an error if the leaf is > 0x80000000
(probably the implementation pre-dates the hypervisor timing leafs).
Unfortunately, while testing my implementation under VMWare, I found
that RDTSC runs awfully slow there (like 30x slower). [1] indicates that
we cannot generally rely on RDTSC being actually fast on VMs. However,
the same applies to clock_gettime(). It runs as slow as RDTSC on my
VMWare setup. Hence, using RDTSC is not at disadvantage. I'm not
entirely sure if there aren't cases where e.g. clock_gettime() is
actually faster than RDTSC and it would be advantageous to use
clock_gettime(). We could add a GUC so that the user can decide which
clock source to use. Any thoughts?

I also somewhat improved the accuracy of the cycles to milli- and
microseconds conversion functions by having two more multipliers with
higher precision. For microseconds we could also keep the computation
integer-only. I'm wondering what to best do for seconds and
milliseconds. I'm currently leaning towards just keeping it as is,
because the durations measured and converted are usually long enough
that precision shouldn't be a problem.

In vacuum_lazy.c we do if ((INSTR_TIME_GET_MICROSEC(elapsed) / 1000). I
changed that to use INSTR_TIME_GET_MILLISECS() instead. Additionally, I
initialized a few variables of type instr_time which otherwise resulted
in warnings due to use of potentially uninitialized variables.

I also couldn't reproduce the reported timing discrepancy. For me the
runtime reported by \timing is just slightly higher than the time
reported by EXPLAIN ANALYZE, which is expected.

Beyond that:

What about renaming INSTR_TIME_GET_DOUBLE() to INSTR_TIME_GET_SECS() so
that it's consistent with the _MILLISEC() and _MICROSEC() variants?

The INSTR_TIME_GET_MICROSEC() returns a uint64 while the other variants
return double. This seems error prone. What about renaming the function
or also have the function return a double and cast where necessary at
the call site?

If no one objects I would also re-register this patch in the commit fest.

[1]
https://vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf
(page 11 "Virtual TSC")

--
David Geier
(ServiceNow)

Attachment Content-Type Size
0001-Change-instr_time-to-just-store-nanoseconds-v4.patch text/x-patch 2.8 KB
0002-Use-CPU-reference-cycles-via-RDTSC-v4.patch text/x-patch 11.0 KB
0003-Refactor-some-instr_time-related-code-v4.patch text/x-patch 2.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2023-01-02 14:45:27 Re: Add a test to ldapbindpasswd
Previous Message Amit Kapila 2023-01-02 12:49:39 Re: Data loss on logical replication, 12.12 to 14.5, ALTER SUBSCRIPTION