Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>
Cc: Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Lukas Fittl <lukas(at)fittl(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date: 2022-11-19 20:05:28
Message-ID: d7f94549-9179-b63c-878a-002a7cb586a6@gmail.com
Lists: pgsql-hackers

I think it would be great to get this patch committed. Beyond the
reasons already mentioned, the significant overhead also tends to skew
the reported runtimes in ways that make it difficult to compare them.
For example, if two nodes are executed equally often but one needs twice
as long to process its rows, EXPLAIN ANALYZE should report timings that
are 2x apart. Currently, however, the high overhead of clock_gettime()
skews the relative runtimes.
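
To put made-up numbers on that (the real overhead varies by machine):
with two clock readings per row at, say, 100 ns each, a node whose true
per-row cost is 100 ns is reported at roughly 300 ns, while a node
needing 200 ns per row is reported at roughly 400 ns, i.e. a 1.33x
instead of the expected 2x difference.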

On 10/12/22 10:33, Michael Paquier wrote:
> No rebased version has been sent since this update, so this patch has
> been marked as RwF.

I've rebased the patch set on latest master and fixed a few compiler
warnings. Beyond that, some findings and thoughts:

You're only using RDTSC if the clock source is 'tsc' (a sketch of how
that check might look follows after the next snippet). Great idea to not
bother with all the hairy TSC details. Looking at the kernel code, this
seems to imply that the TSC is frequency invariant. It does not, however,
imply that Linux is not running under a hypervisor, which is good to keep
in mind because I assume PostgreSQL is used a lot in VMs. When running
under a hypervisor (at least with VMware), CPUID leaf 0x16 is not
available: in my tests __get_cpuid() indicated success but the returned
values were garbage. Instead of using leaf 0x16, we should then use the
hypervisor interface to obtain the TSC frequency. Checking if a
hypervisor is active can be done via:

#include <cpuid.h>

bool
IsHypervisorActive(void)
{
    uint32 cpuinfo[4] = {0};

    /* CPUID leaf 0x1: ECX bit 31 is set when running under a hypervisor */
    int res = __get_cpuid(0x1, &cpuinfo[0], &cpuinfo[1], &cpuinfo[2],
                          &cpuinfo[3]);

    return res > 0 && (cpuinfo[2] & (1u << 31));
}
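
Regarding the clock-source check itself: on Linux the active clock
source is exposed via sysfs. I'm guessing at the mechanism here (the
patch may well do this differently), but a minimal sketch could look
like:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

bool
ClockSourceIsTsc(void)
{
    /* active clock source of the boot CPU; file content is e.g. "tsc\n" */
    const char *path =
        "/sys/devices/system/clocksource/clocksource0/current_clocksource";
    char buf[64] = {0};
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return false;
    if (fgets(buf, sizeof(buf), f) == NULL)
        buf[0] = '\0';
    fclose(f);

    return strncmp(buf, "tsc", 3) == 0;
}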

Obtaining the TSC frequency via the hypervisor interface can be done
with the following code. See https://lwn.net/Articles/301888/ for more
details.

/*
 * Under hypervisors (tested with VMware) leaf 0x16 is not available, even
 * though __get_cpuid() succeeds. Hence, if running under a hypervisor, use
 * the hypervisor interface: leaf 0x40000010 reports the TSC frequency in
 * kHz in EAX. Note that __get_cpuid() cannot be used for the hypervisor
 * range, as it validates the leaf against the maximum basic leaf; call
 * __cpuid() directly instead.
 */
uint32 cpuinfo[4] = {0};

if (IsHypervisorActive())
{
    __cpuid(0x40000010, cpuinfo[0], cpuinfo[1], cpuinfo[2], cpuinfo[3]);
    if (cpuinfo[0] > 0)
        cycles_to_sec = 1.0 / ((double) cpuinfo[0] * 1000.0);
}
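
To check what a given hypervisor actually reports, a throwaway test
program along these lines can help (my own sketch, not part of the
patch):

#include <cpuid.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    uint32_t a = 0, b = 0, c = 0, d = 0;

    /* CPUID leaf 0x1: ECX bit 31 is the hypervisor-present bit */
    if (!__get_cpuid(0x1, &a, &b, &c, &d) || !(c & (1u << 31)))
    {
        puts("no hypervisor detected");
        return 1;
    }

    /* leaf 0x40000000: EAX holds the highest supported hypervisor leaf */
    __cpuid(0x40000000, a, b, c, d);
    if (a < 0x40000010)
    {
        puts("timing leaf 0x40000010 not advertised");
        return 1;
    }

    /* leaf 0x40000010: EAX holds the TSC frequency in kHz */
    __cpuid(0x40000010, a, b, c, d);
    printf("TSC frequency: %u kHz\n", a);
    return 0;
}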

Given that we anyway switch between RDTSC and clock_gettime() with a
global variable, what about exposing the clock source as a GUC? That way
the user can switch back to a working clock source in case we miss a
detail around activating or reading the TSC.
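
To make that concrete, such a GUC could be an enum along the following
lines (all names invented here for illustration; context and grouping
would need more thought):

typedef enum
{
    TIME_SOURCE_CLOCK_GETTIME,
    TIME_SOURCE_TSC
} TimeSourceType;

static int time_source = TIME_SOURCE_CLOCK_GETTIME;

static const struct config_enum_entry time_source_options[] = {
    {"clock_gettime", TIME_SOURCE_CLOCK_GETTIME, false},
    {"tsc", TIME_SOURCE_TSC, false},
    {NULL, 0, false}
};

/* plus an entry in ConfigureNamesEnum[] in guc_tables.c: */
{
    {"time_source", PGC_POSTMASTER, STATS_MONITORING,
        gettext_noop("Selects the clock source used for instrumentation timings."),
        NULL
    },
    &time_source,
    TIME_SOURCE_CLOCK_GETTIME,
    time_source_options,
    NULL, NULL, NULL
},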

I'm happy to update the patches accordingly.

--
David Geier
(ServiceNow)
