Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

From: Lukas Fittl <lukas(at)fittl(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date: 2022-07-01 08:23:01
Message-ID: CAP53Pky+4MJfdC-R3HQFPRauy2N+5_7ExpEhCFUo8EU3bCPjEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 12, 2020 at 4:28 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> The attached patches are really just a prototype. I'm also not really
> planning to work on getting this into a "production ready" patchset
> anytime soon. I developed it primarily because I found it the overhead
> made it too hard to nail down in which part of a query tree performance
> changed. If somebody else wants to continue from here...
>
> I do think it's be a pretty significant improvement if we could reduce
> the timing overhead of EXPLAIN ANALYZE by this much. Even if requires a
> bunch of low-level code.
>

Based on an off-list conversation with Andres, I decided to dust off this
old
patch for using rdtsc directly. The significant EXPLAIN ANALYZE performance
improvements (especially when using rdtsc instead of rdtsc*p*) seem to
warrant
giving this a more thorough look.

See attached an updated patch (adding it to the July commitfest), with a few
changes:

- Keep using clock_gettime() as a fallback if we decide to not use rdtsc
- Fallback to /proc/cpuinfo for clock frequency, if cpuid(0x16) doesn't work
- The decision to use rdtsc (or not) is made at runtime, in the new
INSTR_TIME_INITIALIZE() -- we can't make this decision at compile time
because this is dependent on the specific CPU in use, amongst other things
- In an abundance of caution, for now I've decided to only enable this if we
are on Linux/x86, and the current kernel clocksource is TSC (the kernel
has
quite sophisticated logic around making this decision, see [1])

Note that if we implemented the decision logic ourselves (instead of relying
on the Linux kernel), I'd be most worried about older virtualization
technology. In my understanding getting this right is notably more
complicated
than just checking cpuid, see [2].

Known WIP problems with this patch version:

* There appears to be a timing discrepancy I haven't yet worked out, where
the \timing data reported by psql doesn't match what EXPLAIN ANALYZE is
reporting. With Andres' earlier test case, I'm seeing a consistent ~700ms
higher for \timing than for the EXPLAIN ANALYZE time reported on the
server
side, only when rdtsc measurement is used -- its likely there is a problem
somewhere with how we perform the cycles to time conversion
* Possibly related, the floating point handling for the cycles_to_sec
variable
is problematic in terms of precision (see FIXME, taken over from Andres'
POC)

Open questions from me:

1) Do we need to account for different TSC offsets on different CPUs in SMP
systems? (the Linux kernel certainly has logic to that extent, but [3]
suggests this is no longer a problem on Nehalem and newer chips, i.e.
those
having an invariant TSC)

2) Should we have a setting "--with-tsc" for configure? (instead of always
enabling it when on Linux/x86 with a TSC clocksource)

3) Are there cases where we actually want to use rdtsc*p*? (i.e. wait for
current instructions to finish -- the prior discussion seemed to suggest
we don't want it for node instruction measurements, but possibly we do
want
this in other cases?)

4) Should we support using the "mrs" instruction on ARM? (which is similar
to
rdtsc, see [4])

Thanks,
Lukas

[1] https://github.com/torvalds/linux/blob/master/arch/x86/kernel/tsc.c
[2] http://oliveryang.net/2015/09/pitfalls-of-TSC-usage/
[3] https://stackoverflow.com/a/11060619/1652607
[4] https://cpufun.substack.com/p/fun-with-timers-and-cpuid

--
Lukas Fittl

Attachment Content-Type Size
v2-0002-WIP-Use-cpu-reference-cycles-via-rdtsc-to-measure.patch application/octet-stream 11.1 KB
v2-0001-WIP-Change-instr_time-to-just-store-nanoseconds-t.patch application/octet-stream 2.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2022-07-01 09:14:59 Re: Hardening PostgreSQL via (optional) ban on local file system access
Previous Message Jille Timmermans 2022-07-01 08:21:26 Re: Support for grabbing multiple consecutive values with nextval()