Timing overhead and Linux clock sources

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Timing overhead and Linux clock sources
Date: 2011-12-07 02:58:06
Message-ID: 4EDED63E.2040002@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Over in the "add timing of buffer I/O requests" thread I mentioned
having a system where EXPLAIN ANALYZE of a modest COUNT(*) takes 10X as
long as just executing the statement. Attached is a bit of SQL and a
script that runs it multiple times that demonstrate the problem on
systems that have it. It assumes you've installed the Dell Store 2
database. I'd expect it to work with simpler example too (just create
something with 20,000 rows), but I haven't actually confirmed that yet.

From what I've been able to measure so far, here's the state of things:

-If you have a system with a working TSC clock source (timing data is
pulled right from the CPU), timing overhead is reasonable enough that
you might turn it on even for things that happen frequently, such as the
buffer I/O timing patch enables.

-Some systems have tsc, but it doesn't work right in multi-core setups.
Newer kernel versions know this and disable it, older ones let you pick
tsc anyway and bad things occur.

-Using any of the other timing mechanism--hpt, acpi_pm, and jiffies--has
extremely high timing overhead.

This doesn't conflict with the comment we got from Martijn van
Oosterhout in the other thread: "Learned the hard way while tracking
clock-skew on a multicore system. The hpet may not be the fastest (that
would be the cpu timer), but it's the fastest (IME) that gives
guaranteed monotonic time." There's more background about TSC, HPET,
and an interesting caveat about VMs I came across:

http://en.wikipedia.org/wiki/Time_Stamp_Counter
http://en.wikipedia.org/wiki/High_Precision_Event_Timer
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007020

I'm starting to envision what a new documentation section discussing
timer overhead would look like now, for Linux at least. And I'm
guessing other people can simulate this problem even on hardware that
doesn't have it in the default config, simply by manually switching to
one of the slower timing methods at boot.

The test case is just doing this:

\timing
select count(*) from customers;
explain analyze select count(*) from customers;

Lots of times, with some patterns to try and avoid caching effects.

The system with the worst clock timing issues I've found, the one that
was so obvious it kicked off this investigation, is my Thinkpad T60
laptop. The overhead is worst on the old 32-bit Ubuntu 9.04
installation I have there. Here's some info about the system, starting
with the 4 clock sources it supports:

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hpet acpi_pm jiffies tsc
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
$ uname -a
Linux meddle 2.6.28-19-generic #61-Ubuntu SMP Wed May 26 23:35:15 UTC
2010 i686 GNU/Linux
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=9.04
DISTRIB_CODENAME=jaunty
DISTRIB_DESCRIPTION="Ubuntu 9.04"
$ psql -c "select version()"
PostgreSQL 8.3.11 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real
(Ubuntu 4.3.3-5ubuntu4) 4.3.3

You change the time source by adding a stanza like this to the kernel
boot options:

clocksource=acpi_pm

Martijn said you can just cat a new clock source into the setting
instead, I haven't evaluated that yet. Wanted to get my baselines done
in a situation where the kernel boots using the one I wanted, which
turns out to be useful for a number of reasons. Boot time is one of the
things that varied a lot when the timer was switched.

The hpet time source is the default on my T60 system, but there are 3
others available. Of those, it appears that only acpi_pm is a viable
alternative. I gave up on the system booting after 15 minutes when
using jiffies. I could boot with tsc, but everything was erratic. I
could barely use the mouse for example. tsc did have a much faster
clock as measured by EXPLAIN overhead though. Here are the results, two
runs of the test program that I've de-interleaved so it's a block of
fast (count only) runs then a block of slow (with EXPLAIN ANALYZE):

Time Source:hpet

Time: 14.783 ms
Time: 9.021 ms
Time: 9.170 ms
Time: 8.634 ms
Time: 5.632 ms
Time: 5.520 ms
Time: 15.276 ms
Time: 9.477 ms
Time: 9.485 ms

Time: 70.720 ms
Time: 67.577 ms
Time: 57.959 ms
Time: 74.661 ms
Time: 73.557 ms
Time: 74.963 ms
Time: 87.898 ms
Time: 82.125 ms
Time: 81.443 ms

Time Source:acpi_pm

Time: 8.124 ms
Time: 5.020 ms
Time: 4.887 ms
Time: 8.125 ms
Time: 5.029 ms
Time: 4.881 ms

Time: 82.986 ms
Time: 82.366 ms
Time: 82.609 ms
Time: 83.089 ms
Time: 82.438 ms
Time: 82.539 ms

Time Source:tsc
Time: 8.371 ms
Time: 4.673 ms
Time: 4.901 ms
Time: 8.409 ms
Time: 4.943 ms
Time: 4.722 ms

Time: 16.436 ms
Time: 16.349 ms
Time: 16.139 ms
Time: 19.871 ms
Time: 17.175 ms
Time: 16.540 ms

There's the 10:1 possible slowdown I was talking about when using the
default hpet clock, while the faster (but fundamentally unstable) tsc
clock does better, only around a 4:1 slowdown there.

First question I considered: is it the operating system or kernel
causing this, rather than the hardware? I have Debian Squeeze running a
64-bit 2.6.32 kernel installed on here too. It only has two time
sources available, with hpet being the default:

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hpet acpi_pm
$ uname -a
Linux meddle 2.6.32-5-amd64 #1 SMP Thu Nov 3 03:41:26 UTC 2011 x86_64
GNU/Linux
$ psql -c "select version()"
version
------------------------------------------------------------------------------------------------------
PostgreSQL 8.4.9 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real
(Debian 4.4.5-8) 4.4.5, 64-bit

It's not coincidence that these are the only two clock sources that
actually worked in the older kernel. jiffies are old stuff that's been
on the deprecation path for a while, and it's eliminating TSC with this
message during boot:

[ 0.168139] checking TSC synchronization [CPU#0 -> CPU#1]:
[ 0.172001] Measured 617820 cycles TSC warp between CPUs, turning off
TSC clock.
[ 0.172001] Marking TSC unstable due to check_tsc_sync_source failed

So that there can be problems with TSC on some hardware is well known
enough that the newer kernel tests, then rules it out as unusable when
it fails that test.

Here's the same timing results on this kernel:

Time Source:hpet
Time: 5.701 ms
Time: 3.581 ms
Time: 3.560 ms
Time: 5.696 ms
Time: 3.587 ms
Time: 3.342 ms

Time: 43.492 ms
Time: 43.227 ms
Time: 43.319 ms
Time: 43.503 ms
Time: 43.332 ms
Time: 43.244 ms

Time Source:acpi_pm
Time: 6.350 ms
Time: 3.792 ms
Time: 3.490 ms
Time: 5.806 ms
Time: 3.672 ms
Time: 3.538 ms

Time: 80.093 ms
Time: 79.875 ms
Time: 79.789 ms
Time: 80.033 ms
Time: 80.330 ms
Time: 80.062 ms

That's much better in every way on hpet. But the time ratio hasn't
really improved; it's still at least 10:1. Unless there's some blacker
magic in an even newer kernel, it seems nothing I can do to this laptop
will make it run gettimeofday() quickly.

Let's move onto something that's actually a usable server. My main
desktop has an Intel i7-860. It has the same 64-bit Debian Squeeze
setup as the second set of T60 results I just showed. The only thing
different about the two systems is the hardware, software is as close to
identical as I could make it. (That's for my own general sanity, not
something special I did for this test--all my systems have that same
Debian package set)

But there are different clock options available:

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm
$ uname -a
Linux grace 2.6.32-5-amd64 #1 SMP Thu Nov 3 03:41:26 UTC 2011 x86_64
GNU/Linux
$ psql -c "select version()"
PostgreSQL 8.4.9 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real
(Debian 4.4.5-8) 4.4.5, 64-bit

With tsc being the default clock source here. A passing test of it
looks like this:

Here's some related dmesg output:

[ 0.000000] Fast TSC calibration using PIT
...
0.244640] checking TSC synchronization [CPU#0 -> CPU#1]: passed.
...
[ 0.891992] checking TSC synchronization [CPU#0 -> CPU#7]: passed.
[ 0.911993] Brought up 8 CPUs

And here's how it performs:

Time Source:tsc
Time: 2.915 ms
Time: 1.531 ms
Time: 1.500 ms
Time: 2.926 ms
Time: 1.518 ms
Time: 1.478 ms

Time: 2.945 ms
Time: 2.915 ms
Time: 2.882 ms
Time: 2.920 ms
Time: 2.862 ms
Time: 2.856 ms

That's more like it.

Next up I plan to survey a few AMD systems and a newer laptop, but I'm
out of time for this tonight.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

Attachment Content-Type Size
timing-compare text/plain 161 bytes
explain-overhead.sql text/x-sql 294 bytes

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-12-07 03:03:31 Re: Large number of open(2) calls with bulk INSERT into empty table
Previous Message Tom Lane 2011-12-07 01:46:27 Re: Inlining comparators as a performance optimisation