
Re: Timing overhead and Linux clock sources

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timing overhead and Linux clock sources
Date: 2011-12-10 01:26:12
Message-ID: 4EE2B534.40502@2ndQuadrant.com
Lists: pgsql-hackers
On 12/09/2011 06:48 PM, Ants Aasma wrote:
> The attached test program (test_gettimeofday_monotonic) shows that one
> test loop iteration takes 34ns with tsc and 1270ns with hpet.
>    

This test program is great, I've wanted this exact sort of visibility 
into this problem for years.  I've toyed with writing something like 
this and then seeing what results it returns on all of the build farm 
animals.  For now I can just run it on all the hardware I have access 
to, which is not a small list.

I think we should bundle this up similarly to test_fsync, document some 
best practices on time sources, and then point the vague warning about 
EXPLAIN overhead toward that.  Then new sources of timing overhead can 
point there too.  Much like low-level fsync timing, there's nothing 
PostgreSQL can do about it; the best we can do is provide a program to 
help users classify a system as likely or unlikely to have 
high-overhead timing.  I can make the needed docs changes, since this 
resolves a long-standing issue impacting code I wanted to add.

The rough guideline I see shaping up is that <50ns of overhead is 
unlikely to make clock timing a significant problem, >500ns certainly 
is, and values in the middle should concern people but not necessarily 
invalidate the timing data collected.
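
Before switching sources, it's also worth checking which ones a given kernel actually offers; the same sysfs directory lists them.  A small sketch, guarded since the path only exists on Linux:

```shell
# Show the clock sources this kernel can use, falling back gracefully
# on systems without the sysfs clocksource interface.
cs=/sys/devices/system/clocksource/clocksource0
if [ -r "$cs/available_clocksource" ]; then
    echo "available: $(cat "$cs/available_clocksource")"
    echo "current:   $(cat "$cs/current_clocksource")"
else
    echo "no clocksource sysfs interface here"
fi
```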

I just confirmed that switching the clock source by echoing a new value 
into /sys/devices/system/clocksource/clocksource0/current_clocksource 
works (on the 2.6.32 kernel at least).  What we want to see on a good 
server is a result that looks like this, from my Intel i7-860 system:

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
$ ./test_gettimeofday_monotonic 5
Per loop: 39.30 ns
      usec:      count   percent
         4:          6  0.00000%
         2:        104  0.00008%
         1:    4999760  3.92983%
         0:  122226109 96.07009%

Here's how badly that degrades if I use one of the alternate time sources:

# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
$ ./test_gettimeofday_monotonic 5
Per loop: 727.65 ns
      usec:      count   percent
        16:          1  0.00001%
         8:          0  0.00000%
         4:       1233  0.01794%
         2:        699  0.01017%
         1:    4992622 72.65764%
         0:    1876879 27.31423%

# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
$ ./test_gettimeofday_monotonic 5
Per loop: 576.96 ns
      usec:      count   percent
         8:          2  0.00002%
         4:       1273  0.01469%
         2:        767  0.00885%
         1:    4993028 57.61598%
         0:    3670977 42.36046%

My T60 laptop's Intel T7200 CPU only provides the poor-quality time 
sources, not TSC, and the results show timing is really slow there:

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
$ ./test_gettimeofday_monotonic 5
Per loop: 1019.60 ns
      usec:      count   percent
       256:          2  0.00004%
       128:          3  0.00006%
        64:         90  0.00184%
        32:         23  0.00047%
        16:         92  0.00188%
         8:       1246  0.02541%
         4:         34  0.00069%
         2:     136154  2.77645%
         1:    4700772 95.85818%
         0:      65466  1.33498%

# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
$ ./test_gettimeofday_monotonic 5
Per loop: 1864.66 ns
      usec:      count   percent
       256:          2  0.00007%
       128:          0  0.00000%
        64:          3  0.00011%
        32:          6  0.00022%
        16:         90  0.00336%
         8:       1741  0.06493%
         4:       2062  0.07690%
         2:    2260601 84.30489%
         1:     416954 15.54952%
         0:          0  0.00000%

This seems to measure exactly the problem I previously only had a 
hand-wave "it's bad on this hardware" explanation for.

-- 
Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

