Re: Timing overhead and Linux clock sources

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timing overhead and Linux clock sources
Date: 2012-08-17 02:28:14
Message-ID: 20120817022814.GR30286@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


FYI, I am planning to go ahead and package this tool in /contrib for PG
9.3.

---------------------------------------------------------------------------

On Fri, Dec 9, 2011 at 08:26:12PM -0500, Greg Smith wrote:
> On 12/09/2011 06:48 PM, Ants Aasma wrote:
> >The attached test program (test_gettimeofday_monotonic) shows that one
> >test loop iteration takes 34ns with tsc and 1270ns with hpet.
>
> This test program is great, I've wanted this exact sort of
> visibility into this problem for years. I've toyed with writing
> something like this and then seeing what results it returns on all
> of the build farm animals. For now I can just run it on all the
> hardware I have access to, which is not a small list.
>
> I think we should bundle this up similarly to test_fsync, document
> some best practices on time sources, and then point the vague
> warning about EXPLAIN overhead toward that. Then new sources of
> timing overhead can point there too. Much like low-level fsync
> timing, there's nothing PostgreSQL can do about it, the best we can
> do is provide a program to help users classify a system as likely or
> unlikely to run to have high-overhead timing. I can make the needed
> docs changes, this is resolving a long standing issue impacting code
> I wanted to add.
>
> Rough guideline I'm seeing shape up is that <50ns is unlikely to
> cause clock timing to be a significant problem, >500ns certainly is,
> and values in the middle should concern people but not necessarily
> invalidate the timing data collected.
>
> I just confirmed that switching the clock source by echoing a new
> value into
> /sys/devices/system/clocksource/clocksource0/current_clocksource
> works (on the 2.6.32 kernel at least). What we want to see on a
> good server is a result that looks like this, from my Intel i7-860
> system:
>
> $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> tsc
> $ ./test_gettimeofday_monotonic 5
> Per loop: 39.30 ns
> usec: count percent
> 4: 6 0.00000%
> 2: 104 0.00008%
> 1: 4999760 3.92983%
> 0: 122226109 96.07009%
>
> Here's how badly that degrades if I use one of the alternate time sources:
>
> # echo acpi_pm >
> /sys/devices/system/clocksource/clocksource0/current_clocksource
> $ ./test_gettimeofday_monotonic 5
> Per loop: 727.65 ns
> usec: count percent
> 16: 1 0.00001%
> 8: 0 0.00000%
> 4: 1233 0.01794%
> 2: 699 0.01017%
> 1: 4992622 72.65764%
> 0: 1876879 27.31423%
>
> echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
> $ ./test_gettimeofday_monotonic 5
> Per loop: 576.96 ns
> usec: count percent
> 8: 2 0.00002%
> 4: 1273 0.01469%
> 2: 767 0.00885%
> 1: 4993028 57.61598%
> 0: 3670977 42.36046%
>
> Switching to the Intel T7200 CPU in my T60 laptop only provides the
> poor quality time sources, not TSC, and results show timing is
> really slow there:
>
> $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> hpet
> $ ./test_gettimeofday_monotonic 5
> Per loop: 1019.60 ns
> usec: count percent
> 256: 2 0.00004%
> 128: 3 0.00006%
> 64: 90 0.00184%
> 32: 23 0.00047%
> 16: 92 0.00188%
> 8: 1246 0.02541%
> 4: 34 0.00069%
> 2: 136154 2.77645%
> 1: 4700772 95.85818%
> 0: 65466 1.33498%
>
> # echo acpi_pm >
> /sys/devices/system/clocksource/clocksource0/current_clocksource
> $ ./test_gettimeofday_monotonic 5
> Per loop: 1864.66 ns
> usec: count percent
> 256: 2 0.00007%
> 128: 0 0.00000%
> 64: 3 0.00011%
> 32: 6 0.00022%
> 16: 90 0.00336%
> 8: 1741 0.06493%
> 4: 2062 0.07690%
> 2: 2260601 84.30489%
> 1: 416954 15.54952%
> 0: 0 0.00000%
>
> This seems to be measuring exactly the problem I only had a
> hand-wave "it's bad on this hardware" explanation of before.
>
> --
> Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2012-08-17 02:37:41 Slow tab completion w/ lots of tables
Previous Message Bruce Momjian 2012-08-17 01:59:46 Re: RFC: list API / memory allocations