pg_test_timing tool for EXPLAIN ANALYZE overhead

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, ants(dot)aasma(at)eesti(dot)ee
Subject: pg_test_timing tool for EXPLAIN ANALYZE overhead
Date: 2012-02-22 11:53:49
Message-ID: 4F44D74D.5050605@2ndQuadrant.com
Lists: pgsql-hackers

Attached is a feature extracted from the Ants Aasma "add timing of
buffer I/O requests" submission. That included a tool to measure timing
overhead, from gettimeofday or whatever else INSTR_TIME_SET_CURRENT
happens to call. That's what I've broken out here; it's a broader topic
than just buffer timing.
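
To make concrete what a tool like this measures, here is a minimal
sketch, written in Python rather than the patch's C. It reads the clock
back to back for a fixed duration and counts the gaps between reads in
power-of-two microsecond buckets. The function name, the bucket scheme,
and the use of time.perf_counter_ns() are assumptions for illustration
only; the patch goes through INSTR_TIME_SET_CURRENT, and Python's
interpreter overhead inflates the per-read cost, so these numbers are
not comparable to what a C program would report.

```python
import time


def measure_timer_deltas(duration_s=0.05):
    """Read the clock back to back and histogram the gaps between reads.

    Returns (loops, total_ns, histogram), where histogram maps an
    exclusive upper bound in microseconds (a power of two) to the
    number of reads whose gap fell below that bound.
    """
    histogram = {}
    loops = 0
    start = prev = time.perf_counter_ns()
    end = start + int(duration_s * 1e9)
    while prev < end:
        cur = time.perf_counter_ns()
        delta_ns = cur - prev
        if delta_ns < 0:
            # A monotonic clock should never do this; report it loudly.
            raise RuntimeError("clock went backwards by %d ns" % -delta_ns)
        delta_us = delta_ns // 1000
        # Smallest power of two strictly greater than delta_us:
        # 0 -> 1, 1 -> 2, 2..3 -> 4, 4..7 -> 8, and so on.
        bucket = 1 << delta_us.bit_length()
        histogram[bucket] = histogram.get(bucket, 0) + 1
        loops += 1
        prev = cur
    return loops, prev - start, histogram


if __name__ == "__main__":
    loops, total_ns, histogram = measure_timer_deltas()
    print("Per loop time including overhead: %.2f ns" % (total_ns / loops))
    for bucket in sorted(histogram):
        pct = 100.0 * histogram[bucket] / loops
        print("< %6d usec: %9.5f%% (%d)" % (bucket, pct, histogram[bucket]))
```

On a system with a fast clock source, nearly all reads should land in
the lowest bucket; a long tail in the higher buckets suggests a slow
timer.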

I fixed some trivial bugs and cleaned up the output of the program, then
wrote a full documentation section for it. After that review, I think
this could be ready to commit...with a big picture caveat below. Code
wise, I am mainly concerned about its portability, and that's not often
a place we get good review help on. The tool is based on pg_test_fsync.
Perhaps Bruce might want to take a look at this, to see if any of the
changes he's recently made to pg_test_fsync impact what this utility
should do? He might be able to flesh out the FreeBSD examples too.

As for why this whole topic is important, I've found the results of this
new pg_test_timing track quite well with systems where EXPLAIN ANALYZE
timing overhead is large. As such, it fills in a gap in the existing
docs, where that possibility is raised but no way was given to measure
it--nor determine how to improve it. I expect timing overhead to
become a bigger concern as future features add more timing calls, with
the first example being the rest of Ants's own submission.

A look back on this now that I'm done with it does raise one large
question though. I added some examples of how to measure timing
overhead using psql. While I like the broken-down timing data that this
utility provides, I'm not sure it's worth adding a contrib module just
to get it. An extension that's packaged on something like PGXN and easy
to obtain? Absolutely--but maybe that's a developer-only thing. Maybe
the only code worth distributing is
the little SQL example of how to measure the overhead, along with some
reference good/bad numbers. That plus the intro to timer trivia could
turn this into a documentation section only, no code change. I've
dreamed of running something like this on every system in the build
farm. Even if that's a valuable exercise, it may only be worth doing
once, then reverting.
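
As a rough illustration of how a psql comparison turns into a number:
assume the same query is run once plainly with \timing on and once
under EXPLAIN ANALYZE, and the extra time is divided by the number of
rows processed. That procedure, the function name, and the sample
figures below are all hypothetical, not taken from the patch. EXPLAIN
ANALYZE reads the clock more than once per row and at every plan node,
so this gives only a coarse per-row figure, not the cost of a single
timer call.

```python
def per_row_timing_overhead_ns(plain_ms, explain_analyze_ms, rows):
    """Extra time per row, in nanoseconds, attributable to instrumentation.

    plain_ms           -- runtime of the query alone, in milliseconds
    explain_analyze_ms -- runtime of the same query under EXPLAIN ANALYZE
    rows               -- number of rows the query processed
    """
    if rows <= 0:
        raise ValueError("rows must be positive")
    return (explain_analyze_ms - plain_ms) * 1e6 / rows


if __name__ == "__main__":
    # Hypothetical figures: 10 ms plain, 60 ms instrumented, 100000 rows.
    print(per_row_timing_overhead_ns(10.0, 60.0, 100000))  # 500.0
```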

Anyway, the patch does now include several examples and a short primer
on PC clock hardware, to help guide what good results look like and why
they've been impossible to obtain in the past. That's a bit
Linux-centric, but the hardware described covers almost all systems
using Intel or AMD processors. The only difference with most other
operating systems is how aggressively they have adopted newer timer
hardware. At least this gives a way to measure all of them.

Some references used to put together the clock source tutorial:

Microsoft's intro to HPET:
http://msdn.microsoft.com/en-us/windows/hardware/gg463347
Notes on effective clock resolution:
http://elinux.org/System_Tap_Timestamp_Notes
VMware clock history and impact on VMs:
http://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf
VMware timer suggestions for various Linux versions:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachment Content-Type Size
pg_test_timing-v3.patch text/x-patch 15.8 KB
