Re: Patch: add timing of buffer I/O requests

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org, ants(dot)aasma(at)eesti(dot)ee
Subject: Re: Patch: add timing of buffer I/O requests
Date: 2012-02-22 14:43:40
Lists: pgsql-hackers
Attached are updated versions of this feature without the pg_test_timing
tool part, since I broke that out into another discussion thread.  I've
also split the part that updates pg_stat_statements out from the main
feature; that separate patch is attached here too (but I'm not reviewing
it yet).  There's been a lot of bitrot since this was submitted, and
yes, I noticed that by splitting off the parts that were developed
later, I've almost recreated earlier versions of this patch.

Earlier discussion of this got sidetracked on a few things, partly my
fault.  It's worth taking a look at what this provides before judging it
too harshly.  It can demo well.

The stated purpose is helping figure out which relations are gobbling up
the most access time, presumably to optimize them and/or the storage
they are on.  "What do I put onto SSD?" is surely a popular question
nowadays.  To check suitability for that, I ran the standard pgbench
test and looked at where it showed the most time being consumed.  Any
answer other than "pgbench_accounts, and to a lesser extent its index"
is a failing grade.  I started with a clean database and OS cache so I'd
get real read timings:

$ psql -d pgbench -x -c "select relname,
heap_blks_read, heap_blks_hit, heap_blks_time,
idx_blks_read, idx_blks_hit, idx_blks_time
from pg_statio_user_tables where idx_blks_read > 0
order by heap_blks_time desc"

relname        | pgbench_accounts
heap_blks_read | 7879
heap_blks_hit  | 43837
heap_blks_time | 151770
idx_blks_read  | 7503
idx_blks_hit   | 60484
idx_blks_time  | 70968

relname        | pgbench_tellers
heap_blks_read | 19
heap_blks_hit  | 15856
heap_blks_time | 105
idx_blks_read  | 11
idx_blks_hit   | 15745
idx_blks_time  | 62

relname        | pgbench_branches
heap_blks_read | 11
heap_blks_hit  | 32333
heap_blks_time | 77
idx_blks_read  | 2
idx_blks_hit   | 0
idx_blks_time  | 9

Now, the first critical question to ask is "what additional information
does this provide beyond the existing counters?"  After all, it's
possible to tell pgbench_accounts is the hotspot just from comparing
heap_blks_read, right?  To really be useful, this would need to make it
obvious that reads from pgbench_accounts are slower than on the other
two tables, because it's bigger and requires more seeking around to
populate.  That should show up if we compute time-per-read numbers:

$ psql -d pgbench -x -c "select relname,
1.0 * heap_blks_time / heap_blks_read as time_per_read,
1.0 * idx_blks_time / idx_blks_read as time_per_idx_read
from pg_statio_user_tables where idx_blks_read > 0
order by heap_blks_time desc"

relname           | pgbench_accounts
time_per_read     | 19.2625967762406397
time_per_idx_read | 9.4586165533786485

relname           | pgbench_tellers
time_per_read     | 5.5263157894736842
time_per_idx_read | 5.6363636363636364

relname           | pgbench_branches
time_per_read     | 7.0000000000000000
time_per_idx_read | 4.5000000000000000
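As a sanity check on those numbers (my own arithmetic here, not part of
the patch), the per-read figures follow directly from the raw counters
in the first query:

```python
# Raw (heap_blks_time, heap_blks_read, idx_blks_time, idx_blks_read)
# counters from the pg_statio_user_tables output above:
counters = {
    "pgbench_accounts": (151770, 7879, 70968, 7503),
    "pgbench_tellers":  (105, 19, 62, 11),
    "pgbench_branches": (77, 11, 9, 2),
}

for rel, (heap_t, heap_r, idx_t, idx_r) in counters.items():
    # Same computation as the SQL: 1.0 * blks_time / blks_read
    print(f"{rel}: time_per_read={heap_t / heap_r:.4f}, "
          f"time_per_idx_read={idx_t / idx_r:.4f}")
```

which reproduces the roughly 19 ms-per-read gap for pgbench_accounts
versus the 5-7 range on the two small tables.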

This run looks useful at providing the data wished for: that read times
are slower, per read, from the accounts table.  The first time I tried
this I got a bizarrely high number for pgbench_branches.heap_blks_time;
I'm not sure how reliable this is yet.  One problem that might be easy
to fix is that the write timing info doesn't show up in any of these
system views, only in EXPLAIN and the statement-level ones.

I still think a full wait timing interface is the right long-term
direction here.  It's hard to reject this idea when it seems to be
working right now, though, while more comprehensive wait storage is
still at least a release off.  Opinions welcome; I'm still juggling this
around now that I have it working again.

Some implementation notes.  This currently fails regression test
create_function_3; I haven't looked into why yet.  I've confirmed that
on a system where timing is cheap, this feature is cheap too.  On
something touching real data, not just a synthetic test moving memory
around, Ants couldn't measure any overhead on a server similar to mine;
I can't either.  Yes, this is going to gobble up more room for
statistics.

The track_iotiming GUC seems to work as expected.  It's off by default,
you can turn it on in a session and see that session's work get timed,
and it toggles on a config reload.  That's everything needed to turn it
on only selectively; the main penalty you pay all the time is the stats
bloat.
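For reference, the selective usage described above looks roughly like
this (a sketch against a server built with this patch; track_iotiming
is the GUC name used here, and it needs a live patched server to run):

```sql
-- Off by default; enable just for this session:
SET track_iotiming = on;

-- ... run the workload to be measured ...

-- Then read the accumulated timings back out:
SELECT relname, heap_blks_time, idx_blks_time
  FROM pg_statio_user_tables
 ORDER BY heap_blks_time DESC;
```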

Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support

Attachment: io-stats.v3.patch
Description: text/x-patch (47.4 KB)
Attachment: io-stats-statment.v3.patch
Description: text/x-patch (9.4 KB)

