Re: EXPLAIN ANALYZE printing logical and hardware I/O per-node

From: Neil Conway <neilc(at)samurai(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: EXPLAIN ANALYZE printing logical and hardware I/O per-node
Date: 2007-12-15 05:10:35
Message-ID: 1197695435.4932.7.camel@goldbach
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2007-12-14 at 15:47 +0000, Gregory Stark wrote:
> I've wanted for a long time to have EXPLAIN ANALYZE output per-node I/O usage.
> This would be especially useful if we could distinguish hardware versus
> logical I/O though. And I always thought that would be very hard.
>
> My thought in the past was that would could do it on Solaris by having
> Postgres use DTrace directly via its (undocumented but existing) programmatic
> interface.
>
> For other operating systems it was tempting to suggest just timing the read(2)
> call to see if it took too long to be a logical operation. The problem there
> is that gettimeofday would impose far too much overhead to make that practical
> (or even be precise enough to work properly).
>
> But it occurred to me just now that the hardware instruction counter available
> on just about every platform would be good enough for a heuristic guess at
> whether the read(2) was cached.

I'm skeptical that this would be reliable enough to be very useful,
especially in the face of concurrent, unpredictable system activity on a
busy system. I agree that it would be useful information, though.
Perhaps a useful first step would be to teach EXPLAIN ANALYZE to report
the number of logical and physical I/Os from Postgres' perspective (i.e.
physical I/O just means we need to go to the kernel).

> The problem generally with using the hardware
> instruction counter is that it's not necessarily in sync between processors
> and might therefore run backwards or skip time forwards. This is a problem for
> profiling but if all we care about is a boolean guess at whether the request
> was satisfied quickly from cache then any such skipping forward or backward
> would represent a context switch which we could just toss in the hardware i/o
> bucket. It doesn't matter exactly how long the hardware i/o took, only that
> there was one.
>
> To that end I would love to see something like:
>
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------
> Bitmap Heap Scan on h (cost=8.52..16.45 rows=2 width=512) (actual time=78.926..87.708 rows=2 loops=1 logical-I/O=2 physical-I/O=1)
> Recheck Cond: (i = ANY ('{100,1000}'::integer[]))
> -> Bitmap Index Scan on hi (cost=0.00..8.52 rows=2 width=0) (actual time=74.539..74.539 rows=2 loops=1 logical-I/O=2 physical-I/O=2))
> Index Cond: (i = ANY ('{100,1000}'::integer[]))
> Total runtime: 87.820 ms
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-12-15 06:12:58 Re: VLDB Features
Previous Message Josh Berkus 2007-12-15 04:34:57 Re: VLDB Features