Re: Buffer usage in EXPLAIN and pg_stat_statements (review)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Buffer usage in EXPLAIN and pg_stat_statements (review)
Date: 2009-10-04 22:15:54
Message-ID: 603c8f070910041515x183ef00cmf20b84d54d69ec79@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 30, 2009 at 10:40 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
>
> Euler Taveira de Oliveira <euler(at)timbira(dot)com> wrote:
>
>> > But there are some confusions in postgres; ReadBufferCount and
>> > BufferHitCount are used for "get" and "hit", but "heap_blks_read"
>> > and "heap_blks_hit" are used for "read" and "hit" in pg_statio_all_tables.
>> I see. :(
>
> I fixed the confusions of get, hit and read in your patch.
>    long        num_hit = ReadBufferCount + ReadLocalBufferCount;
>    long        num_read = num_hit - BufferHitCount - LocalBufferHitCount;
> should be
>    long        num_get = ReadBufferCount + ReadLocalBufferCount;
>    long        num_hit = BufferHitCount + LocalBufferHitCount;
>    long        num_read = num_get - num_hit;
>
> ReadBufferCount means "number of buffer access" :(
>
> Patch attached.

I took a look at this today and I have a couple of comments. The
basic functionality looks useful, but I think the terminology is too
terse. Specific commens:

1. In the EXPLAIN output, I think that the buffers information should
be output on its own line, rather than appended to the line that
already contains costs and execution times. The current output
doesn't include the word "buffers" or "blocks" anywhere, which seems
to me to be a critical flaw. I would suggest something like "Blocks
Read: %ld Hit: %ld Temp Read: %ld\n". See the way we handle output
of sort type and space usage, for example.

2. Similarly, in pg_stat_statements, the Counters structure could
easily use the same names for the structure members that we already
use in e.g. pg_stat_database - blks_hit, blks_read, and, say,
blks_temp_read. In fact I tend to think we should stick with "blocks"
rather than "buffers" overall, for consistency with what the system
does elsewhere.

3. With respect to the doc changes in explain.sgml, we consistently
use "disk" rather than "disc" in the documentation; but it may not be
necessary to use that word at all, and I think the paragraph can be
tightened up a bit: "Include information on the number of blocks read,
the number of those that are hits (already in shared buffers and do
not need to be read in), and the number of those that are reads on
temporary, backend-local buffers. This parameter requires that the
<literal>ANALYZE</literal> parameter also be used. This parameter
defaults to <literal>FALSE</literal>".

4. "Instrumentation stack is broken" doesn't seem terribly helpful in
understanding what has gone wrong.

...Robert

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2009-10-04 22:42:33 Re: Rules: A Modest Proposal
Previous Message Selena Deckelmann 2009-10-04 21:32:15 Re: COPY enhancements