Re: Improvement of pg_stat_statement usage about buffer hit ratio

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improvement of pg_stat_statement usage about buffer hit ratio
Date: 2013-11-20 04:30:11
Message-ID: CAM3SWZTWbHDtF124a4u_Pk1iQYTQonhZGvEZkdLb-7+vxhMxFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 19, 2013 at 12:12 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> > The same idea was proposed before but not committed because
>> > Itagaki thought that pg_stat_statements view should report only raw
>> > values.
>> > Please read the following thread. I have the same feeling with him.
>> > Anyway we should listen to more opinions from other people, though.
>> >
>> > http://www.postgresql.org/message-id/20091222172719.8B86.52131E4D@oss.ntt.co.jp
>>
>> +1 from me.
>
>
> That's +1 for *not* including this?

Right. Unfortunately, discussion of the other pg_stat_statements
patches spilled into this thread.

>> I think that a higher level tool needs to be built on
>> pg_stat_statements to make things easy for those that want a slick,
>> pre-packaged solution. As I said on the min/max thread, if we're not
>> doing enough to help people who would like to build such a tool, we
>> should discuss how we can do better.
>
>
> I'd like to be able to separate queries by the application_name and/or
> client_addr which issued them.

I see why that would be desirable (it would also be desirable to
sometimes *not* break costs down by user), but it's hard to see a way
to make that work without hurting cases where it doesn't matter (i.e.
the majority of cases).

So thinking about it some more, you could hack together a patch fairly
easily that adds a GUC that makes the current userid a union. A
setting remembers the current preference as to what that actually
stores. You remember the setting on shutdown, when serializing the
stats, in the file header. You can set the setting to hash a whole
bunch of alternative things along the lines you refer to here, such as
client_addr or application_name.

I can see why some aspects of that would be controversial, but it
could probably be made to work. Maybe the hash is used in a second
shared dynahash, where the original values are looked-up, so a new
text column is added to the view that displays things as needed that
way (when the GUC isn't set to its default, userid is NULL, else this
new column is NULL). It would probably be okay to make this spill to
disk, simply because it wouldn't come up that often, and you could be
clever about the locking. Notably, something like a client_addr is
stable within a backend, so it would be cheap to cache the hash value.

The logical/physical I/O divide is probably the main factor that makes
the track_io_timing stuff compelling.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabrízio de Royes Mello 2013-11-20 04:35:58 Re: CREATE TABLE IF NOT EXISTS AS
Previous Message Tom Lane 2013-11-20 04:22:26 Re: UNNEST with multiple args, and TABLE with multiple funcs