Re: RFC: Timing Events

From: Satoshi Nagayasu <snaga(at)uptime(dot)jp>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: RFC: Timing Events
Date: 2012-11-04 09:28:11
Message-ID: 5096352B.4020108@uptime.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2012/11/03 10:44), Josh Berkus wrote:
>
>> I don't see all that going into core without a much bigger push than I
>> think people will buy. What people really want for all these is a
>> proper trending system, and that means graphs and dashboards and
>> bling--not a history table.
>
> Well, I'm particularly thinking for autoconfiguration. For example, to
> set vacuum_freeze_min_age properly, you have to know the XID "burn rate"
> of the server, which is only available via history. I really don't want
> to be depending on a graphical monitoring utility to find these things out.
>
>> This whole approach has the assumption that things are going to fall off
>> sometimes. To expand on that theme for a second, right now I'm more
>> worried about the "99%" class of problems. Neither pg_stat_statements
>> nor this idea are very good for tracking the rare rogue problem down.
>> They're both aimed to make things that happen a lot more statistically
>> likely to be seen, by giving an easier UI to glare at them frequently.
>> That's not ideal, but I suspect really fleshing the whole queue consumer
>> -> table idea needs to happen to do much better.
>
> I'm just concerned that for some types of incidents, it would be much
> more than 1% *of what you want to look at* which fall off. For example,
> consider a server which does 95% reads at a very high rate, but has 2%
> of its writes cronically having lock waits. That's something you want
> to solve, but it seems fairly probably that these relatively infrequent
> queries would have fallen off the bottom of pg_stat_statements. Same
> thing with the relative handful of queries which do large on-disk sorts.
>
> The problem I'm worried about is that pg_stat_statements is designed to
> keep the most frequent queries, but sometimes the thing you really need
> to look at is not in the list of most frequent queries.

I think auto_explain would help you solve such rare incidents
if it could dump several statistics into server log, including lock
waits and block reads/writes statistic per-session, for example.

Do we have something to add to auto_explain?

Regards,
--
Satoshi Nagayasu <snaga(at)uptime(dot)jp>
Uptime Technologies, LLC. http://www.uptime.jp

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2012-11-04 09:35:45 Re: RFC: Timing Events
Previous Message Craig Ringer 2012-11-04 08:39:16 Re: Unresolved error 0xC0000409 on Windows Server