Re: RFC: Timing Events

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Satoshi Nagayasu <snaga(at)uptime(dot)jp>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: RFC: Timing Events
Date: 2012-11-04 09:35:45
Message-ID: CAFj8pRCPJZa+QWnON4B+bKY+8SrEJb7Mtrr7g+kS9SZtWBHXQw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello

2012/11/4 Satoshi Nagayasu <snaga(at)uptime(dot)jp>:
> (2012/11/03 10:44), Josh Berkus wrote:
>>
>>
>>> I don't see all that going into core without a much bigger push than I
>>> think people will buy. What people really want for all these is a
>>> proper trending system, and that means graphs and dashboards and
>>> bling--not a history table.
>>
>>
>> Well, I'm particularly thinking for autoconfiguration. For example, to
>> set vacuum_freeze_min_age properly, you have to know the XID "burn rate"
>> of the server, which is only available via history. I really don't want
>> to be depending on a graphical monitoring utility to find these things
>> out.
>>
>>> This whole approach has the assumption that things are going to fall off
>>> sometimes. To expand on that theme for a second, right now I'm more
>>> worried about the "99%" class of problems. Neither pg_stat_statements
>>> nor this idea are very good for tracking the rare rogue problem down.
>>> They're both aimed to make things that happen a lot more statistically
>>> likely to be seen, by giving an easier UI to glare at them frequently.
>>> That's not ideal, but I suspect really fleshing the whole queue consumer
>>> -> table idea needs to happen to do much better.
>>
>>
>> I'm just concerned that for some types of incidents, it would be much
>> more than 1% *of what you want to look at* which fall off. For example,
>> consider a server which does 95% reads at a very high rate, but has 2%
>> of its writes cronically having lock waits. That's something you want
>> to solve, but it seems fairly probably that these relatively infrequent
>> queries would have fallen off the bottom of pg_stat_statements. Same
>> thing with the relative handful of queries which do large on-disk sorts.
>>
>> The problem I'm worried about is that pg_stat_statements is designed to
>> keep the most frequent queries, but sometimes the thing you really need
>> to look at is not in the list of most frequent queries.
>
>
> I think auto_explain would help you solve such rare incidents
> if it could dump several statistics into server log, including lock
> waits and block reads/writes statistic per-session, for example.
>
> Do we have something to add to auto_explain?

Now I am working on expanding slow query record and auto_explain with
some locking times (lock on objects, lock on enhancing pages, other
locks).

Just statement time produces too less information in our complex and
"unpredictable" cloud environment with thousand databases and hundreds
servers.

Regards

Pavel Stehule

>
> Regards,
> --
> Satoshi Nagayasu <snaga(at)uptime(dot)jp>
> Uptime Technologies, LLC. http://www.uptime.jp
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2012-11-04 11:49:08 proposal: fix corner use case of variadic fuctions usage
Previous Message Satoshi Nagayasu 2012-11-04 09:28:11 Re: RFC: Timing Events