Re: RFC: Timing Events

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: RFC: Timing Events
Date: 2012-11-02 00:56:35
Message-ID: 50931A43.7090805@2ndQuadrant.com
Lists: pgsql-hackers

On 11/1/12 11:54 PM, Josh Berkus wrote:
> For example, it would be really useful to be able to
> see, for example, pg_stat_user_tables from 2 days ago to estimate table
> growth and activity, or pg_stat_replication from 10 minutes ago to
> average replication lag.

I don't see all that going into core without a much bigger push than I
think people will buy. What people really want for all these is a
proper trending system, and that means graphs and dashboards and
bling--not a history table. I have almost all of my customers using
Munin or Cacti or Zabbix or something, and none using pg_statsinfo.
Shoot, static graphs are barely good enough anymore--people really want
dynamic ones driven by client-side Javascript. "Why can't I zoom in on
this Munin graph, this is lame" they tell me. I blame Google Maps for
being the first thing that made all the users so demanding in this area.

But the main weakness of these tools isn't display; it's that it has
seemed impractical to get them to collect per-table data, whether for
configuration, speed, or display reasons. I'm trying to find a good web
application toolchain to recommend that does that and dynamic graphs,
too. I would never take up the fight to try and build in that direction
in core, though. I think most people aren't yet fully consuming, in
userland, even the pg_stat_user_tables data that is already provided.
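
To make "consuming pg_stat_user_tables in userland" concrete, here is a
minimal sketch of what a trending tool would do with two snapshots of
that view taken some time apart. It assumes the snapshots have already
been fetched and stored as plain dictionaries keyed by relname; the
snapshot shape, the function name table_activity, and the choice of
counters are illustrative assumptions, not anything that exists in core
or in the tools named above.

```python
from typing import Dict

# Hypothetical snapshot shape: {relname: {column_name: value}}, where the
# column names are real pg_stat_user_tables columns.
Snapshot = Dict[str, Dict[str, int]]

# Cumulative counters from pg_stat_user_tables that only make sense as deltas.
COUNTERS = ("seq_scan", "n_tup_ins", "n_tup_upd", "n_tup_del")


def table_activity(old: Snapshot, new: Snapshot,
                   elapsed_seconds: float) -> Dict[str, Dict[str, float]]:
    """Turn two snapshots into per-table rates and live-row growth."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")

    result: Dict[str, Dict[str, float]] = {}
    for relname, now in new.items():
        before = old.get(relname)
        if before is None:
            # Table appeared between snapshots: no baseline to compare with.
            continue
        row: Dict[str, float] = {}
        for col in COUNTERS:
            delta = now[col] - before[col]
            if delta < 0:
                # Assumption: a negative delta means the statistics were
                # reset in between, so count from zero instead.
                delta = now[col]
            row[col + "_per_sec"] = delta / elapsed_seconds
        # n_live_tup is an estimate of current rows, not a cumulative counter.
        row["live_tup_growth"] = now["n_live_tup"] - before["n_live_tup"]
        result[relname] = row
    return result
```

The arithmetic is the easy part. The collection, storage, and per-table
display around it is where the existing tools have struggled.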

[I fear this topic will turn into a more appropriate one for
pgsql-advocacy in a hurry if it keeps going]

> So, the problem with joining against pg_stat_statements is that a
> special-purpose incident you're looking at (like a lock_wait) might have
> been pushed "off the bottom" of pg_stat_statements even though it is
> still visible in pg_stat_lock_waits. No?

This whole approach assumes that things are going to fall off
sometimes. To expand on that theme for a second, right now I'm more
worried about the "99%" class of problems. Neither pg_stat_statements
nor this idea is very good for tracking down the rare rogue problem.
They're both aimed at making things that happen a lot statistically
more likely to be seen, by giving an easier UI to glare at them
frequently. That's not ideal, but I suspect really fleshing out the
whole queue consumer -> table idea needs to happen to do much better.

Thanks for the quick feedback; there are a lot of ideas in there that I
need to chew on and should incorporate.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
