First off, let me again praise the great work you and Peter are doing in
> Modeling this on pg_stat_statements includes the hope of packaging it as
> an extension with minor core hooks, and the idea that there would be a
> fixed size list of timing events available at any time. Consider it a
> FIFO queue built using a circular buffer. Dump events into there and
> provide a way to view them. If some fall out before they're
> analyzed/saved, that doesn't seem a problem for now.
> If you want 100%
> durable, the log is still available. Eventually I'd like a consumer for
> these that wrote them to a history table as an option, but that seems a
> second priority after getting them into memory in the first place.
Yes, and easy for users/tools to implement once the basic data is out there.
> like that consumer thing for pg_stat_statements too, but so far we're
> getting by without it. It seems like something that might benefit from
> the in-core queuing work one day too.
Well, we could actually use such a thing in general, and not just for
the timing events. For example, it would be really useful to be able to
see, for example, pg_stat_user_tables from 2 days ago to estimate table
growth and activity, or pg_stat_replication from 10 minutes ago to
average replication lag. There was a plug-in tool for this, I think
Itagaki developed it. Anyone remember what/where it is?
So there's actually two things for a consumer: Timing Events and Counters.
> = Initial event list =
> There are three particular events I have been thinking about, hoping
> that will be enough to make sure the structure is general enough to
> handle more later:
> -log_lock_waits output. This is my #1 goal. Lock waits are incredibly
> common issues in larger systems, and the process for tracking them down
> right now is very painful. Probably deadlocks too.
> -log_temp_files output. This everyone wants to monitor, but it won't be
> as useful unless query tieing (below) works.
> -log_checkpoints. The fight to make this data available via
> pg_stat_bgwriter has been useful. You really need a trending system to
> do anything useful with the data that way though. Being able to write a
> simple timing events query that looked for checkpoint spikes would be
> great for alerting, which is often done with something other than a
> trending package. To give specific examples here, the graphs Munin
> makes for pg_stat_bgwriter data work, but they're not much of a help if
> what you really want is to trigger a Nagios alert when checkpoints are
> slow. A Timing Event entry of a slow checkpoint would be easy to
> trigger alerts from.
I agree that these are the events to start with. Let me add the
additional events I think we want to have, in the order of how useful
the information is for troubleshooting:
4. log_connections / log_disconnections
5. deadlock reporting
6. autovacuum logging
I also think that we should eventually expose a buffer of the last N
error messages, but that's a different sort of data.
> = Data to save =
> There's a few obvious fields that could be explicitly saved as regular
> -event type code. Presumably an integer that's decoded via a master
> list of these, in some header file.
> -running process id
+ running session id
> -running process details (below)
> -referenced process id. The one causing the problem in situations like
> a lock wait.
- referenced session id
> -referenced process details (below)
> -serial number. People who will want to poll & store this data might
> want a better way to do that than just a timestamp. Would let you know
> when you lost some of it, too, so you'd know to consider increasing the
> buffer size or decrease the polling interval.
By the time the user looks at this information, details on the process
from pg_stat_activity will probably no longer be available. This makes
data like process_id not that useful, and means we need to replicate all
relevant data from pg_stat_activity.
> Another way this data might be stored is to have a key/value composite
> type, and then make each timing event entry an array of those.
If only PostgreSQL had some sort of key-value composite storage type
available! I have no idea where we would get such a thing. ;-)
> I'm open to other ideas for how to store this. If the events go into
> one place, the different data each saves turns into the classic
> "flexible schema" problem.
I think it's an excellent approach, not in the least because it lets us
add new data later, and deals with the variety of fields we need to
> = Query text =
> For the often multi-line query text that's dumped by log_lock_waits,
> I've thought of two possibilities so far:
> -Mash them into a single text line using "\n" between them. Now you're
> even back to where, with the right escaping, the entire timing event
> detail part could still collapse into a single text string. That would
> kill the idea of simple parsing of key/value data stuffed there though.
> -Create an array of lines to hold the two statements. Can always make
> the view that shows the data unnest it. Seems like this would compel
> putting the two query description arrays into their own fields though.
I think you need to have a separate field for the queries. Do we have
anything we want to log which would have more than two queries?
> = Query Tieing =
> I'd like to have a way to reference the normalized version of the active
> query or queries at the time when that's appropriate too. That seems
> like it should be a key that's joinable with the normalized version in
> pg_stat_statements. One of the ideas bounced around in the "Hash id in
> pg_stat_statements" thread, like assigning a numeric code for them,
> would seem fine for that. This will turn dependent on one of those
> ideas being committed.
So, the problem with joining against pg_stat_statements is that a
special-purpose incident you're looking at (like a lock_wait) might have
been pushed "off the bottom" of pg_stat_statements even though it is
still visible in pg_stat_lock_waits. No?
> = Limitations of scope =
> I was thinking the act of loading the extension for this would turn the
> logging on, and the existing GUCs like log_lock_waits will also apply.
> It seems useful and easy to have a load time option for the module that
> says to disable all the replaced log file writes if you're using the
> timing event mechanism. It would be nice to have a granular way to ask
> for some timing events: "I want to see lock waits but not temp files".
> That seems like scope creep I don't think is necessary to make this
That depends on whether we think any of these types of events,
individually, will add substantial overhead.
> All these initial targets are events that do not occur that often, and
> I'm not too concerned about the performance of this implementation. Lock
> waits come out once per second per connection. temp files shouldn't be
> that common. And checkpoints are infrequent. For now, I hope that only
> targeting infrequent events means the overhead of dumping the timing
> event into memory will be little overhead.
log_connections/disconnections might not be low-frequency, though. So
we should plan for that.
> Alternately, think about the overhead of these initial targets like
> this: right now, all of these things are being written to the log file.
> If you can afford to do that, you can certainly afford to write a copy
> to memory instead, and probably in addition too.
To say nothing of the overhead of parsing/processing them later ...
PostgreSQL Experts Inc.
In response to
pgsql-hackers by date
|Next:||From: Michael Paquier||Date: 2012-11-02 00:04:45|
|Subject: Re: RFC: Timing Events|
|Previous:||From: Greg Smith||Date: 2012-11-01 23:08:39|
|Subject: RFC: Timing Events|