Re: Obtaining resource usage statistics from execution? (v 9.1)

From: Karl Denninger <karl(at)denninger(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Obtaining resource usage statistics from execution? (v 9.1)
Date: 2012-03-16 15:38:32
Message-ID: 4F635E78.8080001@denninger.net
Lists: pgsql-performance

On 3/16/2012 9:53 AM, Tom Lane wrote:
> Karl Denninger <karl(at)denninger(dot)net> writes:
>> I am trying to continue profiling which in turn feeds query and index
>> tuning changes for the AKCS-WWW forum software, and appear to have no
>> good way to do what I need to do -- or I've missed something obvious.
> Do you really want the application involved in this? Usually, when
> people try to profile a workload, they want everything to just get
> logged on the server side. There are some contrib modules to help
> with that approach --- see auto_explain and pg_stat_statements.
>
> regards, tom lane
Well, I do in some cases, yes. Specifically, I'd like to flag instances
"on the fly" when the total execution time of the calls exceeds some
amount. I can (of course) track the time in the application, but what
I'd like is to track, at the millisecond level, the time a particular
template spends in database calls.

pg_stat_statements is generalized across the server, which is useful for
profiling on an aggregate basis but not so much for distinguishing
individual uses. auto_explain doesn't tell me which upstream call
generated the logged plan; I have to go back and figure that out by
hand, and the output lands in the server logs, which isn't really where
I want it.
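
For completeness, the server-side setup Tom suggests looks roughly like
this (a minimal sketch; the thresholds are placeholders, not
recommendations):

    # postgresql.conf -- load both contrib modules at server start
    shared_preload_libraries = 'auto_explain,pg_stat_statements'

    # auto_explain: log the plan of any statement running longer than 100 ms
    auto_explain.log_min_duration = '100ms'
    auto_explain.log_analyze = on

    # pg_stat_statements: aggregate stats per normalized query text
    pg_stat_statements.max = 1000
    pg_stat_statements.track = all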

One of the reasons to do this in the application is that the queries in
question are all generated by a template file that is basically HTML
with carefully-crafted comment-like markers that the parser acts on;
for instance:

<!--&number-->   (push the previously-obtained table value for "number" in the current row onto the selector)
<!--*post-->     (look up the post using the selector that was just pushed)
<!--@message-->  (replace the comment text with the field "message" from the returned lookup)
<!--^-->         (loop back / recursion level closure)
<!--$-->         (end of loop / recursion level)

That's a simple example; in the real template each of those messages
also references other tables, such as the user's statistics and avatar,
and recursion levels can be tagged in closing statements. I'd like the
code that executes the lookup (when "post" is parsed) to grab the time
required to run that SQL statement and stash it, possibly in another
table, along with the tag ("post") so I know which query class
generated it, and then at the end of the template pick up the aggregate
for the entire template's database processing. If I can capture the
time for each query the template generates, adding it all up for the
template as a whole is trivial.
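
Roughly what I have in mind, as a minimal sketch: this assumes the
parser is C with libpq, and the names here (timed_query, the profile
table) are made up for illustration, not the real code:

    /* Run one template-generated statement, timing it and stashing the
     * elapsed milliseconds against the template tag that produced it. */
    #include <libpq-fe.h>
    #include <stdio.h>
    #include <sys/time.h>

    static PGresult *timed_query(PGconn *conn, const char *tag,
                                 const char *sql, double *template_ms)
    {
        struct timeval start, stop;
        PGresult *res;
        double elapsed;
        char stash[512];

        gettimeofday(&start, NULL);
        res = PQexec(conn, sql);
        gettimeofday(&stop, NULL);

        elapsed = (stop.tv_sec - start.tv_sec) * 1000.0 +
                  (stop.tv_usec - start.tv_usec) / 1000.0;
        *template_ms += elapsed;   /* running total for the whole template */

        /* Stash the per-tag timing; "profile" is a hypothetical table. */
        snprintf(stash, sizeof(stash),
                 "INSERT INTO profile (tag, ms) VALUES ('%s', %.3f)",
                 tag, elapsed);
        PQclear(PQexec(conn, stash));
        return res;
    }

When the parser hits <!--*post--> it would call timed_query(conn,
"post", sql, &template_ms); at end-of-template, template_ms holds the
aggregate for the whole page.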

This would allow me to profile total execution time for each template
in the system, both at the template level and at the level of the
individual function calls the template makes. If I then stash that data
somewhere (e.g. in a separate table) I can look for the worst-performing
queries and templates and focus my attention on the ones that run the
most, improving my "bang for the buck" in optimization time.

Most of these queries are very standard, but not all. One of the
deviant examples is the "search" function exposed to the user, which
can craft a rather complex query from whatever the user fills in on the
submitted form, targeting both simple field lookups and GiST full-text
indices.
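
For instance, the generated search SQL can end up combining both kinds
of predicate; something along these lines (table and column names are
illustrative, not the real schema):

    -- Hypothetical example of what the search form can generate
    SELECT subject, message
      FROM post
     WHERE forum = 'tickerforum'                        -- simple field lookup
       AND to_tsvector('english', message)
           @@ to_tsquery('english', 'index & tuning');  -- full-text match (GiST)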

Doing this the hard way has already returned huge dividends: I found a
very frequently executed query that was previously running in the
~2,000 ms range and responded extraordinarily well to a compound index
addition on one of the target tables. That dropped its execution time
to double-digit milliseconds (!!) in most cases, and the user-experience
improvement from that change was massive. I don't expect to find many
more wins of that caliber, but I'll take as many as I can get :-)
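
For what it's worth, that fix was nothing more exotic than a compound
index, conceptually along these lines (names illustrative, not the real
schema):

    -- Hypothetical compound index matching the query's lookup columns
    CREATE INDEX post_forum_ordinal ON post (forum, ordinal);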

The argument for this in the application comes down to the granularity
and specificity that I gain from knowing exactly what function call in
the template generated the timing being reported.

(If it matters, Postgres 9.1.3 on FreeBSD 8.x)

--
-- Karl Denninger
/The Market Ticker ®/ <http://market-ticker.org>
Cuda Systems LLC
