I propose a patch that would allow optional "cleaning" of queries
tracked in pg_stat_statements, compressing the result and making it more
The default behavior is that when the same query is run with different
parameter values, it's actually stored as two separate queries (the
string do differ).
A small example - when you run "pgbench -S" you'll get queries like
SELECT abalance FROM pgbench_accounts WHERE aid = 12433
SELECT abalance FROM pgbench_accounts WHERE aid = 2322
SELECT abalance FROM pgbench_accounts WHERE aid = 52492
and so on, and each one is listed separately in the pg_stat_statements.
This often pollutes the pg_stat_statements.
The patch implements a simple "cleaning" that replaces the parameter
values with generic strings - e.g. numbers are turned to ":n", so the
queries mentioned above are turned to
SELECT abalance FROM pgbench_accounts WHERE aid = :n
and thus tracked as a single query in pg_stat_statements.
The patch provides an enum GUC (pg_stat_statements.clean) with three
options - none, basic and aggressive. The default option is "none", the
"basic" performs the basic value replacement (described above) and
"aggressive" performs some additional cleaning - for example replaces
multiple spaces with a single one etc.
The parsing is intentionally very simple and cleans the query in a
single pass. Currently handles three literal types:
a) string (basic, C-style escaped, Unicode-escaped, $-espaced)
b) numeric (although 1.925e-3 is replaced by :n-:n)
c) boolean (true/false)
There is probably room for improvement (e.g. handling UESCAPE).
pgsql-hackers by date
|Next:||From: Peter Geoghegan||Date: 2011-11-06 02:16:20|
|Subject: Re: [PATCH] optional cleaning queries stored in pg_stat_statements|
|Previous:||From: Jeff Janes||Date: 2011-11-05 18:46:58|
|Subject: Re: Include commit identifier in version() function|