Re: [PATCH] optional cleaning queries stored in pg_stat_statements

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] optional cleaning queries stored in pg_stat_statements
Date: 2011-11-06 03:42:20
Message-ID: 4EB6021C.2080700@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dne 6.11.2011 03:16, Peter Geoghegan napsal(a):
> 2011/11/6 Tomas Vondra <tv(at)fuzzy(dot)cz>:
>> Hi everyone,
>
>> The patch implements a simple "cleaning" that replaces the parameter
>> values with generic strings - e.g. numbers are turned to ":n", so the
>> queries mentioned above are turned to
>>
>> SELECT abalance FROM pgbench_accounts WHERE aid = :n
>>
>> and thus tracked as a single query in pg_stat_statements.
>
> I'm a couple of days away from posting a much better principled
> implementation of pg_stat_statements normalisation. To normalise, we
> perform a selective serialisation of the query tree, which is hashed.

OK, my patch definitely is not the only possible and if there's a way to
get more info from the planner, the results surely might be better. My
goal was to provide a simple patch that solves the problem better then
I'll be more than happy to remove mine.

> This has the additional benefit of considering queries equivalent even
> when, for example, there is a different amount of whitespace, or if
> queries differ only in their use of aliases, or if queries differ only
> in that one uses a noise word the other doesn't. It also does things

Yup, that's nice. And achieving the same behavior (aliases, noise) would
require a much more complex parser than the one I wrote.

> like intelligently distinguishing between queries with different
> limit/offset constant values, as these constants are deemed to be
> differentiators of queries for our purposes. A guiding principal that
> I've followed is that anything that could result in a different plan
> is a differentiator of queries.

Not sure if I understand it correctly - do you want to keep multiple
records for a query, for each differentiator (different limit/offset
values, etc.)? That does not make much sense to me - even a different
parameter value may cause change of the plan, so I don't see much
difference between a query parameter, limit/offset constant values etc.

If it was possible to compare the actual plan (or at least a hash of the
plan), and keeping one record for each plan, that'd be extremely nice.
You'd see that the query was executed with 3 different plans, number of
calls, average duration etc.

> I intend to maintain a backwards compatible version, as this can be
> expected to work with earlier versions of Postgres.
>
> There will be additional infrastructure added to the parser to support
> normalisation of query strings for the patch I'll be submitting (that
> obviously won't be supported in the version that builds against
> existing Postgres versions that I'll make available). Essentially,
> I'll be adding a length field to certain nodes, to go alongside the
> existing location field (currently just used to highlight an offending
> token in an error message outside the parser). pg_stat_statements will
> then use the location and length field of const nodes to swap out
> constants in the query string.
>
> It's unfortunate that there was a duplicate effort here. With respect,
> the approach that you've taken probably would have turned out to have
> been a bit of a tar pit - I'm reasonably sure that had you pursued it,
> you'd have found yourself duplicating quite a bit of the parser as new
> problems came to light.

The duplicate effort is not a problem. I needed to learn something more
about the internals and how to use the executor hooks, and the
pg_stat_statements is a neat place to learn this. The patch is rather a
side effect of this effort, so no waste on my side.

Anyway you're right that the approach I've taken is not much extensible
without building a parser parallel to the actual one. It was meant as a
simple solution not solving the problem perfectly, just sufficiently for
what I need.

Tomas

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-11-06 04:21:44 Re: Include commit identifier in version() function
Previous Message Thomas Munro 2011-11-06 03:24:37 const correctness