Re: [PATCH] optional cleaning queries stored in pg_stat_statements

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] optional cleaning queries stored in pg_stat_statements
Date: 2011-11-06 19:56:12
Message-ID: 4EB6E65C.90406@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/06/2011 01:08 PM, Tom Lane wrote:
> Peter Geoghegan<peter(at)2ndquadrant(dot)com> writes:
>
>> ... It also does things
>> like intelligently distinguishing between queries with different
>> limit/offset constant values, as these constants are deemed to be
>> differentiators of queries for our purposes. A guiding principal that
>> I've followed is that anything that could result in a different plan
>> is a differentiator of queries.
>>
> This claim seems like bunk, unless you're hashing the plan tree,
> in which case it's tautological.

Peter's patch adds a planner hook and hashes at that level. Since this
cat is rapidly escaping its bag and impacting other contributors, we
might as well share the work in progress early if anyone has a burning
desire to look at the code. A diff against the version I've been
internally reviewing in prep for community submission is at
https://github.com/greg2ndQuadrant/postgres/compare/master...pg_stat_statements_norm
Easier to browse than ask questions probing about it I think. Apologies
to Peter for sharing his work before he was completely ready; there is a
list of known problems with it. I also regret the thread hijack here.

The first chunk of code is a Python based regression test program, and
an open item here is the best way to turn that into a standard
regression test set.

>> There will be additional infrastructure added to the parser to support
>> normalisation of query strings for the patch I'll be submitting (that
>> obviously won't be supported in the version that builds against
>> existing Postgres versions that I'll make available). Essentially,
>> I'll be adding a length field to certain nodes,
>>
> This seems like a good way to get your patch rejected: adding overhead
> to the core system for the benefit of a feature that not everyone cares
> about is problematic.

Struggling around this area is the main reason this code hasn't been
submitted yet. To step back for a moment, there are basically three
options here that any code like this can take, in regards to storing the
processed query name used as the key:

1) Use the first instance of that query seen as the "reference" version
2) Use the last instance seen
3) Transform the text of the query in a way that's stable across all
duplicates of that statement, and output that

Downstream tools operating on this data, things that will sample
pg_stat_statements multiple times, need some sort of stable query text
they can operate on. It really needs to survive server restart too.
Neither (1) nor (2) seem like usable solutions. Both have been
implemented already in Peter's patch, with (2) being what's in the
current patch. How best to do (3) instead is not obvious though.

Doing the query matching operating at the planner level seems effective
at side-stepping the problem of needing to parse the SQL, which is where
most implementations of this idea get stuck doing fragile
transformations. My own first try at this back in September and Tomas's
patch both fall into that category. That part of Peter's work seems to
work as expected. That still leaves the problem of "parsed query ->
stable normalized string". I think that is an easier one to solve than
directly taking on the entirety of "query text -> stable normalized
string" though. Peter has an idea he's exploring for how to implement
that, and any amount of overhead it adds to people who don't use this
feature is an obvious concern. There are certainly use cases that don't
care about this problem, ones that would happily take (1) or (2). I'd
rather not ship yet another not quite right for everyone version of
pg_stat_statements again though; only solving too limited of a use case
is the big problem with the one that's already there.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Kupershmidt 2011-11-06 20:29:16 Re: proposal: psql concise mode
Previous Message Florian Pflug 2011-11-06 18:18:39 Re: unaccent extension missing some accents