Re: [PATCH] optional cleaning queries stored in pg_stat_statements

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] optional cleaning queries stored in pg_stat_statements
Date: 2011-11-06 15:08:21
Message-ID: 2223.1320592101@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Geoghegan <peter(at)2ndquadrant(dot)com> writes:
> I'm a couple of days away from posting a much better principled
> implementation of pg_stat_statements normalisation. To normalise, we
> perform a selective serialisation of the query tree, which is hashed.

That seems like an interesting approach, and definitely a lot less of
a kluge than what Tomas suggested. Are you intending to hash the raw
grammar output tree, the parse analysis result, the rewriter result,
or the actual plan tree? I don't necessarily have a preconceived notion
about which is best (I can think of pluses and minuses for each), but
we should hear your explanation of which one you chose and what your
reasoning was.

> ... It also does things
> like intelligently distinguishing between queries with different
> limit/offset constant values, as these constants are deemed to be
> differentiators of queries for our purposes. A guiding principal that
> I've followed is that anything that could result in a different plan
> is a differentiator of queries.

This claim seems like bunk, unless you're hashing the plan tree,
in which case it's tautological. As an example, switching from
"where x < 1" to "where x < 2" could make the planner change plans,
if there's a sufficiently large difference in the selectivity of the two
cases (or even if there's a very small difference, but we're right at
the tipping point where the estimated costs are equal). There's no way
to know this in advance unless you care to duplicate all the planner
cost estimation logic. And I don't think you actually want to classify
"where x < 1" and "where x < 2" differently just because they *might*
give different plans in corner cases.

I'm not real sure whether it's better to classify on the basis of
similar plans or similar original queries, anyway. This seems like
something that could be open for debate about use-cases.

> There will be additional infrastructure added to the parser to support
> normalisation of query strings for the patch I'll be submitting (that
> obviously won't be supported in the version that builds against
> existing Postgres versions that I'll make available). Essentially,
> I'll be adding a length field to certain nodes,

This seems like a good way to get your patch rejected: adding overhead
to the core system for the benefit of a feature that not everyone cares
about is problematic. Why do you need it anyway? Surely it's possible
to determine the length of a literal token after the fact.

More generally, if you're hashing anything later than the raw grammar
tree, I think that generating a suitable representation of the queries
represented by a single hash entry is going to be problematic anyway.
There could be significant differences --- much more than just the
values of constants --- between query strings that end up being
semantically the same. Or for that matter we could have identical query
strings that wind up being considered different because of the action of
search_path or other context.

It might be that the path of least resistance is to document that we
select one of the actual query strings "at random" to represent all the
queries that went into the same hash entry, and not even bother with
trying to strip out constants. The effort required to do that seems
well out of proportion to the reward, if we can't do a perfect job of
representing the common aspects of the queries.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2011-11-06 15:57:50 Re: [PATCH] optional cleaning queries stored in pg_stat_statements
Previous Message Tom Lane 2011-11-06 14:34:24 Re: Strange problem with create table as select * from table;