Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Date: 2012-03-13 19:02:18
Message-ID: CAEYLb_W9vWe0csaaqLedzxxPZPAS7erUfgTF7hLWvfbZ=dsn3w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2 March 2012 20:10, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I do intend to take this one up in due course

I probably should have exposed the query_id directly in the
pg_stat_statements view, perhaps as "query_hash". The idea of that
would be to advertise the potential non-uniqueness of the value - a
collision is *extremely* unlikely (as I've previously calculated), but
we cannot preclude the possibility, and as such it isn't *really*
usable as a primary key. BTW, even if there is a collision, we at
least know that there can't be a situation where one user's query
entry gets spurious statistics from the execution of some other
user's, or one database gets statistics from another, since their
corresponding oid values separately form part of the dynahash key,
alongside query_id.

The other reason why I'd like to do this is that I'd like to build on
this work for 9.3, and add a new column - plan_hash. When a new mode,
pg_stat_statements.plan_hash (or somesuch) is disabled (as it is by
default), this is always null, and we get the same 9.2 behaviour. When
it is enabled, however, all existing entries are invalidated, for a
clean slate. We then start hashing both the query tree *and* the query
plan. It's a whole lot less useful if we only hash the latter. Now,
entries within the view use the plan_hash as their key (or maybe a
composite of query_hash and plan_hash). This often results in entries
with duplicate query_hash values, as the planner generates different
plans for equivalent queries, but that doesn't matter; you can easily
write an aggregate query with a "GROUP BY query_hash" clause if that's
what you happen to want to see.

When this optional mode is enabled, at that point we'd probably also
separately instrument planning time, as recently proposed by Fujii.

Does that seem like an interesting idea?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2012-03-13 19:07:14 Re: pg_upgrade and statistics
Previous Message Bruce Momjian 2012-03-13 18:56:46 Re: LIST OWNED BY...