
Re: Hash id in pg_stat_statements

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash id in pg_stat_statements
Date: 2012-10-01 11:33:07
Message-ID: CAEYLb_WPkeXHc0jRT3tZrMiyvchSKDAvGOHCn0529=9HjnTS2w@mail.gmail.com
Lists: pgsql-hackers
On 1 October 2012 08:57, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> I know there was discussions about it earlier, and it wasn't done with
> an argument of it not being stable between releases (IIRC). I think we
> can live with that drawback, assuming of course that we document this
> properly.

Well, I'll point out once again that the argument about its stability
is invalid, because we serialise the entries to disk. If a point
release changes the representation of the query tree such that the
hash values won't match, then we have no recourse but to bump the
pg_stat_statements version number and invalidate all existing
entries. I credit our users with the intelligence not to jump to any
rash conclusions about the hash if it is directly exposed, such as
assuming that it has any particular degree of stability with respect
to the queries that are fingerprinted beyond the self-evident bare
minimum.

I'm pretty sure that the "stability among point releases in the face
of potential minor changes to query tree representation" thing was
something that I imagined as a reason for the proposal being rejected,
when I tried to read between the lines of a flat rejection. Perhaps I
should have asked for clarification on that point. Now that I think
about it, I'm pretty sure that the need to bump catversion whenever
the way the query tree struct is dumped into stored rules changes
will preclude that problem in practice.

> I've now run into multiple customer installations where it would be
> very useful to have. The usecase is mainly storing snapshots of the
> pg_stat_statements output over time and analyzing those. Weird things
> happen for example when the query text is the same, but the hash is
> different (which can happen for example when a table is dropped and
> recreated). And even without that, in order to do anything useful with
> it, you end up hashing the query text anyway - so using the already
> existing hash would be easier and more useful.

Yes, these are all arguments that I'm familiar with :-). I've always
thought of pg_stat_statements as a low-level statistical view that
people would naturally want to store snapshots of for analysis, in
much the same way as many do now with things like pg_stat_bgwriter
using tools like Munin. Who wouldn't want to know what queries were
running half an hour ago, when the database server seemed slower than
usual? Such tools should naturally have access to the same "candidate
key" for entries, rather than introducing a subtle impedance mismatch
by keying on the query string. That reminds me - when are you
writing the pg_stat_statements Postgres plugin for Munin?
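To make the snapshot use case concrete, here is a minimal SQL sketch of
what such a tool might do, assuming (hypothetically) that the hash were
exposed as a column named queryid; the table name and column subset are
my own invention for illustration, not anything in the extension:

```sql
-- Hypothetical snapshot table: capture pg_stat_statements periodically,
-- keyed by the exposed hash (assumed here to be a bigint "queryid")
-- rather than by the mutable query text.
CREATE TABLE statements_snapshot (
    captured_at timestamptz NOT NULL DEFAULT now(),
    userid      oid         NOT NULL,
    dbid        oid         NOT NULL,
    queryid     bigint      NOT NULL,  -- the internal hash, if exposed
    query       text,
    calls       bigint,
    total_time  double precision
);

-- Run from cron (or a monitoring agent) every few minutes:
INSERT INTO statements_snapshot
       (userid, dbid, queryid, query, calls, total_time)
SELECT userid, dbid, queryid, query, calls, total_time
FROM   pg_stat_statements;

-- Later analysis: per-statement call deltas over a window, joining on
-- the hash instead of re-hashing the query text in the tool:
SELECT queryid, max(calls) - min(calls) AS calls_delta
FROM   statements_snapshot
WHERE  captured_at >= now() - interval '30 minutes'
GROUP  BY queryid;
```

The point of the sketch is the GROUP BY: with the hash available, the
snapshot tool never needs to derive its own key from the query string.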

I was disappointed that my proposal was shot down, despite the fact
that I independently raised it on list at least twice, and pushed as
hard as I felt that I could at the time.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


