Re: Hash id in pg_stat_statements

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash id in pg_stat_statements
Date: 2012-10-02 16:58:15
Message-ID: 20121002165815.GF1267@tamriel.snowman.net
Lists: pgsql-hackers

* Peter Geoghegan (peter(at)2ndquadrant(dot)com) wrote:
> On 1 October 2012 18:05, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > You're going to have to help me here, 'cause I don't see how there can
> > be duplicates if we include the PGSS_FILE_HEADER as part of the hash,
> > unless we're planning to keep PGSS_FILE_HEADER constant while we change
> > what the hash value is for a given query, yet that goes against the
> > assumptions that were laid out, aiui.
>
> Well, they wouldn't be duplicates if you think that the fact that one
> query was executed before some point release and another after ought
> to differentiate queries. I do not.

This would only be if we happened to change what hash was generated for
a given query during such a point release, where I share your feeling
that it ought to be quite rare. I'm not suggesting we do this for every
point release...
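
To make the idea concrete, here is a minimal sketch (not a patch) of what
"include the PGSS_FILE_HEADER as part of the hash" could look like. The
function name, the 64-bit layout, and the two header constants are
hypothetical, chosen only for illustration; they are not the values or the
representation pg_stat_statements uses.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the file header before and after a change. */
#define EXAMPLE_FILE_HEADER_OLD 0x00000001u
#define EXAMPLE_FILE_HEADER_NEW 0x00000002u

/*
 * Build an externally visible id from the file-format header and the
 * internal 32-bit query hash.  If the header is bumped whenever the
 * hashing changes, ids produced before and after the change can never
 * collide, even when the internal hash values happen to be equal.
 */
static uint64_t
external_query_id(uint32_t file_header, uint32_t internal_hash)
{
    return ((uint64_t) file_header << 32) | (uint64_t) internal_hash;
}
```

With this layout, the same internal hash under the old and new header yields
two distinct ids, and the id stays stable for as long as the header does.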

> By invalidate, I mean that when we go to open the saved file, if the
> header doesn't match, the file is considered corrupt, and we simply
> log that the file could not be read, before unlinking it. This would
> be necessary in the unlikely event of there being some substantive
> change in the representation of query trees in a point release. I am
> not aware of any precedent for this, though Tom said that there was
> one.

Right, and that's all I'm trying to address here: how do we provide a
value for a given query which can be relied upon by outside sources,
even in the face of a point release which changes what our internal hash
value for a given query is?
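
For reference, the invalidation behaviour described in the quoted paragraph
above (header mismatch, log, unlink) can be sketched roughly as follows. This
is a simplified standalone illustration, assuming a file that begins with a
32-bit header; load_stats_file and EXAMPLE_FILE_HEADER are hypothetical names,
and the real loader would use the backend's file and logging facilities
rather than stdio.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical header value identifying the current file format. */
#define EXAMPLE_FILE_HEADER 0x00000002u

/*
 * Try to load a saved stats file.  Returns 1 if the header matches the
 * current format, 0 otherwise.  A file whose header is unreadable or
 * does not match is treated as corrupt: we log that fact and remove it.
 */
static int
load_stats_file(const char *path)
{
    FILE       *fp = fopen(path, "rb");
    uint32_t    header;

    if (fp == NULL)
        return 0;               /* nothing saved, nothing to load */

    if (fread(&header, sizeof(header), 1, fp) != 1 ||
        header != EXAMPLE_FILE_HEADER)
    {
        fclose(fp);
        fprintf(stderr, "could not read stats file \"%s\", discarding it\n",
                path);
        remove(path);           /* unlink the stale or corrupt file */
        return 0;
    }

    /* A real loader would read the saved entries here. */
    fclose(fp);
    return 1;
}
```

The point is that a header bump throws away the old statistics entirely, so
anything outside the server that stored the old hash values gets no signal
that they have changed meaning.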

> I don't want to get too hung up on what we'd do if this problem
> actually occurred, because that isn't what this thread is about.

[...]

> I simply do not understand objections to the proposal. Have I missed something?

It was my impression that the concern is the stability of the hash value
and ensuring that tools which operate on it don't mistakenly lump two
different queries into one because they had the same hash value (caused
by a change in our hashing algorithm or input into it over time, e.g. a
point release). I was hoping to address that to allow this proposal to
move forward.

Thanks,

Stephen
