Re: Patch: add timing of buffer I/O requests

From: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Ants Aasma <ants(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Patch: add timing of buffer I/O requests
Date: 2012-04-11 13:02:44
Message-ID: 20120411130244.GT24468@aart.rice.edu
Lists: pgsql-hackers

On Wed, Apr 11, 2012 at 01:53:06AM +0100, Peter Geoghegan wrote:
> On 11 April 2012 01:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Peter Geoghegan <peter(at)2ndquadrant(dot)com> writes:
> >> On 11 April 2012 00:35, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >>> If people need something like that, couldn't they create it by hashing
> >>> the normalized query text with an arbitrary algorithm?
> >
> >> That supposes that the normalised query text is perfectly stable. It
> >> may well not be, particularly for things like ad-hoc queries or
> >> queries generated by ORMs, across database clusters and over long
> >> periods of time -
> >
> > Indeed, but the hash value isn't stable either given those sorts of
> > assumptions, so I'm not convinced that there's any advantage there.
>
> Isn't it? The hash captures the true meaning of the query, while
> having the database server's platform as a usually irrelevant
> artefact. Another thing that I forgot to mention is client encoding -
> it may well be fairly inconvenient to have to use the same algorithm
> to hash the query string across applications. You also have to hash
> the query string yourself again and again, which is expensive to do
> from Python or something, and is often inconvenient - differences
> beyond track_activity_query_size bytes (default: 1024) are not
> recognised. Running the text through an SQL code beautifier, so that
> even a single byte varies, now breaks everything, which developers
> don't expect at all (we've trained them not to), so in many ways
> you're back to the same limitations as classic pg_stat_statements if
> you attempt to aggregate queries over time and across machines, which
> is a very real use case.
>
> It's probably pretty annoying to have to get your Python app to use
> the same hash function as your Java app or whatever, unless you want
> to use something heavyweight like a cryptographic hash function. By
> doing it within Postgres, you avoid those headaches.
>
> I'm not asking you to very loudly proclaim that it should be used
> like this - just expose it, accurately document it, and I'm quite
> confident that it will be widely used and relied upon by those who
> are reasonably well informed and understand its limitations, which
> are really quite straightforward.
>
> > What I think people would actually like to know, if they're in a
> > situation where distinct query texts are getting hashed to the same
> > thing, is *which* different texts got hashed to the same thing.
> > But there's no good way to expose that given the pg_stat_statements
> > infrastructure, and exposing the hash value doesn't help.
>
> Apart from detecting the case where we get a straightforward
> collision, I don't expect that that would be useful. The whole point
> is that the user doesn't care about the difference, and I think we've
> specified a practical, widely useful standard for when queries should
> be considered equivalent.
> --
> Peter Geoghegan       http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training and Services
>
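
To illustrate the client-side hashing problem described above, a rough
Python sketch (the query texts and the choice of md5 are invented for
the example): a single whitespace byte defeats any hash taken over the
raw query string, whereas a hash computed inside the server after
normalisation would not care.

    import hashlib

    # Two spellings of what is semantically the same query; a code
    # beautifier has changed nothing but whitespace.
    q1 = "SELECT * FROM foo WHERE bar = ?"
    q2 = "SELECT *  FROM foo WHERE bar = ?"

    # Any hash taken over the raw text sees two unrelated statements:
    print(hashlib.md5(q1.encode()).hexdigest())
    print(hashlib.md5(q2.encode()).hexdigest())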

By using all 64 bits of the hash that we currently calculate, instead
of only the 32 bits we use now, we would make the collision
probability very low.
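
As a back-of-the-envelope check (a sketch only; the real figure
depends on how many distinct statements are tracked), the usual
birthday bound gives roughly n^2 / 2^(b+1) for n statements and a
b-bit hash:

    # Rough birthday-bound estimate of hash collision probability.
    def collision_probability(n, bits):
        return n * n / 2.0 ** (bits + 1)

    print(collision_probability(1000, 32))   # ~1.2e-04
    print(collision_probability(1000, 64))   # ~2.7e-14

So with, say, a thousand distinct statements, going from 32 to 64 bits
takes the collision odds from roughly one in ten thousand to
negligible.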

Regards,
Ken
