Re: pg_stat_statements: Query normalisation may fail during stats reset

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Renner <michael(dot)renner(at)amd(dot)co(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_statements: Query normalisation may fail during stats reset
Date: 2014-05-06 19:15:36
Message-ID: CAM3SWZSgxs-CUOydCwvM64oCOXKjSieEMQTrfCX4zoUO0pw5yw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 6, 2014 at 11:31 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The source code says that "query strings are normalized on a best effort
> basis", so perhaps we ought to say the same in the documentation.

Perhaps.

> It would be rather expensive to provide a guarantee of normalization:
> basically, we'd have to compute the normalized query string during parsing
> *even when the hashtable entry already exists*, and then store it
> somewhere where it'd survive till ExecutorEnd (but, preferably, not be
> leaked if we never get to ExecutorEnd; which makes this hard). I think
> most people would find that a bad tradeoff.

I certainly would.

> One cheap-and-dirty solution is to throw away the execution stats if we
> get to the end and find the hash table entry no longer exists, rather than
> make a new entry with a not-normalized string. Not sure if that cure is
> better than the disease or not.

I am certain that it is. Consider long running queries that don't
manage to get the benefit of the "aggressive decay for stick entries"
technique, because there is consistent contention.

> Another thought, though it's not too relevant to this particular scenario
> of intentional resets, is that we could raise the priority of entries
> for statements-in-progress even further. I notice for example that if
> entry_alloc finds an existing hashtable entry, it does nothing to raise
> the usage count of that entry.

To do otherwise would create an artificial prejudice against prepared
queries, though.

>>> This is a bit counterintuitive if you rely on the query to be normalised,
>>> e.g. for privacy reasons where you don’t want to leak query constants like
>>> password hashes or usernames.
>
> The bigger picture here is that relying on query normalization for privacy
> doesn't seem like a bright idea. Consider making sure that
> security-relevant values are passed as parameters rather than being
> embedded in the query text in the first place.

I agree.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-05-06 19:30:15 Wanted: jsonb on-disk representation documentation
Previous Message Tom Lane 2014-05-06 19:12:07 Re: [COMMITTERS] pgsql: pgindent run for 9.4