Re: Sample values for pg_stat_statements

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sample values for pg_stat_statements
Date: 2018-03-10 14:02:19
Message-ID: 3240d7fc-ae7c-f93b-825e-b4daf415b580@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I've looked at this patch today. I like the idea / intent in general, as
it helps with some investigation tasks. That being said, I have a couple
of questions/comments based on read through the patch:

1) I see you've renamed the .sql script from 1.4 to 1.6. I thought we've
abandoned that approach some time ago, and are now only doing the
upgrade scripts. That is, keep 1.4 script and then 1.4--1.5 and 1.5-1.6.
That's how other extensions are doing it now, at least - see btree_gin
for example. But maybe pg_stat_statements has to do it this way for some
reason, not sure?

2) The patch should have updated doc/src/sgml/pgstatstatements.sgml

3) Do we really need both collect_consts and collect_params? I can't
really imagine wanting to set only one of those options. In any case,
the names seem unnecessarily abbreviated - just use collect_constants
and collect_parameters.

4) The width_threshold GUC name seems rather weird. I mean, I wouldn't
use "threshold" in this context, and it's really unclear size of what is
it referring to. We do have a precedent, though, as pg_stat_activity has
track_activity_query_size, so I suggest using either parameters_size or
max_parameters_size (prefixed by "pg_stat_statements." of course).

5) I don't quite see why keeping the first set of parameter values we
happen to see would be particularly useful. For example, I'm way more
interested in values for the longest execution - why not to keep those?

6) I suggest to use the same naming style as the existing functions, so
for example CollectParams should be pgss_CollectParams (and it's missing
a comment too).

7) There are a couple of places where the code style violates project
rules, e.g. by placing {} around a single command in if-statement.

8) I see Andres mentioned possible privacy issues (not quite sure what
is 'data minimalism', mentioned by Andres). I'm not sure it's a problem,
considering it can be disabled and it's subject to the usual role check
(susperuser/role_read_all_stats). Unfortunately we can't use the same
approach as pg_stat_activity (only showing data for user's own queries).
Well, we could keep per-user samples, but that might considerably
inflate the file size.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2018-03-10 14:05:11 Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Previous Message Christos Maris 2018-03-10 13:46:52 Google Summer of Code: Potential Applicant