pg_stat_statements normalization: re-review

From: Daniel Farina <daniel(at)heroku(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, peter(at)2ndquadrant(dot)com
Subject: pg_stat_statements normalization: re-review
Date: 2012-02-23 09:58:44
Message-ID: CAAZKuFYZ37ox+v+izHhAPhKq_xLSYD_rPnK=6-ANVW4dCE_4sA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello again,

I'm reviewing the revised version of pg_stat_statements again. In
particular, this new version has done away with the mechanical but
invasive change to the lexing that preserved *both* the position and
length of constant values. (See
http://archives.postgresql.org/message-id/CAEYLb_WGeFCT7MfJ8FXf-CR6BSE6Lbn+O1VX3+OGrc4Bscn4=A@mail.gmail.com)

I did the review front-matter again (check against a recent version,
make sure it does what it says it'll do, et al..) and did trigger an
assertion failure that seems to be an oversight, already reported to
Peter. I found that oversight by running the SQLAlchemy Python
query-generation toolkit's tests, which are voluminous. The
functionality is seemingly mostly unchanged.

What I'm going to do here is focus on the back-end source changes that
are not part of the contrib. The size of the changes are much
reduced, but their semantics are also somewhat more complex...so, here
goes. Conclusion first:

* The small changes to hashing are probably not strictly required,
unless collisions are known to get terrible.

* The hook to analyze is pretty straight-forward and seem like the other hooks

* The addition of a field to scribble upon in the Query and Plan nodes
is something I'd like to understand better, as these leave me with a
bit of disquiet.

What follows is in much more detail, and broken up by functionality
and file grouping in the backend.

src/include/access/hash.h | 4 ++-
src/backend/access/hash/hashfunc.c | 21 ++++++++++---------

This adjusts the hash_any procedure to support returning two possible
precisions (64 bit and 32 bit) by tacking on a Boolean flag to make
the precision selectable. The hash_any operator is never used
directly, and instead via macro, and a second macro to support 64-bit
precision has been added. It seems a bit wonky to me to use a flag to
select this rather than encapsulating the common logic in a procedure
and just break this up into three symbols, though. I think the longer
precision is used to try to get fewer collisions with not too much
slowdown. Although it may meaningfully improve the quality of the
hashing for pg_stat_statements or living without the extra hashing
bits might lead to unusable results (?), per the usual semantics of
hashing it is probably not *strictly* necessary.

A smidgen of avoidable formatting niggles (> 80 columns) are in this section.

src/backend/nodes/copyfuncs.c | 5 ++++
src/backend/nodes/equalfuncs.c | 4 +++
src/backend/nodes/outfuncs.c | 10 +++++++++
src/backend/optimizer/plan/planner.c | 1 +
src/backend/nodes/readfuncs.c | 12 +++++++++++
src/include/nodes/parsenodes.h | 3 ++
src/include/nodes/plannodes.h | 2 +

These have all been changed in the usual manner to support one new
field, the queryId, on the toplevel Plan and Query nodes. The queryId
is 64-bit field copied from queries to plans to tie together plans "to
be used by plugins" (quoth the source) as they flow through the
system. pg_stat_statements fills them with the 64-bit hash of the
query text -- a reasonable functionality to possess, I think, but the
abstraction seems iffy: plugins cannot compose well if they all need
to use the queryId for different reasons. Somehow I get the feeling
that this field can be avoided or its use case can be satisfied in a
more satisfying way.

Mysteriously, in the Query representation the symbol uses an
underscored-notation (query_id) and in the planner it uses a camelcase
one, queryId. Overall, the other fields in those structs all use
camelcase, so I'd recommend normalizing it.

src/backend/parser/analyze.c | 37 ++++++++++++++++++++++++++++++---
src/include/parser/analyze.h | 12 +++++++++++

These changes implement hooks for the once-non-hookable symbols
parse_analyze_varparams and parse_analyze, in seemingly the same way
they are implemented in other hooks in the system. These are neatly
symmetrical with the planner hook. I only ask if there is a way to
have one hook and not two, but I suppose that would be a similar
question as "why is are there two ways to symbols to enter into
parsing, and not one".

--
fdr

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2012-02-23 10:14:03 Re: Speed dblink using alternate libpq tuple storage
Previous Message Simon Riggs 2012-02-23 09:18:57 Re: foreign key locks, 2nd attempt