Skip site navigation (1) Skip section navigation (2)

pg_stat_statements normalization: re-review

From: Daniel Farina <daniel(at)heroku(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, peter(at)2ndquadrant(dot)com
Subject: pg_stat_statements normalization: re-review
Date: 2012-02-23 09:58:44
Message-ID: CAAZKuFYZ37ox+v+izHhAPhKq_xLSYD_rPnK=6-ANVW4dCE_4sA@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-hackers
Hello again,

I'm reviewing the revised version of pg_stat_statements again.  In
particular, this new version has done away with the mechanical but
invasive change to the lexing that preserved *both* the position and
length of constant values. (See
http://archives.postgresql.org/message-id/CAEYLb_WGeFCT7MfJ8FXf-CR6BSE6Lbn+O1VX3+OGrc4Bscn4=A@mail.gmail.com)

I did the review front-matter again (check against a recent version,
make sure it does what it says it'll do, et al..) and did trigger an
assertion failure that seems to be an oversight, already reported to
Peter.  I found that oversight by running the SQLAlchemy Python
query-generation toolkit's tests, which are voluminous.  The
functionality is seemingly mostly unchanged.

What I'm going to do here is focus on the back-end source changes that
are not part of the contrib.  The size of the changes are much
reduced, but their semantics are also somewhat more complex...so, here
goes.  Conclusion first:

* The small changes to hashing are probably not strictly required,
unless collisions are known to get terrible.

* The hook to analyze is pretty straight-forward and seem like the other hooks

* The addition of a field to scribble upon in the Query and Plan nodes
is something I'd like to understand better, as these leave me with a
bit of disquiet.

What follows is in much more detail, and broken up by functionality
and file grouping in the backend.

 src/include/access/hash.h            |    4 ++-
 src/backend/access/hash/hashfunc.c   |   21 ++++++++++---------

This adjusts the hash_any procedure to support returning two possible
precisions (64 bit and 32 bit) by tacking on a Boolean flag to make
the precision selectable.  The hash_any operator is never used
directly, and instead via macro, and a second macro to support 64-bit
precision has been added.  It seems a bit wonky to me to use a flag to
select this rather than encapsulating the common logic in a procedure
and just break this up into three symbols, though.  I think the longer
precision is used to try to get fewer collisions with not too much
slowdown.  Although it may meaningfully improve the quality of the
hashing for pg_stat_statements or living without the extra hashing
bits might lead to unusable results (?), per the usual semantics of
hashing it is probably not *strictly* necessary.

A smidgen of avoidable formatting niggles (> 80 columns) are in this section.

 src/backend/nodes/copyfuncs.c        |    5 ++++
 src/backend/nodes/equalfuncs.c       |    4 +++
 src/backend/nodes/outfuncs.c         |   10 +++++++++
 src/backend/optimizer/plan/planner.c |    1 +
 src/backend/nodes/readfuncs.c        |   12 +++++++++++
 src/include/nodes/parsenodes.h       |    3 ++
 src/include/nodes/plannodes.h        |    2 +

These have all been changed in the usual manner to support one new
field, the queryId, on the toplevel Plan and Query nodes.  The queryId
is 64-bit field copied from queries to plans to tie together plans "to
be used by plugins" (quoth the source) as they flow through the
system.  pg_stat_statements fills them with the 64-bit hash of the
query text -- a reasonable functionality to possess, I think, but the
abstraction seems iffy: plugins cannot compose well if they all need
to use the queryId for different reasons.  Somehow I get the feeling
that this field can be avoided or its use case can be satisfied in a
more satisfying way.

Mysteriously, in the Query representation the symbol uses an
underscored-notation (query_id) and in the planner it uses a camelcase
one, queryId.  Overall, the other fields in those structs all use
camelcase, so I'd recommend normalizing it.

 src/backend/parser/analyze.c         |   37 ++++++++++++++++++++++++++++++---
 src/include/parser/analyze.h         |   12 +++++++++++

These changes implement hooks for the once-non-hookable symbols
parse_analyze_varparams and parse_analyze, in seemingly the same way
they are implemented in other hooks in the system.  These are neatly
symmetrical with the planner hook.  I only ask if there is a way to
have one hook and not two, but I suppose that would be a similar
question as "why is are there two ways to symbols to enter into
parsing, and not one".

-- 
fdr

Responses

pgsql-hackers by date

Next:From: Kyotaro HORIGUCHIDate: 2012-02-23 10:14:03
Subject: Re: Speed dblink using alternate libpq tuple storage
Previous:From: Simon RiggsDate: 2012-02-23 09:18:57
Subject: Re: foreign key locks, 2nd attempt

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group