Re: pg_stat_statements normalization: re-review

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_statements normalization: re-review
Date: 2012-02-23 11:09:18
Message-ID: CAEYLb_WjOsGdbhbHfzuT38Vmx0e=8Wwzx85W+n9S1qeFYZim5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 23 February 2012 09:58, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> What I'm going to do here is focus on the back-end source changes that
> are not part of the contrib.  The size of the changes are much
> reduced, but their semantics are also somewhat more complex...so, here
> goes.  Conclusion first:
>
> * The small changes to hashing are probably not strictly required,
> unless collisions are known to get terrible.

I imagine that collisions would be rather difficult to demonstrate at
all with a 32-bit value.

> * The addition of a field to scribble upon in the Query and Plan nodes
> is something I'd like to understand better, as these leave me with a
> bit of disquiet.
>
> What follows is in much more detail, and broken up by functionality
> and file grouping in the backend.
>
>  src/include/access/hash.h            |    4 ++-
>  src/backend/access/hash/hashfunc.c   |   21 ++++++++++---------
>
> This adjusts the hash_any procedure to support returning two possible
> precisions (64 bit and 32 bit) by tacking on a Boolean flag to make
> the precision selectable.  The hash_any operator is never used
> directly, and instead via macro, and a second macro to support 64-bit
> precision has been added.  It seems a bit wonky to me to use a flag to
> select this rather than encapsulating the common logic in a procedure
> and just break this up into three symbols, though.  I think the longer
> precision is used to try to get fewer collisions with not too much
> slowdown.  Although it may meaningfully improve the quality of the
> hashing for pg_stat_statements or living without the extra hashing
> bits might lead to unusable results (?), per the usual semantics of
> hashing it is probably not *strictly* necessary.

It might be unnecessary. It's difficult to know where to draw the line.

> A smidgen of avoidable formatting niggles (> 80 columns) are in this section.
>
>  src/backend/nodes/copyfuncs.c        |    5 ++++
>  src/backend/nodes/equalfuncs.c       |    4 +++
>  src/backend/nodes/outfuncs.c         |   10 +++++++++
>  src/backend/optimizer/plan/planner.c |    1 +
>  src/backend/nodes/readfuncs.c        |   12 +++++++++++
>  src/include/nodes/parsenodes.h       |    3 ++
>  src/include/nodes/plannodes.h        |    2 +

I'll look into those.

> These have all been changed in the usual manner to support one new
> field, the queryId, on the toplevel Plan and Query nodes.  The queryId
> is 64-bit field copied from queries to plans to tie together plans "to
> be used by plugins" (quoth the source) as they flow through the
> system.  pg_stat_statements fills them with the 64-bit hash of the
> query text -- a reasonable functionality to possess, I think, but the
> abstraction seems iffy: plugins cannot compose well if they all need
> to use the queryId for different reasons.  Somehow I get the feeling
> that this field can be avoided or its use case can be satisfied in a
> more satisfying way.

Maybe the answer here is to have a simple mechanism for modules to
claim the exclusive right to be able to set that flag?

> Mysteriously, in the Query representation the symbol uses an
> underscored-notation (query_id) and in the planner it uses a camelcase
> one, queryId.  Overall, the other fields in those structs all use
> camelcase, so I'd recommend normalizing it.

Fair enough.

There is one other piece of infrastructure that I think is going to
need to go into core that was not in the most recent revision. I
believe that it can be justified independently of this work.
Basically, there are some very specific scenarios in which the
location of Const nodes is incorrect, because the core system makes
that location the same location as another coercion-related node. Now,
this actually doesn't matter to the positioning of the caret in most
error messages, as they usually ought to be the same anyway, as in
this scenario:

select 'foo'::integer;

Here, both the Const and coercion node's location start from the
SCONST token - no problem there.

However, consider this query:

select cast('foo' as integer);

and this query:

select integer 'foo';

Now, the location of the Const and Coercion nodes is *still* the same,
but starting from the location of the "cast" in the first example and
"integer" in the second.

I believe that this is a bug. They should have different locations
here, so that I can always rely on the location of a Const having a
single token that is one of only a handful of possible CONST tokens,
as I currently can in all other scenarios.

Of course, this didn't particularly matter before now, as the Const
location field only dictated caret position in messages generated from
outside the parser. Any error message should be blaming the Coercion
node if there's a problem with coercion, so there's no reason why this
needs to break any error context messages. If a message wants to blame
a Const on something, surely it should have a caret placed in the
right location - the location of the const token. If it wants to blame
the Coercion node on something, that won't change, so the context
message will be the same as before.

It's my intention that the next revision, which I'll get out in the
next couple of days, will have these additional changes to the core
system, and will be able to run the sqlalchemy test suite without that
assertion failure - I might get around to running a variety of tests
from different popular frameworks/ORMs.

I had hoped that some committer would agree that this was a useful
change even without this patch, and go ahead and commit the necessary
changes, but I can appreciate that everyone is quite busy at this time
of the cycle and so I'm not relying on that happening.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kohei KaiGai 2012-02-23 11:17:14 Re: [v9.2] Add GUC sepgsql.client_label
Previous Message Peter Eisentraut 2012-02-23 11:08:51 overriding current_timestamp