Re: pg_stat_statements normalization: re-review

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_statements normalization: re-review
Date: 2012-02-24 11:14:50
Message-ID: CAAZKuFYws7m9sLBh7MxCUH6RT6QWOdcPeEHh3bgvgG4RkHqs-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 23, 2012 at 3:09 AM, Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
>> These have all been changed in the usual manner to support one new
>> field, the queryId, on the toplevel Plan and Query nodes.  The queryId
>> is 64-bit field copied from queries to plans to tie together plans "to
>> be used by plugins" (quoth the source) as they flow through the
>> system.  pg_stat_statements fills them with the 64-bit hash of the
>> query text -- a reasonable functionality to possess, I think, but the
>> abstraction seems iffy: plugins cannot compose well if they all need
>> to use the queryId for different reasons.  Somehow I get the feeling
>> that this field can be avoided or its use case can be satisfied in a
>> more satisfying way.
>
> Maybe the answer here is to have a simple mechanism for modules to
> claim the exclusive right to be able to set that flag?

As I see it, the queryId as the most contentious internals change in
the patch, as queryId is not really an opaque id tracking a query's
progression, but rather a bit of extra space for query jumbles, and
query jumbles only, as far as I can tell: thus, not really a generally
useful backend service, but rather specifically for pg_stat_statements
-- that may not take such a special field off the table (more
opinionated people will probably comment on that), but it's a pretty
awkward kind of inclusion.

I'm posting an experimental patch in having the backend define state
whose only guarantee is to be equal between a PlannedStmt and a Query
node that derived it, and its effect on pg_stat_statements. (the
latter not included in this email, but available via git[0]) The
exact mechanism I use is pretty questionable, but my skimming of the
entry points of the code suggests it might work: I use the pointer to
the Query node from the PlannedStmt, which is only unique as long as
the Query (head) node remains allocated while the PlannedStmt is also
alive. A big if, but able to be verified with assertions in debug
mode ("does the pointer see poisoned memory?") and low overhead
presuming one had to allocate memory already. However, the
abstraction is as such that if the key generation needs to change
compliant consumers should be fine.

At ExecutorFinish (already hookable) all NodeKeys remembered by an
extension should be invalidated, as that memory is free and ready to
be used again.

As Query node availability from the PlannedStmt is not part of the
node for a reason, it is rendered as an opaque uintptr_t, and hidden
behind a type that only is intended to define equality and
"definedness" (!= NULL is the implementation).

The clear penalty is that the extension must be able to map from the
arbitrary, backend-assigned queryId (which I have renamed nodeKey just
to make it easier to discuss) to its own internal value it cares
about: the query jumble. I used a very simple, fixed-size association
array with a small free-space location optimization, taking advantage
of the property in pg_stat_statements can simply not process a query
if it doesn't want to (but should process most of them so one can get
a good sense of what is going on).

A credible(?) alternative I have thought of is to instead expose a
memory context for use by extensions on the interesting nodes; in
doing so extensions can request space and store whatever they need.
The part that I feel might be tricky is figuring out a reasonable and
appropriate time to free that 'extension-context' and what context
that context should live under. But I'd be wiling to try it with
haste if it sounds a lot more credible or useful than what I've got.

--
fdr

[0]: https://github.com/fdr/postgres/tree/wrench-pgss

Attachment Content-Type Size
Introduce-NodeKey-as-a-service-to-extensions-in-the-backend-v1.patch text/x-patch 6.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marko Kreen 2012-02-24 12:11:45 Re: Speed dblink using alternate libpq tuple storage
Previous Message Kyotaro HORIGUCHI 2012-02-24 10:53:14 Re: Speed dblink using alternate libpq tuple storage