Re: Timing events WIP v1

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timing events WIP v1
Date: 2013-01-14 16:19:42
Message-ID: CAEYLb_X0=s+jFgYGTs0n3t0ga4=0qTTddub2USR+bLBgJn0Ufw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I decided to take a look at this.

On 15 November 2012 09:56, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> -I modeled the hook here on the logging one that went into 9.2. It's
> defined in its own include file and it gets initialized by the logging
> system. No strong justification for putting it there, it was just similar
> to the existing hook and that seemed reasonable. This is in some respects
> an alternate form of logging after all. The data sent by the hook itself
> needs to be more verbose in order to handle the full spec, right now I'm
> just asking about placement.

I noticed that when !log_checkpoints, control never reaches the site
where the hook is called, and thus the checkpoint info is not stored.
Is that the intended behaviour of the patch? I don't feel strongly
either way, but it did give me pause for a second. pg_stat

I don't like the design of this patch. It seems like we should be
moving as far away from an (internal) textual representation as
possible - that's basically the main reason why logging is bad.
Textual representations make easy, correct querying hard, are bloated,
and are really no better than manual logging.

The first question I'd ask is "what is true of all timing events?".
You address this in your original RFC [1]. However, this isn't
reflected in the datastructures that the patch uses, and I suspect
that it should be.

What I'm envisioning is new NodeTag infrastructure. You'd have an
"abstract base class". This is sort of like the one used by plan nodes
(i.e. the Plan struct). This would contain the information that is
essential to all timing events (in the case of Plan, this includes
startup and total costs for the node in question). You'd refactor
structs like CheckpointStatsData, to compose them such that their
first field was this parent struct (i.e. logically, they'd inherit
from, say, TimingEvent, which itself would inherit from Node, in the
style of Plan's children). That'd entail normalising things like
inconsistent use of datatypes to represent time intervals and whatnot
among these sorts of structs in the extant code, which is painful, but
long term it seems like the way forward.

With this commonality and variability analysis performed, and having
refactored the code, you'd be able to use type punning to pass the
essential information to your new hook. You'd have a way to usefully
restrict statistics to only one class of timing event, because you
could expose the NodeTag in your view (or perhaps a prettified version
thereof), by putting something in a query predicate. You can perform
aggregation, too. Maybe this abstract base class would look something
like:

typedef struct TimingEvent
{
NodeTag type;

/*
* Common structural data for all TimingEvent types.
*/
double duration; /* Duration of event in milliseconds (per project
policy for views) */
double naffected; /* Number of objects (of variable type) affected */
char *string; /* Non-standardised event string */
} TimingEvent;

However, my failure to come up with more fields here leads me to
believe that we'd be better off with a slightly different approach.
Let's forget about storing text altogether, and make the unstructured
"union" field JSON (while making fields common to all timing events
plain-old scalar datums, for ease of use).

It would become the job of PGTE_memsize() to make each entry in the
shared hashtable wide enough to accommodate the widest of all possible
TimingEvent-inheriting structs. We'd just store heterogeneous
TimingEvent-inheriting structs in the hash table. It would be
pg_timing_events job to call a JSON conversion routine directly, with
baked-in knowledge, within a NodeTag case statement.

It actually wouldn't be hard to have the view spit out some JSON from
the internal, compact struct representation. Currently, explain.c
manages to generate JSON representations of plans with very little
fuss, and without using any of the JSON datatype stuff. Doing this
with hstore is just way too controversial, and not particularly worth
it, IMHO. With a "where event_type = x" in the query predicate, the
JSON datums would have predictable, consistent structure, facilitating
machine reading and aggregation.

[1] Original RFC e-mail:
http://www.postgresql.org/message-id/509300F7.5000803@2ndQuadrant.com

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2013-01-14 16:24:52 Re: [PATCH] COPY .. COMPRESSED
Previous Message Tom Lane 2013-01-14 16:13:17 Re: [PATCH] COPY .. COMPRESSED