Re: BUG: pg_stat_statements query normalization issues with combined queries

From: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG: pg_stat_statements query normalization issues with combined queries
Date: 2016-12-30 04:33:59
Message-ID: CAMsr+YF+ZJypAzT3K+aNStOhtv875gb2nk2z1w2PJNscGe1tgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 28 December 2016 at 20:11, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
> Hello Tom,
>
>> [...] It's touching every single utility statement type, which is not only
>> pretty invasive in itself, but will break any pending patches that add more
>> utility statement types.
>
>
> Yep, but it is limited to headers and the break is trivial...

I'm not sure how else that part can be done.

We can add a common parent for all utility statement types, but that
also touches all utility statements, and since we're working with pure
C, it means 'type' won't be accessible as e.g. someCreateStmt.type,
but only as something like ((UtilityStatement)someCreateStmt).type or
someCreateStmt.hdr.type .

e.g.

typedef struct UtilityStatement
{
NodeTag type;
int querytext_offset;
int querytext_length;
};

typedef struct CreateStmt
{
UtilityStatement hdr;
RangeVar *relation; /* relation to create */
....
}

I don't think that's so bad, but it's not exactly less invasive.

>> And you've not followed through on the implications of adding those fields
>> in, for instance, src/backend/nodes/;
>
> Hmmm, probably you are pointing out structure read/write functions.

I'm sure that's his intent, yes.

> I would have hoped that such code would be automatically derived from the
> corresponding struct definition...

That would be nice. But no, unfortunately, we have no such preprocessing.

To do so would require that we either generate things like
parsenodes.h and the relevant parts of copyfuncs.c etc from a common
source, or have a toolchain that can ingest parsenodes.h and emit
suitable code. The latter means either "fragile, hideous Perl script"
or requiring something like LLVM or the gcc frontend libraries in the
toolchain to parse the C code and emit an intermediate representation
of the structures that we could then generate our functions from.

If you've come from the Java world you'd expect that we'd just
generate the copy and makenode stuff from introspection of the node
structs, or preprocess them or something, but it's not very practical.
They don't change enough to really make it worth it either.

> Hmmm. I've run into a tiny small ever so little detail in trying to apply
> this clever approach...
>
> For fixing the information in pg_stat_statement, the location data must be
> transported from the parsed node to the query to the planned node, because
> the later two nodes types are passed to different hooks.

Yep.

> Now the detail is that utility statements, which seems to be nearly all of
> them but select/update/delete/insert, do not have plans: The statement
> itself is its own plan... so there is no place to store the location &
> length.

Yeah. I don't see any way around adding location info for utility
statements one way or the other.

We could wrap every utility statement up in a container with the
location info, but that'd break every user of ProcessUtility_hook and
unless we put the container as the first by-value member of each
utility struct anyway, it'd mean extra allocations for each utility
statement. Not very appealing.

TBH I think that for utility statements, adding a member struct with
location info is the least-bad option. Something like:

/*
* Location of the original querytext of an individual parsetree or
plan relative to the
* start of the querytext. This is usually offset 0 and the length of
the querytext its self,
* but in multi-statements and other cases where a single parse has
multiple products
* hooks may need to know which part of the querytext corresponds to
which product
* plan.
*
* offset and length are both -1 for plans with no corresponding
querytext, such as
* additional/replacement queries emitted by the rewrite phase or
additional statements
* produced as a result of processing some utility statements.
*
typedef struct StmtLocation
{
int offset;
int length;
}

typedef struct CreateStmt
{
NodeTag type;
StmtLocation location;
....
}

Personally I don't much care if we do it this way, or by embedding the
NodeTag too and creating a common UtilityStmt. The latter probably has
benefits for later extension, but is a bit more annoying in the
shorter term.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-12-30 05:41:58 Re: Support for pg_receivexlog --format=plain|tar
Previous Message Ashutosh Bapat 2016-12-30 04:30:29 Re: Assignment of valid collation for SET operations on queries with UNKNOWN types.