Re: BUG: pg_stat_statements query normalization issues with combined queries

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: coelho(at)cri(dot)ensmp(dot)fr, craig(dot)ringer(at)2ndquadrant(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: BUG: pg_stat_statements query normalization issues with combined queries
Date: 2017-01-12 14:29:38
Message-ID: 10619.1484231378@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> At Fri, 30 Dec 2016 15:10:42 +0100 (CET), Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote in <alpine(dot)DEB(dot)2(dot)20(dot)1612301453280(dot)32017(at)lancre>
>> - I cannot use the intermediate node trick suggested by Tom because
>> it does not work for utility statements which do not have plans, so
>> the code still adds location & length, sorry.

I still really, really don't like this patch. I think it's going to
create continuing maintenance problems, because of shortcuts like this:

+#define isParseNodeTag(tag) ((T_Query <= (tag)) && ((tag) < T_A_Expr))

We've never had any code that depends on ranges of nodetag numbers, and
now is not the time to start. (The fact that you had to randomly relocate
some existing tags to make this work didn't make me any happier about it.)

>> - I still use the 'last_semicolon' lexer variable. The alternative is to
>> change rules so as not to skip empty statements, then write a loop to
>> compute the length based on successor location, and remove the empty
>> statements. It can be done, I do not think it is better, it is only
>> different and more verbose. I'll do it if required by a committer.

> I think this is doable in the way shown in this patch. But this
> seems somewhat bogus, too..

Yeah, I doubt that this technique for getting the raw locations in the
grammar+lexer works reliably. In general, Bison is a bit asynchronous
in terms of scanning, and may or may not have scanned a token ahead of
what the current production covers. So having production rules that
look at the scanner state is just dangerous. You should be relying on
the Bison locations mechanism (@N), instead. That has some gotchas of
its own with respect to locations of nonterminals, but at least they're
known --- and we could fix them if we had to.

Anyway, I decided to see what it would take to do it the way I had
in mind, which was to stick a separate RawStmt node atop each statement
parsetree. The project turned out a bit larger than I hoped :-(, but
I'm really pleased with the end result, because it makes the APIs around
lists of statements much simpler and more regular than they used to be.
I would probably propose most of the attached patch for adoption even
if we weren't interested in tracking statement boundaries.

In brief, what this does is:

* The output of raw parsing is now always a list of RawStmt nodes;
the statement-type-dependent nodes are one level down from that.
This is similar to the existing rule that the output of parse analysis
is always a list of Query nodes, even though the Query struct is mostly
empty in the CMD_UTILITY case. Furthermore, the output of pg_plan_queries
is now always a list of PlannedStmt nodes, even for utility statements.
In the case of a utility statement, "planning" just consists of wrapping
a CMD_UTILITY PlannedStmt around the utility node. Now, every list of
statements has a consistent head-node type depending on how far along
it is in processing.

* This allows us to keep statement location/length in RawStmt, Query,
and PlannedStmt nodes, and not anywhere else.

* This results in touching quite a bit of code that is concerned with
lists of statements in different formats, but in most places, it's
simple changes like replacing generic Node* pointers with specific
RawStmt* or PlannedStmt* pointers. I think that's an unalloyed good.

* To preserve the new rule that the node passed to top-level
parse_analyze() is now always a RawStmt, I made the grammar insert
RawStmts in the sub-statements of EXPLAIN, DECLARE CURSOR, etc.
This adds a bit of cruft in gram.y but avoids cruft elsewhere.

* In addition, I had to change the API of ProcessUtility() so that
what it's passed is the wrapper PlannedStmt not just the bare
utility statement. This is what allows pg_stat_statements to get at
the location info for a utility statement. This will affect all
users of ProcessUtility_hook, but the changes are pretty trivial
for those that don't care about location info --- see the changes
to contrib/sepgsql/hooks.c for an example.

* Because PlannedStmt also carries a canSetTag field, we're able to
get rid of some ad-hoc rules about how to reconstruct canSetTag for
a bare utility statement (specifically, the assumption that a utility
is canSetTag if and only if it's the only one in its list). While
I see no near-term need for relaxing that, it's nice to get rid of the
ad-hocery.

* I chose to also rearrange the post-parse-analysis representation
of DECLARE CURSOR so that it looks more like EXPLAIN, PREPARE, etc.
That is, the contained SELECT remains a child of the DeclareCursorStmt
rather than getting flipped around to be the other way. Possibly this
patch could have been made to work without that, but the existing code
had some weird rules about how the presence of a PlannedStmt where a
utility command was expected meant it was DECLARE CURSOR, and I was out
to get rid of weird rules like that one. With these changes, it's
true for both Query and PlannedStmt that utilityStmt is non-null if
and only if commandType is CMD_UTILITY. That allows simplifying a
whole lot of places that were testing both fields --- I think a few
of those were just defensive programming, but in a lot of them, it was
actually necessary to avoid confusing DECLARE CURSOR with SELECT.

So this ends up costing one extra palloc per statement, or two extra
in the case of a utility statement, but really that's quite negligible
compared to everything else that happens in processing a statement.
As against that, I think it makes the related code a lot clearer.
The sheer size of the patch is a bit more than Fabien's patch, but
what it is touching is not per-statement-type code but code that
works generically with lists of statements. So I think it's less
likely to cause problems for other patches.

regards, tom lane

Attachment Content-Type Size
stmt-list-rewrite-1.patch.gz application/x-gzip 39.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-01-12 14:33:23 Re: [PATCH] Rename pg_switch_xlog to pg_switch_wal
Previous Message Amit Kapila 2017-01-12 14:26:42 Re: parallelize queries containing subplans