Re: BUG: pg_stat_statements query normalization issues with combined queries

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: coelho(at)cri(dot)ensmp(dot)fr
Cc: robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: BUG: pg_stat_statements query normalization issues with combined queries
Date: 2016-12-22 07:02:23
Message-ID: 20161222.160223.146423158.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Wed, 21 Dec 2016 09:28:58 +0100 (CET), Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote in <alpine(dot)DEB(dot)2(dot)20(dot)1612210841370(dot)3892(at)lancre>
>
> Hello Robert & Kyotaro,
>
> >> I'm a little doubtful about your proposed fix. However, I haven't
> >> studied the code, so I don't know what other approach might be better.
>
> That is indeed the question!
>
> The key point is that the parser parses several statements from a
> string, but currently there is no clue about where the queries was
> found in the string. Only a parser may know about what is being parsed
> so can generate the location information. Now the possible solutions I
> see are:
>
> - the string is split per query before parsing, but this requires at
> least a lexer... it looks pretty uneffective to have another lexing
> phase involved, even if an existing lexer is reused.

I don't see this is promising. Apparently a waste of CPU cycles.

> - the parser processes one query at a time and keep the "remaining
> unparse string" in some state that can be queried to check up to
> where it proceeded, but then the result must be stored somewhere
> and it would make sense that it would be in the statement anyway,
> just the management of the location information would be outside
> the parser. Also that would add the cost of relaunching the parser,
> not sure how bad or insignificant this is. This is (2) below.

I raised this as a spoiler, I see this is somewhat too invasive
for the problem to be solved.

> - the parser still generates a List<Node> but keep track of the location
> of statements doing so, somehow... propably as I outlined.

Yeah, this seems most reasonable so far. It seems to me to be
better that the statement location is conveyed as a part of a
parse tree so as not to need need a side channel for location.

I'd like to rename debug_query_string to more reasonable name if
we are allowed but perhaps not.

> The query string information can be at some way of pointing in the
> initial
> string, or the substring itself that could be extracted at some point.
> I initially suggested the former because this is already what the
> parser does for some nodes, and because it seems simpler to do so.
>
> Extracting the string instead would suggest that the location of
> tokens
> within this statement are relative to this string rather than the
> initial one, but that involves a lot more changes and it is easier to
> miss something doing so.

Agreed that copying statement string would be too much. But the
new *Stmt node should somehow have also the end location of the
statement since the user of a parse tree cannot look into another
one.

> > That will work and doable, but the more fundamental issue here
> > seems to be that post_parse_analyze_hook or other similar hooks
> > are called with a Query incosistent with query_debug_string.
>
> Sure.
>
> > It is very conveniently used but the name seems to me to be suggesting
> > that such usage is out of purpose. I'm not sure, though.
> >
> > A similar behavior is shown in error messages but this harms far
> > less than the case of pg_stat_statements.
> >
> > ERROR: column "b" does not exist
> > LINE 1: ...elect * from t where a = 1; select * from t where b = 2;
> > com...
> > ^
>
> Ah, I wrote this piece of code a long time ago:-) The location is
> relative to the full string, see comment above, changing that would
> involve much
> more changes, and I'm not sure whether it is desirable.
>
> Also, think of:
>
> SELECT * FROM foo; DROP TABLE foo; SELECT * FROM foo;
>
> Maybe the context to know which "SELECT * FROM foo" generates an error
> should be kept.
>
> > 1. Let all of the parse node have location in
> > debug_query_string. (The original proposal)
>
> Yep.
>
> > 2. Let the parser return once for each statement (like psql
> > parser)
>
> I'm not sure it does... "\;" generates ";" in the output and the psql
> lexer keeps on lexing.
>
> > and corresponding query string is stored in a
> > varialble other than debug_query_string.
>
> I think that would involve many changes because of the way postgres is
> written, the list is expected and returned by quite a few
> functions. Moreover query rewriting may generate several queries out
> of one anyway, so the list would be kept.
>
> So I'm rather still in favor of my initial proposal, that is extend
> the existing location information to statements, not only some tokens.

I thought that it's too much to let all types of parse node have
location but grepping the gram.y with "makeNode" pursuaded me to
agree with you. After changing all *Stmt nodes, only several
types of nodes seems missing it.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mithun Cy 2016-12-22 07:04:49 Ilegal type cast in _hash_doinsert()
Previous Message Amit Kapila 2016-12-22 06:47:05 Re: Cache Hash Index meta page.