Re: BUG: pg_stat_statements query normalization issues with combined queries

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG: pg_stat_statements query normalization issues with combined queries
Date: 2016-12-21 08:28:58
Message-ID: alpine.DEB.2.20.1612210841370.3892@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Robert & Kyotaro,

>> I'm a little doubtful about your proposed fix. However, I haven't
>> studied the code, so I don't know what other approach might be better.

That is indeed the question!

The key point is that the parser parses several statements from a string,
but currently there is no clue about where the queries was found in the
string. Only a parser may know about what is being parsed so can generate
the location information. Now the possible solutions I see are:

- the string is split per query before parsing, but this requires at
least a lexer... it looks pretty uneffective to have another lexing
phase involved, even if an existing lexer is reused.

- the parser processes one query at a time and keep the "remaining
unparse string" in some state that can be queried to check up to
where it proceeded, but then the result must be stored somewhere
and it would make sense that it would be in the statement anyway,
just the management of the location information would be outside
the parser. Also that would add the cost of relaunching the parser,
not sure how bad or insignificant this is. This is (2) below.

- the parser still generates a List<Node> but keep track of the location
of statements doing so, somehow... propably as I outlined.

The query string information can be at some way of pointing in the initial
string, or the substring itself that could be extracted at some point.
I initially suggested the former because this is already what the parser
does for some nodes, and because it seems simpler to do so.

Extracting the string instead would suggest that the location of tokens
within this statement are relative to this string rather than the initial
one, but that involves a lot more changes and it is easier to miss
something doing so.

> That will work and doable, but the more fundamental issue here
> seems to be that post_parse_analyze_hook or other similar hooks
> are called with a Query incosistent with query_debug_string.

Sure.

> It is very conveniently used but the name seems to me to be suggesting
> that such usage is out of purpose. I'm not sure, though.
>
> A similar behavior is shown in error messages but this harms far
> less than the case of pg_stat_statements.
>
> ERROR: column "b" does not exist
> LINE 1: ...elect * from t where a = 1; select * from t where b = 2; com...
> ^

Ah, I wrote this piece of code a long time ago:-) The location is relative
to the full string, see comment above, changing that would involve much
more changes, and I'm not sure whether it is desirable.

Also, think of:

SELECT * FROM foo; DROP TABLE foo; SELECT * FROM foo;

Maybe the context to know which "SELECT * FROM foo" generates an error
should be kept.

> 1. Let all of the parse node have location in
> debug_query_string. (The original proposal)

Yep.

> 2. Let the parser return once for each statement (like psql
> parser)

I'm not sure it does... "\;" generates ";" in the output and the psql
lexer keeps on lexing.

> and corresponding query string is stored in a
> varialble other than debug_query_string.

I think that would involve many changes because of the way postgres is
written, the list is expected and returned by quite a few functions.
Moreover query rewriting may generate several queries out of one anyway,
so the list would be kept.

So I'm rather still in favor of my initial proposal, that is extend the
existing location information to statements, not only some tokens.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2016-12-21 08:30:47 Re: Creating a DSA area to provide work space for parallel execution
Previous Message Kyotaro HORIGUCHI 2016-12-21 08:23:58 Re: asynchronous execution