Re: BUG: pg_stat_statements query normalization issues with combined queries

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: craig(dot)ringer(at)2ndquadrant(dot)com, robertmhaas(at)gmail(dot)com, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG: pg_stat_statements query normalization issues with combined queries
Date: 2016-12-27 17:16:45
Message-ID: alpine.DEB.2.20.1612271718150.4911@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Kyotaro-san,

> In nonterminal stmtmulti, setQueryLocation is added and used to
> calcualte and store the length of every stmt's. This needs an
> extra storage in bse_yy_extra_type

The extra storage is one int.

> and introduces a bit complicated stuff. But I think raw_parser can do
> that without such extra storage and labor,

How? The issue is that stmtmulti silently skip some ';' when empty
statements are found, so I need to keep track of that to know where to
stop, using the beginning location of the next statement, which is
probably your idea, does not work.

> then gram.y gets far simpler.

> The struct member to store the location and length is named
> 'qlocation', and 'qlength' in the added ParseNode. But many nodes
> already have 'location' of exactly the same purpopse. I don't see
> a necessity to differentiate them.

Alas, the "CreateTableSpaceStmt" struct already had a "char *location"
that I did not want to change... If I use "location", then I have to
change it, why not...

Another reason for the name difference is that qlocation/qlength
convention is slightly different from location when not set: location is
set manually to -1 when the information is not available, but this
convention is not possible for statements without changing all their
allocations and initializations (there are hundreds...), so the convention
I used for qlocation/qlength is that it is not set if qlength is zero,
which is the default value set by makeNode, so no change was needed...

Changing this point would mean a *lot* of possibly error prone changes in
the parser code everywhere statements are allocated...

> Putting the two above together, the following is my suggestion
> for the parser part.
>
> - Make all parse nodes have the following two members at the
> beginning. This unifies all parse node from the view of setting
> their location. Commenting about this would be better.
>
> | NodeTag type;
> | int location;

Currently only a few parse nodes have locations, those about variables and
constants that can be designated by an error message. I added the
information to about 100 statements, but for the many remaining parse
nodes the information does not exist and is not needed, so I would prefer
to avoid adding a field both unset and unused.

> - Remove struct ParseNode

"ParseNode" is needed to access the location and length of statements,
otherwise they are only "Node", which does not have a location.

> and setQueryLocation.

The function is called twice, I wanted to avoid duplicating code.

> stmtmulti sets location in the same manner to other kind of nodes, just
> doing '= @n'. base_yyparse() doesn't calculate statement length.

But...

> - Make raw_parser caluclate statement length then store it into
> each stmt scanning the yyextra.parsetree.

... I cannot calculate the statement length cleanly because of empty
statements. If I know where the last ';' is, then the length computation
must be when the reduction occurs, hence the function called from the
stmtmulti rule.

> What do you think about this?

I think that I do not know how to compute the length without knowing where
the last ';' was, because of how empty statements are silently skipped
around the stmtmulti/stmt rules, so I think that the length computation
should stay where it is.

What I can certainly do is:

- add more comments to explain why it is done like that

What I could do with some inputs from reviewers/committers is:

- rename the "char *location" field of the create table space statement
to "directory" or something else... but what?

- change qlocation/qlength to location/length...

However, then the -1 convention for location not set is not true, which
is annoying. Using this convention means hundreds of changes because every
statement allocation & initialization must be changed. This is
obviously possible, but is a much larger patch, which I cannot say
would be much better than this one, it would just be different.

What I'm reserved about:

- adding a location field to nodes that do not need it.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2016-12-27 17:17:36 Re: pg_stat_activity.waiting_start
Previous Message Claudio Freire 2016-12-27 17:14:39 Re: Vacuum: allow usage of more than 1GB of work mem