Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Date: 2012-02-27 12:26:33
Message-ID: CAEYLb_UhGcWR2iy_emRzw1etBsMrYxQM7A-QrnqABw5UwR4SLA@mail.gmail.com
Lists: pgsql-hackers

On 27 February 2012 06:23, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I think that what Peter is on about in
> http://archives.postgresql.org/pgsql-hackers/2012-02/msg01152.php
> is the question of what location to use for the *result* of
> 'literal string'::typename, assuming that the type's input function
> doesn't complain.  Generally we consider that we should use the
> leftmost token's location for the location of any expression composed
> of more than one input token.  This is of course the same place for
> 'literal string'::typename, but not for the alternate syntaxes
> typename 'literal string' and cast('literal string' as typename).
> I'm not terribly impressed by the proposal to put in an arbitrary
> exception to that general rule for the convenience of this patch.

Now, you don't have to be. It was a mistake on my part to bring the
current user-visible behaviour into this. I don't see that there is
necessarily a tension between your position that we should blame the
leftmost token's location, and my contention that the Const "location"
field shouldn't misrepresent the location of certain Consts involved
in coercion post-analysis.

Let me put that in concrete terms. In my working copy of the patch, I
have made some more changes to the core system (mostly reverting
things that turned out to be unnecessary), but I have also made the
following change:

*** a/src/backend/parser/parse_coerce.c
--- b/src/backend/parser/parse_coerce.c
*************** coerce_type(ParseState *pstate, Node *no
*** 280,293 ****
  newcon->constlen = typeLen(targetType);
  newcon->constbyval = typeByVal(targetType);
  newcon->constisnull = con->constisnull;
! /* Use the leftmost of the constant's and coercion's locations */
! if (location < 0)
!     newcon->location = con->location;
! else if (con->location >= 0 && con->location < location)
!     newcon->location = con->location;
! else
!     newcon->location = location;
!
  /*
   * Set up to point at the constant's text if the input routine throws
   * an error.
--- 280,286 ----
  newcon->constlen = typeLen(targetType);
  newcon->constbyval = typeByVal(targetType);
  newcon->constisnull = con->constisnull;
! newcon->location = con->location;
  /*
   * Set up to point at the constant's text if the input routine throws
   * an error.
*********************

This does not appear to have any user-visible effect on caret position
for any variation of coercion syntax, while giving me everything that
I need. I had assumed that we were relying on things being this way,
but apparently that is not the case. The system correctly blames
the coercion token when it finds the coercion at fault, and the
const token when it finds the Const node at fault, just as it did
before. So this looks like a case of removing what amounts to dead
code.
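For anyone following along, the branch being removed simply picked the leftmost valid location of the two. Here is a minimal Python model of that now-dead logic (the function name is mine, not anything in the tree; -1 is PostgreSQL's convention for an unknown parse location):

```python
def leftmost_location(coercion_loc: int, const_loc: int) -> int:
    """Model of the removed parse_coerce.c branch: take the leftmost
    of the coercion's and the constant's locations, treating -1 as
    'unknown'."""
    if coercion_loc < 0:
        return const_loc
    if const_loc >= 0 and const_loc < coercion_loc:
        return const_loc
    return coercion_loc
```

With the change, the coerced Const instead just keeps con->location, i.e. the position of the literal itself.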

> Especially not when the only reason it's needed is that Peter is
> doing the fingerprinting at what is IMO the wrong place anyway.
> If he were working on the raw grammar output it wouldn't matter
> what parse_coerce chooses to do afterwards.

Well, I believe that your reason for preferring to do it at that stage
was that we cannot capture all of the system's "normalisation smarts"
- like the fact that the omission of noise words isn't a
differentiator - because much of that intelligence, such as the
recognition of the equivalence of explicit joins and queries with join
conditions in the WHERE clause, occurs within the planner. Since we
can't have it all, we might as well not have any. My solution here is
to be sufficiently vague about the behaviour of normalisation that the
user has no reasonable basis to count on that kind of more advanced
reduction occurring.
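To make concrete what the patch does rely on: given accurate Const locations from the post-analysis tree, normalisation reduces to splicing a placeholder over each constant in the query text. A rough Python sketch of that idea (not the patch's actual C code; the offset handling and '?' placeholder are illustrative only):

```python
def normalise(query: str, const_locations: list[int]) -> str:
    """Replace each constant, identified by its recorded start offset,
    with a '?' placeholder.  For this sketch, constants are assumed to
    be single-quoted strings or simple numeric literals."""
    out = []
    last = 0
    for loc in sorted(const_locations):
        out.append(query[last:loc])
        out.append("?")
        # Skip past the constant's token: a quoted string, or a run of
        # literal characters.
        if query[loc] == "'":
            end = query.index("'", loc + 1) + 1
        else:
            end = loc
            while end < len(query) and (query[end].isalnum() or query[end] == "."):
                end += 1
        last = end
    out.append(query[last:])
    return "".join(out)
```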

I did very seriously consider hashing the raw parse tree, but I have
several practical reasons for not doing so. Whichever way you look at
it, hashing there is going to result in more, and uglier, code. There
is no uniform parent node that I can tag with a query_id, so there
would have to be more modifications to the core system so that the
queryId value is carried around in more places and persists for
longer. The fact that I'd actually be hashing different structs at
different times (that tree is accessed through a Node pointer) would
necessitate lots of redundant code operating on each of the very
similar structs in an analogous way. The fact is that waiting until
after parse analysis has plenty to recommend it, and yes, the fact
that we already have working code with extensive regression tests is
one of them.
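As a toy illustration of the fingerprinting side (the shape is the same whichever tree you hash): walk the tree and feed everything except location fields and constant values into a hash, so that queries differing only in their constants deliberately collide. This is a hypothetical Python model using dicts for nodes, not the patch's actual walker:

```python
import hashlib

def fingerprint(node, h=None):
    """Hash a parse tree represented as nested dicts/lists, ignoring
    'location' fields and the values (but not the presence) of Consts."""
    top = h is None
    if top:
        h = hashlib.sha1()
    if isinstance(node, dict):
        for key in sorted(node):
            if key == "location":
                continue                  # caret positions never differentiate
            if node.get("tag") == "Const" and key == "value":
                h.update(b"?")            # a Const's presence matters, its value doesn't
                continue
            h.update(key.encode())
            fingerprint(node[key], h)
    elif isinstance(node, list):
        for item in node:
            fingerprint(item, h)
    else:
        h.update(repr(node).encode())
    return h.hexdigest() if top else None
```

Two trees that differ only in Const values and locations fingerprint identically; change anything structural (say, a Var's varno) and they diverge.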

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
