Re: Rethinking plpgsql's assignment implementation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chapman Flack <chap(at)anastigmatix(dot)net>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Rethinking plpgsql's assignment implementation
Date: 2020-12-12 03:16:26
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> In any case, that approach still involves inserting some query text
> that the user didn't write, so I'm not sure how much confusion it'd
> remove. The "SET n:" business at least looks like it's some weird
> prefix comparable to "LINE n:", so while people wouldn't understand
> it I think they'd easily see it as something the system prefixed
> to their query.

> Looking a bit ahead, it's not too hard to imagine plpgsql wishing
> to pass other sorts of annotations through SPI and down to the core
> parser. Maybe we should think about a more general way to do that
> in an out-of-band, not-visible-in-the-query-text fashion.

I have an idea (no code written yet) about this.

After looking around, it seems like the ParserSetupHook mechanism
is plenty for anything we might want an extension to be able to
change in the behavior of parse analysis. The hooks that we
currently allow that to set affect only the interpretation of
variable names and $N parameter symbols, but we could surely
add much more in that line as needed.

What we lack is any good way for an extension to control the
behavior of raw_parser() (i.e., gram.y). Currently, plpgsql
prefixes "SELECT " to expressions it might want to parse, and
now my current patch proposes to prefix something else to get a
different grammar behavior. Another example of a very similar
problem is typeStringToTypeName(), which prefixes a string it
expects to be a type name with "SELECT NULL::", and then has
to do a bunch of kluges to deal with the underspecification
involved in that. Based on these examples, we need some sort
of "overall goal" option for the raw parser, but maybe not more
than that --- other things you might want tend to fall into the
parse analysis side of things.

So my idea here is to add a parsing-mode option to raw_parser(),
which would be an enum with values like "normal SQL statement",
"expression only", "type name", "plpgsql assignment statement".
The problem I had with not knowing how many dotted names to
absorb at the start of an assignment statement could be finessed
by inventing "assignment1", "assignment2", and "assignment3"
parsing modes; that's a little bit ugly but not enough to make
me think we need a wider API.

As to how it could actually work, I'm noticing that raw_parser
starts out by initializing yyextra's lookahead buffer to empty.
For the parsing modes other than "normal SQL statement", it
could instead inject a lookahead token that is a code that cannot
be generated by the regular lexer. Then gram.y could have
productions like

EXPRESSION_MODE a_expr { ... generate parse tree ... }

where EXPRESSION_MODE is one of these special tokens. And now
we have something that will parse an a_expr, and only an a_expr,
and we don't need any special "SELECT " or any other prefix in
the user-visible source string. Similarly for the other special
parsing modes.

Essentially, this is a way of having a few distinct parsers
that share a common base of productions, without the bloat and
code maintenance issues of building actually-distinct parsers.

A small problem with this is that the images of these special
productions in ECPG would be dead code so far as ECPG is concerned.
For the use-cases I can foresee, there wouldn't be enough special
productions for that to be a deal-breaker. But we could probably
teach the ECPG grammar-building scripts to filter out these
productions if it ever got to be annoying.

regards, tom lane

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2020-12-12 04:27:11 Re: pg_basebackup test coverage
Previous Message Bossart, Nathan 2020-12-12 00:41:25 Re: please update ps display for recovery checkpoint