parser dilemma

From: Zoltan Boszormenyi <zb(at)cybertec(dot)at>
To: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: parser dilemma
Date: 2007-04-19 09:19:40
Message-ID: 4627342C.9050809@cybertec.at
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> ...
> If anyone seriously wants to propose removing postfix ops from b_expr,
> we'd better take it up on someplace more widely read than -patches.
>
> regards, tom lane
>

OK, I'll bite the bullet and send it to -hackers.

For everyone who doesn't read -patches, let me reiterate the problem.

While developing my GENERATED/IDENTITY patches,
a parser problem turned up.

Currently, the parser handles DEFAULT as a CONSTRAINT so that
the DEFAULT clause and CONSTRAINT clauses can be written
in any order. Handling the GENERATED { ALWAYS | BY DEFAULT }
AS { IDENTITY | ( expression ) } syntax in the same way causes
a conflict between DEFAULT and b_expr, as Tom Lane discovered.
He proposed two solutions, quote:

> The problem comes from cases like
>
> colname coltype DEFAULT 5! GENERATED ...
>
> Since b_expr allows postfix operators, it takes one more token of
> lookahead than we have to tell if the default expression is "5!"
> or "5!GENERATED ...".
>
> There are basically two ways to fix this:
>
> 1. Collapse GENERATED ALWAYS and GENERATED BY into single tokens
> using filtered_base_yylex.
>
> 2. Stop allowing postfix operators in b_expr.
>
> I find #1 a bit icky --- not only does every case added to
> filtered_base_yylex slow down parsing a little more, but combined
> tokens create rough spots in the parser's behavior. As an example,
> both NULLS and FIRST are allegedly unreserved words, so this should
> work:
>
> regression=# create table nulls (x int);
> CREATE TABLE
> regression=# select first.* from nulls first;
> ERROR: syntax error at or near "first"
> LINE 1: select first.* from nulls first;
> ^
> regression=#
>
> #2 actually seems like a viable alternative: postfix operators aren't
> really in common use, and doing this would not only fix GENERATED but
> let us de-reserve a few keywords that are currently reserved. In a
> non-exhaustive check I found that COLLATE, DEFERRABLE, and INITIALLY
> could become unreserved_keyword if we take out this production:
>
> *** 7429,7436 ****
> { $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, $3, @2); }
> | qual_Op b_expr %prec Op
> { $$ = (Node *) makeA_Expr(AEXPR_OP, $1, NULL, $2, @1); }
> - | b_expr qual_Op %prec POSTFIXOP
> - { $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, NULL, @2); }
> | b_expr IS DISTINCT FROM b_expr %prec IS
> {
> $$ = (Node *) makeSimpleA_Expr(AEXPR_DISTINCT, "=", $1, $5, @2);
> --- 7550,7555 ----
>
> (Hmm, actually I'm wondering why COLLATE is a keyword at all right
> now... but the other two trace directly to the what-comes-after-DEFAULT
> issue.)
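
To make the lookahead problem in the quote above concrete, here is
a toy Python sketch. It is only an illustration, not PostgreSQL's
bison grammar: the function name possible_readings, the token lists
and the assumption that GENERATED could also be an ordinary column
name are all invented for this example. It models a parser that has
already consumed "DEFAULT 5 !" and asks which readings of "!" are
still viable given a limited window of upcoming tokens.

```python
# Toy model, not PostgreSQL's grammar: the parser has consumed "DEFAULT 5 !"
# and must decide how to read "!" from the next few tokens.

# Token pairs that start the GENERATED column clause.
CLAUSE_STARTS = [("GENERATED", "ALWAYS"), ("GENERATED", "BY")]


def possible_readings(following, lookahead):
    """Return the readings of "!" still viable with `lookahead` visible tokens."""
    # Only the first two tokens matter for this toy decision.
    head = tuple(following[:min(lookahead, 2)])
    readings = set()

    # Postfix reading: the expression is "5!" and a GENERATED clause follows.
    if any(start[:len(head)] == head for start in CLAUSE_STARTS):
        readings.add("postfix")

    # Infix reading: "5 ! generated", where generated is an ordinary column
    # name (an assumption of this sketch) and no clause keyword follows it.
    if head and head[0] == "GENERATED" and head not in CLAUSE_STARTS:
        readings.add("infix")

    return readings


tokens = ["GENERATED", "ALWAYS", "AS", "IDENTITY"]
print(possible_readings(tokens, 1))  # both readings are still possible
print(possible_readings(tokens, 2))  # only the postfix reading remains
```

With one visible token both readings survive, which is the conflict
described in the quote; a second token settles it.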

I proposed a third solution that is actually standard-conforming
and still allows postfix operators: admit that DEFAULT is not
a CONSTRAINT and therefore cannot be freely mixed with
constraints. The standard has this syntax:

<column definition> ::=
    <column name> [ <data type or domain name> ]
    [ <default clause> | <identity column specification> | <generation clause> ]
    [ <column constraint definition>... ]
    [ <collate clause> ]

This says that DEFAULT | GENERATED ... AS IDENTITY |
GENERATED ALWAYS AS ( expr ) must come after the data type
and before any CONSTRAINTs, and that the three forms are mutually
exclusive. The parser can handle this nicely, and the analyzer phase
can save some cycles by not checking for conflicting DEFAULT clauses.
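
As a rough illustration of the ordering rule quoted from the
standard, here is a small Python sketch. It is not the proposed
gram.y change: the function name follows_standard_order and the
clause labels ("default", "identity", "generation", "constraint",
"collate") are hypothetical names chosen for this example. It checks
the clauses that follow the data type against the fixed order.

```python
# Clause kinds are hypothetical labels for this sketch, not parser node names.
VALUE_CLAUSES = {"default", "identity", "generation"}


def follows_standard_order(clauses):
    """Check the clause kinds after the data type against the quoted rule."""
    rest = list(clauses)

    # Optionally one of default / identity / generation, and it comes first.
    if rest and rest[0] in VALUE_CLAUSES:
        rest.pop(0)

    # Then any number of column constraints.
    while rest and rest[0] == "constraint":
        rest.pop(0)

    # Then an optional collate clause.
    if rest and rest[0] == "collate":
        rest.pop(0)

    # Anything left over is out of order or a second value clause.
    return not rest


print(follows_standard_order(["default", "constraint", "constraint"]))
print(follows_standard_order(["constraint", "default"]))
```

Because the grammar itself admits at most one of the three value
clauses, a second DEFAULT or a DEFAULT next to GENERATED is rejected
by shape alone, with no later check for conflicts.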

What do people think? Which would be the preferred solution?

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/
