Re: remaining sql/json patches

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Erik Rijkers <er(at)xs4all(dot)nl>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: remaining sql/json patches
Date: 2023-12-06 10:42:29
Message-ID: 202312061042.uxymt2swy5fz@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2023-Dec-05, Amit Langote wrote:

> I've attempted to trim down the JSON_TABLE grammar (0004), but this is
> all I've managed so far. Among other things, I couldn't refactor the
> grammar to do away with the following:
>
> +%nonassoc NESTED
> +%left PATH

To recap, the reason we're arguing about this is that this creates two
new precedence classes, which are higher than everything else. Judging
by the discussios in thread [1], this is not acceptable. Without either
those new classes or the two hacks I describe below, the grammar has the
following shift/reduce conflict:

State 6220

2331 json_table_column_definition: NESTED . path_opt Sconst COLUMNS '(' json_table_column_definition_list ')'
2332 | NESTED . path_opt Sconst AS name COLUMNS '(' json_table_column_definition_list ')'
2636 unreserved_keyword: NESTED .

PATH shift, and go to state 6286

SCONST reduce using rule 2336 (path_opt)
PATH [reduce using rule 2636 (unreserved_keyword)]
$default reduce using rule 2636 (unreserved_keyword)

path_opt go to state 6287

First, while the grammar uses "NESTED path_opt" in the relevant productions, I
noticed that there's no test that uses NESTED without PATH, so if we break that
case, we won't notice. I propose we remove the PATH keyword from one of
the tests in jsonb_sqljson.sql in order to make sure the grammar
continues to work after whatever hacking we do:

diff --git a/src/test/regress/expected/jsonb_sqljson.out b/src/test/regress/expected/jsonb_sqljson.out
index 7e8ae6a696..8fd2385cdc 100644
--- a/src/test/regress/expected/jsonb_sqljson.out
+++ b/src/test/regress/expected/jsonb_sqljson.out
@@ -1548,7 +1548,7 @@ HINT: JSON_TABLE column names must be distinct from one another.
SELECT * FROM JSON_TABLE(
jsonb 'null', '$[*]' AS p0
COLUMNS (
- NESTED PATH '$' AS p1 COLUMNS (
+ NESTED '$' AS p1 COLUMNS (
NESTED PATH '$' AS p11 COLUMNS ( foo int ),
NESTED PATH '$' AS p12 COLUMNS ( bar int )
),
diff --git a/src/test/regress/sql/jsonb_sqljson.sql b/src/test/regress/sql/jsonb_sqljson.sql
index ea5db88b40..ea9b4ff8b6 100644
--- a/src/test/regress/sql/jsonb_sqljson.sql
+++ b/src/test/regress/sql/jsonb_sqljson.sql
@@ -617,7 +617,7 @@ SELECT * FROM JSON_TABLE(
SELECT * FROM JSON_TABLE(
jsonb 'null', '$[*]' AS p0
COLUMNS (
- NESTED PATH '$' AS p1 COLUMNS (
+ NESTED '$' AS p1 COLUMNS (
NESTED PATH '$' AS p11 COLUMNS ( foo int ),
NESTED PATH '$' AS p12 COLUMNS ( bar int )
),

Having done that, AFAICS there are two possible fixes for the grammar.
One is to keep the idea of assigning precedence explicitly to these
keywords, but do something less hackish -- we can put NESTED together
with UNBOUNDED, and classify PATH in the IDENT group. This requires no
further changes. This would make NESTED PATH follow the same rationale
as UNBOUNDED FOLLOWING / UNBOUNDED PRECEDING. Here's is a preliminary
patch for that (the large comment above needs to be updated.)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index c15fcf2eb2..1493ac7580 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -887,9 +887,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
* json_predicate_type_constraint and json_key_uniqueness_constraint_opt
* productions (see comments there).
*/
-%nonassoc UNBOUNDED /* ideally would have same precedence as IDENT */
+%nonassoc UNBOUNDED NESTED /* ideally would have same precedence as IDENT */
%nonassoc IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE ROLLUP
- SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT
+ SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT PATH
%left Op OPERATOR /* multi-character ops and user-defined operators */
%left '+' '-'
%left '*' '/' '%'
@@ -911,8 +911,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
*/
%left JOIN CROSS LEFT FULL RIGHT INNER_P NATURAL

-%nonassoc NESTED
-%left PATH
%%

/*

The other thing we can do is use the two-token lookahead trick, by
declaring
%token NESTED_LA
and using the parser.c code to replace NESTED with NESTED_LA when it is
followed by PATH. This doesn't require assigning precedence to
anything. We do need to expand the two rules that have "NESTED
opt_path Sconst" to each be two rules, one for "NESTED_LA PATH Sconst"
and another for "NESTED Sconst". So the opt_path production goes away.
This preliminary patch does that. (I did not touch the ecpg grammar, but
it needs an update too.)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index c15fcf2eb2..8e4c1d4ebe 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -817,7 +817,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
* FORMAT_LA, NULLS_LA, WITH_LA, and WITHOUT_LA are needed to make the grammar
* LALR(1).
*/
-%token FORMAT_LA NOT_LA NULLS_LA WITH_LA WITHOUT_LA
+%token FORMAT_LA NESTED_LA NOT_LA NULLS_LA WITH_LA WITHOUT_LA

/*
* The grammar likewise thinks these tokens are keywords, but they are never
@@ -911,8 +911,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
*/
%left JOIN CROSS LEFT FULL RIGHT INNER_P NATURAL

-%nonassoc NESTED
-%left PATH
%%

/*
@@ -16771,7 +16769,7 @@ json_table_column_definition:
n->location = @1;
$$ = (Node *) n;
}
- | NESTED path_opt Sconst
+ | NESTED_LA PATH Sconst
COLUMNS '(' json_table_column_definition_list ')'
{
JsonTableColumn *n = makeNode(JsonTableColumn);
@@ -16783,7 +16781,19 @@ json_table_column_definition:
n->location = @1;
$$ = (Node *) n;
}
- | NESTED path_opt Sconst AS name
+ | NESTED Sconst
+ COLUMNS '(' json_table_column_definition_list ')'
+ {
+ JsonTableColumn *n = makeNode(JsonTableColumn);
+
+ n->coltype = JTC_NESTED;
+ n->pathspec = $2;
+ n->pathname = NULL;
+ n->columns = $5;
+ n->location = @1;
+ $$ = (Node *) n;
+ }
+ | NESTED_LA PATH Sconst AS name
COLUMNS '(' json_table_column_definition_list ')'
{
JsonTableColumn *n = makeNode(JsonTableColumn);
@@ -16795,6 +16805,19 @@ json_table_column_definition:
n->location = @1;
$$ = (Node *) n;
}
+ | NESTED Sconst AS name
+ COLUMNS '(' json_table_column_definition_list ')'
+ {
+ JsonTableColumn *n = makeNode(JsonTableColumn);
+
+ n->coltype = JTC_NESTED;
+ n->pathspec = $2;
+ n->pathname = $4;
+ n->columns = $7;
+ n->location = @1;
+ $$ = (Node *) n;
+ }
+
;

json_table_column_path_specification_clause_opt:
@@ -16802,11 +16825,6 @@ json_table_column_path_specification_clause_opt:
| /* EMPTY */ { $$ = NULL; }
;

-path_opt:
- PATH { }
- | /* EMPTY */ { }
- ;
-
json_table_plan_clause_opt:
PLAN '(' json_table_plan ')' { $$ = $3; }
| PLAN DEFAULT '(' json_table_default_plan_choices ')'
diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index e17c310cc1..e3092f2c3e 100644
--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -138,6 +138,7 @@ base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner)
switch (cur_token)
{
case FORMAT:
+ case NESTED:
cur_token_length = 6;
break;
case NOT:
@@ -204,6 +205,16 @@ base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner)
}
break;

+ case NESTED:
+ /* Replace NESTED by NESTED_LA if it's followed by PATH */
+ switch (next_token)
+ {
+ case PATH:
+ cur_token = NESTED_LA;
+ break;
+ }
+ break;
+
case NOT:
/* Replace NOT by NOT_LA if it's followed by BETWEEN, IN, etc */
switch (next_token)

I don't know which of the two "fixes" is less bad. Like Amit, I was not
able to find a solution to the problem by merely attaching precedences
to rules. (I did not try to mess with the precedence of
unreserved_keyword, because I'm pretty sure that would not be a good
solution even if I could make it work.)

[1] https://postgr.es/m/CADT4RqBPdbsZW7HS1jJP319TMRHs1hzUiP=iRJYR6UqgHCrgNQ@mail.gmail.com

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2023-12-06 10:47:10 Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Previous Message Amit Kapila 2023-12-06 10:19:29 Re: logical decoding and replication of sequences, take 2