Re: remaining sql/json patches

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Erik Rijkers <er(at)xs4all(dot)nl>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: remaining sql/json patches
Date: 2023-12-06 14:02:21
Message-ID: CA+HiwqET8gAhoCqNx_ueELrs4UevhRMNOYSmq=WykOu9THn=og@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Alvaro.

On Wed, Dec 6, 2023 at 7:43 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> On 2023-Dec-05, Amit Langote wrote:
>
> > I've attempted to trim down the JSON_TABLE grammar (0004), but this is
> > all I've managed so far. Among other things, I couldn't refactor the
> > grammar to do away with the following:
> >
> > +%nonassoc NESTED
> > +%left PATH
>
> To recap, the reason we're arguing about this is that this creates two
> new precedence classes, which are higher than everything else. Judging
> by the discussios in thread [1], this is not acceptable. Without either
> those new classes or the two hacks I describe below, the grammar has the
> following shift/reduce conflict:
>
> State 6220
>
> 2331 json_table_column_definition: NESTED . path_opt Sconst COLUMNS '(' json_table_column_definition_list ')'
> 2332 | NESTED . path_opt Sconst AS name COLUMNS '(' json_table_column_definition_list ')'
> 2636 unreserved_keyword: NESTED .
>
> PATH shift, and go to state 6286
>
> SCONST reduce using rule 2336 (path_opt)
> PATH [reduce using rule 2636 (unreserved_keyword)]
> $default reduce using rule 2636 (unreserved_keyword)
>
> path_opt go to state 6287
>
>
>
> First, while the grammar uses "NESTED path_opt" in the relevant productions, I
> noticed that there's no test that uses NESTED without PATH, so if we break that
> case, we won't notice. I propose we remove the PATH keyword from one of
> the tests in jsonb_sqljson.sql in order to make sure the grammar
> continues to work after whatever hacking we do:
>
> diff --git a/src/test/regress/expected/jsonb_sqljson.out b/src/test/regress/expected/jsonb_sqljson.out
> index 7e8ae6a696..8fd2385cdc 100644
> --- a/src/test/regress/expected/jsonb_sqljson.out
> +++ b/src/test/regress/expected/jsonb_sqljson.out
> @@ -1548,7 +1548,7 @@ HINT: JSON_TABLE column names must be distinct from one another.
> SELECT * FROM JSON_TABLE(
> jsonb 'null', '$[*]' AS p0
> COLUMNS (
> - NESTED PATH '$' AS p1 COLUMNS (
> + NESTED '$' AS p1 COLUMNS (
> NESTED PATH '$' AS p11 COLUMNS ( foo int ),
> NESTED PATH '$' AS p12 COLUMNS ( bar int )
> ),
> diff --git a/src/test/regress/sql/jsonb_sqljson.sql b/src/test/regress/sql/jsonb_sqljson.sql
> index ea5db88b40..ea9b4ff8b6 100644
> --- a/src/test/regress/sql/jsonb_sqljson.sql
> +++ b/src/test/regress/sql/jsonb_sqljson.sql
> @@ -617,7 +617,7 @@ SELECT * FROM JSON_TABLE(
> SELECT * FROM JSON_TABLE(
> jsonb 'null', '$[*]' AS p0
> COLUMNS (
> - NESTED PATH '$' AS p1 COLUMNS (
> + NESTED '$' AS p1 COLUMNS (
> NESTED PATH '$' AS p11 COLUMNS ( foo int ),
> NESTED PATH '$' AS p12 COLUMNS ( bar int )
> ),

Fixed the test case like that in the attached.

> Having done that, AFAICS there are two possible fixes for the grammar.
> One is to keep the idea of assigning precedence explicitly to these
> keywords, but do something less hackish -- we can put NESTED together
> with UNBOUNDED, and classify PATH in the IDENT group. This requires no
> further changes. This would make NESTED PATH follow the same rationale
> as UNBOUNDED FOLLOWING / UNBOUNDED PRECEDING. Here's is a preliminary
> patch for that (the large comment above needs to be updated.)
>
> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> index c15fcf2eb2..1493ac7580 100644
> --- a/src/backend/parser/gram.y
> +++ b/src/backend/parser/gram.y
> @@ -887,9 +887,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
> * json_predicate_type_constraint and json_key_uniqueness_constraint_opt
> * productions (see comments there).
> */
> -%nonassoc UNBOUNDED /* ideally would have same precedence as IDENT */
> +%nonassoc UNBOUNDED NESTED /* ideally would have same precedence as IDENT */
> %nonassoc IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE ROLLUP
> - SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT
> + SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT PATH
> %left Op OPERATOR /* multi-character ops and user-defined operators */
> %left '+' '-'
> %left '*' '/' '%'
> @@ -911,8 +911,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
> */
> %left JOIN CROSS LEFT FULL RIGHT INNER_P NATURAL
>
> -%nonassoc NESTED
> -%left PATH
> %%
>
> /*
>
>
> The other thing we can do is use the two-token lookahead trick, by
> declaring
> %token NESTED_LA
> and using the parser.c code to replace NESTED with NESTED_LA when it is
> followed by PATH. This doesn't require assigning precedence to
> anything. We do need to expand the two rules that have "NESTED
> opt_path Sconst" to each be two rules, one for "NESTED_LA PATH Sconst"
> and another for "NESTED Sconst". So the opt_path production goes away.
> This preliminary patch does that. (I did not touch the ecpg grammar, but
> it needs an update too.)
>
> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> index c15fcf2eb2..8e4c1d4ebe 100644
> --- a/src/backend/parser/gram.y
> +++ b/src/backend/parser/gram.y
> @@ -817,7 +817,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
> * FORMAT_LA, NULLS_LA, WITH_LA, and WITHOUT_LA are needed to make the grammar
> * LALR(1).
> */
> -%token FORMAT_LA NOT_LA NULLS_LA WITH_LA WITHOUT_LA
> +%token FORMAT_LA NESTED_LA NOT_LA NULLS_LA WITH_LA WITHOUT_LA
>
> /*
> * The grammar likewise thinks these tokens are keywords, but they are never
> @@ -911,8 +911,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
> */
> %left JOIN CROSS LEFT FULL RIGHT INNER_P NATURAL
>
> -%nonassoc NESTED
> -%left PATH
> %%
>
> /*
> @@ -16771,7 +16769,7 @@ json_table_column_definition:
> n->location = @1;
> $$ = (Node *) n;
> }
> - | NESTED path_opt Sconst
> + | NESTED_LA PATH Sconst
> COLUMNS '(' json_table_column_definition_list ')'
> {
> JsonTableColumn *n = makeNode(JsonTableColumn);
> @@ -16783,7 +16781,19 @@ json_table_column_definition:
> n->location = @1;
> $$ = (Node *) n;
> }
> - | NESTED path_opt Sconst AS name
> + | NESTED Sconst
> + COLUMNS '(' json_table_column_definition_list ')'
> + {
> + JsonTableColumn *n = makeNode(JsonTableColumn);
> +
> + n->coltype = JTC_NESTED;
> + n->pathspec = $2;
> + n->pathname = NULL;
> + n->columns = $5;
> + n->location = @1;
> + $$ = (Node *) n;
> + }
> + | NESTED_LA PATH Sconst AS name
> COLUMNS '(' json_table_column_definition_list ')'
> {
> JsonTableColumn *n = makeNode(JsonTableColumn);
> @@ -16795,6 +16805,19 @@ json_table_column_definition:
> n->location = @1;
> $$ = (Node *) n;
> }
> + | NESTED Sconst AS name
> + COLUMNS '(' json_table_column_definition_list ')'
> + {
> + JsonTableColumn *n = makeNode(JsonTableColumn);
> +
> + n->coltype = JTC_NESTED;
> + n->pathspec = $2;
> + n->pathname = $4;
> + n->columns = $7;
> + n->location = @1;
> + $$ = (Node *) n;
> + }
> +
> ;
>
> json_table_column_path_specification_clause_opt:
> @@ -16802,11 +16825,6 @@ json_table_column_path_specification_clause_opt:
> | /* EMPTY */ { $$ = NULL; }
> ;
>
> -path_opt:
> - PATH { }
> - | /* EMPTY */ { }
> - ;
> -
> json_table_plan_clause_opt:
> PLAN '(' json_table_plan ')' { $$ = $3; }
> | PLAN DEFAULT '(' json_table_default_plan_choices ')'
> diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
> index e17c310cc1..e3092f2c3e 100644
> --- a/src/backend/parser/parser.c
> +++ b/src/backend/parser/parser.c
> @@ -138,6 +138,7 @@ base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner)
> switch (cur_token)
> {
> case FORMAT:
> + case NESTED:
> cur_token_length = 6;
> break;
> case NOT:
> @@ -204,6 +205,16 @@ base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, core_yyscan_t yyscanner)
> }
> break;
>
> + case NESTED:
> + /* Replace NESTED by NESTED_LA if it's followed by PATH */
> + switch (next_token)
> + {
> + case PATH:
> + cur_token = NESTED_LA;
> + break;
> + }
> + break;
> +
> case NOT:
> /* Replace NOT by NOT_LA if it's followed by BETWEEN, IN, etc */
> switch (next_token)
>
>
> I don't know which of the two "fixes" is less bad.

I think I'm inclined toward adapting the LA-token fix (attached 0005),
because we've done that before with SQL/JSON constructors patch.
Also, if I understand the concerns that Tom mentioned at [1]
correctly, maybe we'd be better off not assigning precedence to
symbols as much as possible, so there's that too against the approach
#1.

Also I've attached 0006 to add news tests under ECPG for the SQL/JSON
query functions, which I haven't done so far but realized after you
mentioned ECPG. It also includes the ECPG variant of the LA-token
fix. I'll eventually merge it into 0003 and 0004 after expanding the
test cases some more. I do wonder what kinds of tests we normally add
to ECPG suite but not others?

Finally, I also fixed a couple of silly mistakes in 0003 around
transformJsonBehavior() and some further assorted tightening in the ON
ERROR / EMPTY expression coercion handling code.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v28-0005-JSON_TABLE-don-t-assign-precedence-to-NESTED-PAT.patch application/octet-stream 3.5 KB
v28-0002-Add-soft-error-handling-to-populate_record_field.patch application/octet-stream 23.2 KB
v28-0006-ECPG-add-SQL-JSON-query-function-tests-and-parse.patch application/octet-stream 12.8 KB
v28-0004-JSON_TABLE.patch application/octet-stream 166.9 KB
v28-0003-SQL-JSON-query-functions.patch application/octet-stream 196.7 KB
v28-0001-Add-soft-error-handling-to-some-expression-nodes.patch application/octet-stream 9.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Junwang Zhao 2023-12-06 14:07:51 Re: Make COPY format extendable: Extract COPY TO format implementations
Previous Message Tomas Vondra 2023-12-06 13:50:45 Re: logical decoding and replication of sequences, take 2