From: | Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com> |
---|---|
To: | jian he <jian(dot)universality(at)gmail(dot)com> |
Cc: | Nikita Malakhov <hukutoc(at)gmail(dot)com>, Vik Fearing <vik(at)postgresfriends(dot)org>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Nikita Glukhov <glukhov(dot)n(dot)a(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "David E(dot) Wheeler" <david(at)justatheory(dot)com> |
Subject: | Re: SQL:2023 JSON simplified accessor support |
Date: | 2025-08-21 04:53:53 |
Message-ID: | CAK98qZ19bC=Qw9rWGOFKyX4B-fg1XQWEbV2OWAawqWC62fx79A@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Jian,
Thanks for reviewing! I’ve attached v13, which addresses your
feedback.
See my individual replies below, and let me know if you have any
questions!
On Thu, Jul 10, 2025 at 1:54 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> in v12-0001 and v12-0002.
> in transformIndirection
> if (!newresult)
> {
> /*
> * generic subscripting failed; falling back to function call
> or
> * field selection for a composite type.
> */
> Node *n;
> /* try to find function for field selection */
> newresult = ParseFuncOrColumn(pstate,
> list_make1(n),
> list_make1(result),
> last_srf,
> NULL,
> false,
> location);
> }
> the above comments mentioning "function call" is wrong?
> you passed NULL for (FuncCall *fn) in ParseFuncOrColumn.
> and ParseFuncOrColumn comments says
> ```If fn is null, we're dealing with column syntax not function syntax.``
>
You are right. Fixed.
On Thu, Jul 10, 2025 at 1:54 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> I think coerce_jsonpath_subscript can be further simplified.
> we already have message like:
> errhint("jsonb subscript must be coercible to either integer or text."),
> no need to pass the third argument a constant (INT4OID).
> also
> ``Oid targetType = UNKNOWNOID;``
> set it as InvalidOid would be better.
> attached is a minor refactoring of coerce_jsonpath_subscript
> based on (v12-0001 to v12-0004).
>
Thanks for the patch! I squashed it with v13-0004 and renamed the
function to coerce_jsonpath_subscript_to_int4_or_text() for clarity.
On Thu, Jul 10, 2025 at 1:54 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> after applied v12-0001 to v12-0006
> + /* emit warning conditionally to minimize duplicate warnings */
> + if (list_length(*indirection) > 0)
> + ereport(WARNING,
> + errcode(ERRCODE_WARNING),
> + errmsg("mixed usage of jsonb simplified accessor syntax and jsonb
> subscripting."),
> + errhint("use dot-notation for member access, or use non-null integer
> constants subscripting for array access."),
> + parser_errposition(pstate, warning_location));
>
> src7=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8];
> WARNING: mixed usage of jsonb simplified accessor syntax and jsonb
> subscripting.
> LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]...
> ^
> HINT: use dot-notation for member access, or use non-null integer
> constants subscripting for array access.
> ERROR: subscript type bigint is not supported
> LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]...
> ^
> HINT: jsonb subscript must be coercible to either integer or text.
>
> The above example looks very bad. location printed twice, hint message
> is different.
> two messages level (ERROR, WARNING).
>
> also "or use non-null integer constants subscripting for array
> access." seems wrong?
> as you can see the below hint message saying it could be text or integer.
>
> select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)['1'::int8];
> ERROR: subscript type bigint is not supported
> LINE 1: ...ect ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)['1'::int8]...
> ^
> HINT: jsonb subscript must be coercible to either integer or text.
>
> also select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)[NULL::int4];
> return NULL, so "use non-null integer constants" is wrong.
>
I see your confusion about the WARNING/ERROR messages. My intent was
to encourage users to use consistent syntax flavors rather than mixing
the SQL standard JSON simplified accessor with the pre-standard
PostgreSQL jsonb subscripting. In the meantime, I want to support
mixed syntax flavors as much as possible. However, your feedback makes
it clear that users don’t need that level of details. So, in v13 I’ve
removed the WARNING message and instead added a comment explaining how
we support mixed syntaxes.
Here's the relevant diff between v12 and v13, hope it helps:
--- a/src/backend/utils/adt/jsonbsubs.c
+++ b/src/backend/utils/adt/jsonbsubs.c
@@ -247,7 +247,6 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List
**indirection, Subscripti
ListCell *lc;
Datum jsp;
int pathlen = 0;
- int warning_location = -1;
sbsref->refupperindexpr = NIL;
sbsref->reflowerindexpr = NIL;
@@ -311,10 +310,27 @@ jsonb_subscript_make_jsonpath(ParseState *pstate,
List **indirection, Subscripti
if (jpi_from == NULL)
{
/*
- * postpone emitting the warning
until the end of this
- * function
+ * Break out of the loop if the
subscript is not a
+ * non-null integer constant, so
that we can fall back to
+ * jsonb subscripting logic.
+ *
+ * This is needed to handle cases
with mixed usage of SQL
+ * standard json simplified
accessor syntax and PostgreSQL
+ * jsonb subscripting syntax, e.g:
+ *
+ * select (jb).a['b'].c from
jsonb_table;
+ *
+ * where dot-notation (.a and .c)
is the SQL standard json
+ * simplified accessor syntax, and
the ['b'] subscript is
+ * the PostgreSQL jsonb
subscripting syntax, because 'b'
+ * is not a non-null constant
integer and cannot be used
+ * for json array access.
+ *
+ * In this case, we cannot create a
JsonPath item, so we
+ * break out of the loop and let
+ * jsonb_subscript_transform()
handle this indirection as
+ * a PostgreSQL jsonb subscript.
*/
- warning_location =
exprLocation(ai->uidx);
pfree(jpi->value.array.elems);
pfree(jpi);
break;
@@ -360,14 +376,6 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List
**indirection, Subscripti
*indirection = list_delete_first_n(*indirection, pathlen);
- /* emit warning conditionally to minimize duplicate warnings */
- if (list_length(*indirection) > 0)
- ereport(WARNING,
- errcode(ERRCODE_WARNING),
- errmsg("mixed usage of jsonb simplified
accessor syntax and jsonb subscripting."),
- errhint("use dot-notation for member
access, or use non-null integer constants subscripting for array access."),
- parser_errposition(pstate,
warning_location));
-
jsp = jsonPathFromParseResult(&jpres, 0, NULL);
On Thu, Jul 10, 2025 at 6:34 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> typedef struct FieldAccessorExpr
> {
> Expr xpr;
> char *fieldname; /* name of the JSONB object field
> accessed via
> * dot notation */
> Oid faecollid pg_node_attr(query_jumble_ignore);
> int location;
> } FieldAccessorExpr;
>
> first field as NodeTag should be just fine?
> I am not sure the field "location" is needed now, if it is needed, it
> should be
> type as ParseLoc.
> we should add it to src/tools/pgindent/typedefs.list
Good catch! I changed the first field from Expr to NodeTag, removed
the location field, and added the Node to typedefs.list.
On Thu, Jul 10, 2025 at 10:54 PM jian he <jian(dot)universality(at)gmail(dot)com>
wrote:
> On Thu, Jul 10, 2025 at 9:34 PM jian he <jian(dot)universality(at)gmail(dot)com>
> wrote:
> >
> > ------------------------------------------
> > in jsonb_subscript_make_jsonpath we have
> > foreach(lc, *indirection)
> > {
> > if (IsA(accessor, String))
> > ....
> > else if (IsA(accessor, A_Indices))
> > else
> > /*
> > * Unsupported node type for creating jsonpath. Instead of
> > * throwing an ERROR, break here so that we create a
> jsonpath from
> > * as many indirection elements as we can and let
> > * transformIndirection() fallback to alternative logic to
> handle
> > * the remaining indirection elements.
> > */
> > break;
> > }
> > the above ELSE branch comments look suspicious to me.
>
I kept the "else" branch, but added an Assert with updated comments.
Here's the relevant diff between v12 and v13:
In jsonb_subscript_make_jsonpath():
else
{
- * Unsupported node type for creating jsonpath. Instead of
- * throwing an ERROR, break here so that we create a jsonpath from
- * as many indirection elements as we can and let
- * transformIndirection() fallback to alternative logic to handle
- * the remaining indirection elements.
+ * Unexpected node type in indirection list. This should not
+ * happen with current grammar, but we handle it defensively by
+ * breaking out of the loop rather than crashing. In case of
+ * future grammar changes that might introduce new node types,
+ * this allows us to create a jsonpath from as many indirection
+ * elements as we can and let transformIndirection() fallback to
+ * alternative logic to handle the remaining indirection elements.
+ Assert(false); /* not reachable */
break;
}
Hope the updated comment clarifies why I kept it.
On Thu, Jul 10, 2025 at 10:54 PM jian he <jian(dot)universality(at)gmail(dot)com>
wrote:
> >
> transformIndirection->transformContainerSubscripts->jsonb_subscript_transform->jsonb_subscript_make_jsonpath
> > As you can see, transformIndirection have a long distance from
> > jsonb_subscript_make_jsonpath,
> > let transformIndirection handle remaining indirection elements seems not
> good.
>
In 0001's commit message, we are given the following promise:
Author: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
Date: Tue Jul 8 22:18:07 2025 -0700
Allow transformation of only a sublist of subscripts
This is a preparation step for allowing subscripting containers to
transform only a prefix of an indirection list and modify the list
in-place by removing the processed elements. Currently, all elements
are consumed, and the list is set to NIL after transformation.
In the following commit, subscripting containers will gain the
flexibility to stop transformation when encountering an unsupported
indirection and return the remaining indirections to the caller.
With this contract provided by the subscripting interface to the
container's transform function, I don't feel terribly inappropriate to
let jsonb's transform function: jsonb_subscript_transform(), to
transform indirections as much as it can, and leave the rest to
transformIndirection().
0002 is a good example that shows why the else branch is useful: it
adds support for String as a subscript node of jsonb, while String is
still unsupported in other in-core container subscripting.
For example, array_subscript_transform() has:
if (!IsA(lfirst(idx), A_Indices))
break;
The else branch in jsonb_subscript_make_jsonpath() just follows that
same convention.
> context v12-0001 to v12-0006.
> this ELSE branch comments is wrong, because
> + if (jsonb_check_jsonpath_needed(*indirection))
> + {
> + jsonb_subscript_make_jsonpath(pstate, indirection, sbsref);
> + if (sbsref->refjsonbpath)
> + return;
> + }
> in jsonb_check_jsonpath_needed we already use Assert to confirm that
> list "indirection"
> is either String or A_Indices Node.
>
The Assert I added in v13, as well as the one you mentioned in
jsonb_check_jsonpath_needed(), are both only applicable in development
build and won’t help in production. So I’d prefer to break here and
let the callee handle any unexpected Node with an ERROR.
Let me know if this explanation makes sense. If not, and if you think
the current logic is overly complicated, we can discuss further.
On Thu, Jul 10, 2025 at 10:54 PM jian he <jian(dot)universality(at)gmail(dot)com>
wrote:
> in transformContainerSubscripts we have
> sbsroutines->transform(sbsref, indirection, pstate,
> isSlice, isAssignment);
> /*
> * Error out, if datatype failed to consume any indirection elements.
> */
> if (list_length(*indirection) == indirection_length)
> {
> Node *ind = linitial(*indirection);
> if (noError)
> return NULL;
> if (IsA(ind, String))
> ereport(ERROR,
> (errcode(ERRCODE_DATATYPE_MISMATCH),
> errmsg("type %s does not support dot notation",
> format_type_be(containerType)),
> parser_errposition(pstate,
> exprLocation(containerBase))));
> else if (IsA(ind, A_Indices))
> ereport(ERROR,
> (errcode(ERRCODE_DATATYPE_MISMATCH),
> errmsg("type %s does not support array subscripting",
> format_type_be(containerType)),
> parser_errposition(pstate,
> exprLocation(containerBase))));
> else
> elog(ERROR, "invalid indirection operation: %d", nodeTag(ind));
> }
> sbsroutines->transform currently will call
> array_subscript_transform, hstore_subscript_transform,
> jsonb_subscript_transform
>
> in jsonb_subscript_transform callee we unconditionally do:
> *indirection = list_delete_first_n(*indirection, pathlen);
> *indirection = list_delete_first_n(*indirection,
> list_length(upperIndexpr));
> in array_subscript_transform, we do
> *indirection = list_delete_first_n(*indirection, ndim);
>
> That means, if sbsroutines->transform not error out and indirection is
> not NIL (which is unlikely)
> then sbsroutines->transform will consume some induction elements.
> instead of the above verbose ereport(ERROR, error handling, we can use
> Assert
> Assert(indirection_length > list_length(*indirection));
>
> for the above comments, i did a refactoring based on v12 (0001 to 0006).
Thanks for the patch! I didn’t apply it for the same reason I
mentioned in the previous quote reply. In the case of an unexpected
Node type, I prefer raising an explicit ERROR rather than using an
Assert() that crashes in a dev build but silently continues in
production. One possible way to hit these ERRORs is by creating a
custom data type that supports subscripting, but only for a subset of
Node types that transformIndirection() allows.
Best,
Alex
Attachment | Content-Type | Size |
---|---|---|
v13-0004-Extract-coerce_jsonpath_subscript.patch | application/octet-stream | 5.8 KB |
v13-0007-Implement-jsonb-wildcard-member-accessor.patch | application/octet-stream | 31.4 KB |
v13-0006-Implement-Jsonb-subscripting-with-slicing.patch | application/octet-stream | 17.4 KB |
v13-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch | application/octet-stream | 9.0 KB |
v13-0005-Implement-read-only-dot-notation-for-jsonb.patch | application/octet-stream | 62.3 KB |
v13-0003-Export-jsonPathFromParseResult.patch | application/octet-stream | 2.7 KB |
v13-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch | application/octet-stream | 10.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2025-08-21 05:07:06 | Redesigning postmaster death handling |
Previous Message | Tom Lane | 2025-08-21 04:53:09 | Re: Remove traces of long in dynahash.c |