Re: SQL:2023 JSON simplified accessor support

From: Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>
To: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Cc: Nikita Glukhov <glukhov(dot)n(dot)a(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Nikita Malakhov <hukutoc(at)gmail(dot)com>, Vik Fearing <vik(at)postgresfriends(dot)org>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Subject: Re: SQL:2023 JSON simplified accessor support
Date: 2025-09-03 02:20:53
Message-ID: CAK98qZ0SUBjETvUT31U01pSKYCuQPznCtb2kf3-jB5+Dh2V03w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Chao,

Continue from my previous email.

On Mon, Sep 1, 2025 at 8:32 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:

>
>
> On Sep 2, 2025, at 10:59, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
> jsonbsubs doesn’t need to check is_slice flag as well, I will explain that
> in my next email tougher with my new comments.
>
>
>
> Now jsonb_check_json_needed() loops over all indirections:
>
> static bool
> jsonb_check_jsonpath_needed(List *indirection)
> {
> ListCell *lc;
>
> foreach(lc, indirection)
> {
> Node *accessor = lfirst(lc);
>
> if (IsA(accessor, String))
> return true;
> else
> {
> Assert(IsA(accessor, A_Indices));
>
> if (castNode(A_Indices, accessor)->is_slice)
> return true;
> }
> }
>
> return false;
> }
>
> Let’s consider this statement: select data[‘a’][‘b’].c from jsonb_table.
>
> For this indirection list, first two elements are non-slice indices, and
> the third is a string accessor. So json_check_jsonpath_needed() will return
> true.
>
> Then:
>
> static void
> jsonb_subscript_transform(SubscriptingRef *sbsref,
> List **indirection,
> ParseState *pstate,
> bool isSlice,
> bool isAssignment)
> {
> List *upperIndexpr = NIL;
> ListCell *idx;
>
> /* Determine the result type of the subscripting operation; always jsonb */
> sbsref->refrestype = JSONBOID;
> sbsref->reftypmod = -1;
>
> if (isSlice || jsonb_check_jsonpath_needed(*indirection))
> {
> jsonb_subscript_make_jsonpath(pstate, indirection, sbsref);
> if (sbsref->refjsonbpath)
> return;
> }
>
> It will fall into the call to json_subscript_make_jsonpath(), and it
> cannot process [‘a’], so sbsref->refjsonbpath will be NULL, then it will
> fall through down to the logic of processing [‘a’].
>
> That’s actually a waste of looping over the entire indirection list and
> making an unnecessary call to jsonb_subscript_make_jsonpath().
>
> In json_check_jsonpath_needed(), we can only check the first item. Also,
> as jsonb_check_jsonpath_needed() already has the logic of checking
> is_slice, here in jsonb_subscript_transform(), we don’t need to check
> “isSlice” in the “if” condition at all.
>
> I made the change locally, and “make check” passed.
>
> My dirty change is like:
> ```
> @@ -118,11 +118,11 @@ coerce_jsonpath_subscript_to_int4_or_text(ParseState
> *pstate, Node *subExpr)
> static bool
> jsonb_check_jsonpath_needed(List *indirection)
> {
> - ListCell *lc;
> + //ListCell *lc;
>
> - foreach(lc, indirection)
> - {
> - Node *accessor = lfirst(lc);
> + //foreach(lc, indirection)
> + //{
> + Node *accessor = lfirst(list_head(indirection));
>
> if (IsA(accessor, String) || IsA(accessor, A_Star))
> return true;
> @@ -133,7 +133,8 @@ jsonb_check_jsonpath_needed(List *indirection)
> if (castNode(A_Indices, accessor)->is_slice)
> return true;
> }
> - }
> + //break;
> + //}
>
> return false;
> }
>
> @@ -401,7 +435,7 @@ jsonb_subscript_transform(SubscriptingRef *sbsref,
> sbsref->refrestype = JSONBOID;
> sbsref->reftypmod = -1;
>
> - if (isSlice || jsonb_check_jsonpath_needed(*indirection))
> + if (jsonb_check_jsonpath_needed(*indirection))
> {
> jsonb_subscript_make_jsonpath(pstate, indirection, sbsref);
> if (sbsref->refjsonbpath)
> ```
>

This change would give an incorrect result for an accessor like
"[0].a" when the jsonb column data is a jsonb object instead of a
jsonb array. I've added two new test cases to cover this scenario:

In jsonb.sql, the newly added tests are:
select (jb)[0].a from test_jsonb_dot_notation; -- returns same result as
(jb).a
select (jb)[1].a from test_jsonb_dot_notation; -- returns NULL

In v5, I included isSlice as an argument of
jsonb_check_jsonpath_needed(), so we no longer need to check the
is_slice field of each indirection element. That should make the check
more efficient. In jsonb_subscript_make_jsonpath(), I also moved the
check for cases like ['a'] earlier, as you suggested. These changes
should eliminate some unnecessary comparisons and allocate then
immediately free.

I also want to point out that we should not recommend users mix the
standard simplified accessor syntax with the pre-standard jsonb
subscripting, because:

1. It is confusing. I would assume that a user who uses dot-notation
understands its definition. As we discussed earlier, (jb).a and
(jb)['a'] can return different results: the former is in "lax" mode
and with a conditional array wrapper, while the latter has neither. If
a user queries something like (jb)['a'].b['c'], I think most likely
she wants either (jb)['a']['b']['c'] or (jb).a.b.c.

2. As you said, it hurts performance. When a pre-standard non-integer
subscript appears before a dot-notation, we bail out of creating a
jsonpath. Alternatively, if a user really wants (jb)['a'].b['c'], it
could be explicitly written as (((jb)['a']).b)['c']. This avoids
unnecessary looping and bailout.

In my earlier revisions (up to v12), I actually emitted warnings when
users mixed syntaxes. Starting from v13, I removed the warning based
on Jian’s feedback. Here's the context:

On Wed, Aug 20, 2025 at 9:53 PM Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>
wrote:

> On Thu, Jul 10, 2025 at 1:54 AM jian he <jian(dot)universality(at)gmail(dot)com>
> wrote:
>
>> after applied v12-0001 to v12-0006
>> + /* emit warning conditionally to minimize duplicate warnings */
>> + if (list_length(*indirection) > 0)
>> + ereport(WARNING,
>> + errcode(ERRCODE_WARNING),
>> + errmsg("mixed usage of jsonb simplified accessor syntax and jsonb
>> subscripting."),
>> + errhint("use dot-notation for member access, or use non-null integer
>> constants subscripting for array access."),
>> + parser_errposition(pstate, warning_location));
>>
>> src7=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8];
>> WARNING: mixed usage of jsonb simplified accessor syntax and jsonb
>> subscripting.
>> LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]...
>> ^
>> HINT: use dot-notation for member access, or use non-null integer
>> constants subscripting for array access.
>> ERROR: subscript type bigint is not supported
>> LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]...
>> ^
>> HINT: jsonb subscript must be coercible to either integer or text.
>>
>> The above example looks very bad. location printed twice, hint message
>> is different.
>> two messages level (ERROR, WARNING).
>> ... omitting some irelavent comments...
>
>

I see your confusion about the WARNING/ERROR messages. My intent was
> to encourage users to use consistent syntax flavors rather than mixing
> the SQL standard JSON simplified accessor with the pre-standard
> PostgreSQL jsonb subscripting. In the meantime, I want to support
> mixed syntax flavors as much as possible. However, your feedback makes
> it clear that users don’t need that level of details. So, in v13 I’ve
> removed the WARNING message and instead added a comment explaining how
> we support mixed syntaxes.
>
> Here's the relevant diff between v12 and v13, hope it helps:
>
> --- a/src/backend/utils/adt/jsonbsubs.c
> +++ b/src/backend/utils/adt/jsonbsubs.c
> @@ -247,7 +247,6 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List
> **indirection, Subscripti
> ListCell *lc;
> Datum jsp;
> int pathlen = 0;
> - int warning_location = -1;
>
> sbsref->refupperindexpr = NIL;
> sbsref->reflowerindexpr = NIL;
> @@ -311,10 +310,27 @@ jsonb_subscript_make_jsonpath(ParseState *pstate,
> List **indirection, Subscripti
> if (jpi_from == NULL)
> {
> /*
> - * postpone emitting the warning
> until the end of this
> - * function
> + * Break out of the loop if the
> subscript is not a
> + * non-null integer constant, so
> that we can fall back to
> + * jsonb subscripting logic.
> + *
> + * This is needed to handle cases
> with mixed usage of SQL
> + * standard json simplified
> accessor syntax and PostgreSQL
> + * jsonb subscripting syntax, e.g:
> + *
> + * select (jb).a['b'].c from
> jsonb_table;
> + *
> + * where dot-notation (.a and .c)
> is the SQL standard json
> + * simplified accessor syntax, and
> the ['b'] subscript is
> + * the PostgreSQL jsonb
> subscripting syntax, because 'b'
> + * is not a non-null constant
> integer and cannot be used
> + * for json array access.
> + *
> + * In this case, we cannot create
> a JsonPath item, so we
> + * break out of the loop and let
> + * jsonb_subscript_transform()
> handle this indirection as
> + * a PostgreSQL jsonb subscript.
> */
> - warning_location =
> exprLocation(ai->uidx);
> pfree(jpi->value.array.elems);
> pfree(jpi);
> break;
> @@ -360,14 +376,6 @@ jsonb_subscript_make_jsonpath(ParseState *pstate,
> List **indirection, Subscripti
>
> *indirection = list_delete_first_n(*indirection, pathlen);
>
> - /* emit warning conditionally to minimize duplicate warnings */
> - if (list_length(*indirection) > 0)
> - ereport(WARNING,
> - errcode(ERRCODE_WARNING),
> - errmsg("mixed usage of jsonb simplified
> accessor syntax and jsonb subscripting."),
> - errhint("use dot-notation for member
> access, or use non-null integer constants subscripting for array access."),
> - parser_errposition(pstate,
> warning_location));
> -
> jsp = jsonPathFromParseResult(&jpres, 0, NULL);
>

I think we should definitely add detailed documentation about mixed
syntax. However, I'm not convinced it's worth putting more effort into
optimizing performance for those cases. Alternatively, we could fail
outright in those cases, but I don't think that would be the best user
experience. Thoughts?

On Mon, Sep 1, 2025 at 8:32 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:

> The other comment is that jsonb_subscript_make_jsonpath() can be optimized
> a little bit. When indices and dot notations are mixed, it will always hit
> the clause “if (api_from == NULL)”. In this case, memory of jpi has been
> allocated, and with the “if” we need to free the memory. So, we can pull up
> calculating jpi_from, if it is NULL, then we don’t need to allocate and
> free memory. My dirty change is like:
>
> ```
> else if (IsA(accessor, A_Indices))
> {
> + JsonPathParseItem *jpi_from = NULL;
> A_Indices *ai = castNode(A_Indices, accessor);
>
> + if (!ai->is_slice)
> + {
> + Assert(ai->uidx && !ai->lidx);
> + jpi_from = make_jsonpath_item_expr(pstate,
> ai->uidx, &sbsref->refupperindexpr, true);
> + if (jpi_from == NULL)
> + {
> + /*
> + * Break out of the loop if the
> subscript is not a
> + * non-null integer constant, so
> that we can fall back to
> + * jsonb subscripting logic.
> + *
> + * This is needed to handle cases
> with mixed usage of SQL
> + * standard json simplified
> accessor syntax and PostgreSQL
> + * jsonb subscripting syntax, e.g:
> + *
> + * select (jb).a['b'].c from
> jsonb_table;
> + *
> + * where dot-notation (.a and .c)
> is the SQL standard json
> + * simplified accessor syntax, and
> the ['b'] subscript is
> + * the PostgreSQL jsonb
> subscripting syntax, because 'b'
> + * is not a non-null constant
> integer and cannot be used
> + * for json array access.
> + *
> + * In this case, we cannot create
> a JsonPath item, so we
> + * break out of the loop and let
> + * jsonb_subscript_transform()
> handle this indirection as
> + * a PostgreSQL jsonb subscript.
> + */
> + break;
> + }
> + }
> +
> jpi = make_jsonpath_item(jpiIndexArray);
> jpi->value.array.nelems = 1;
> jpi->value.array.elems =
> palloc(sizeof(jpi->value.array.elems[0]));
> @@ -303,12 +337,12 @@ jsonb_subscript_make_jsonpath(ParseState *pstate,
> List **indirection, Subscripti
> }
> else
> {
> - JsonPathParseItem *jpi_from = NULL;
> + //JsonPathParseItem *jpi_from = NULL;
>
> - Assert(ai->uidx && !ai->lidx);
> - jpi_from = make_jsonpath_item_expr(pstate,
> ai->uidx, &sbsref->refupperindexpr, true);
> - if (jpi_from == NULL)
> - {
> + //Assert(ai->uidx && !ai->lidx);
> + //jpi_from =
> make_jsonpath_item_expr(pstate, ai->uidx, &sbsref->refupperindexpr, true);
> + //if (jpi_from == NULL)
> + //{
> /*
> * Break out of the loop if the
> subscript is not a
> * non-null integer constant, so
> that we can fall back to
> @@ -331,10 +365,10 @@ jsonb_subscript_make_jsonpath(ParseState *pstate,
> List **indirection, Subscripti
> * jsonb_subscript_transform()
> handle this indirection as
> * a PostgreSQL jsonb subscript.
> */
> - pfree(jpi->value.array.elems);
> - pfree(jpi);
> - break;
> - }
> + // pfree(jpi->value.array.elems);
> + // pfree(jpi);
> + // break;
> + //}
>
> jpi->value.array.elems[0].from = jpi_from;
> jpi->value.array.elems[0].to = NULL;
> ```
>
> With this change, “make check” also passed.
>

Fixed. Thank you!

On Mon, Sep 1, 2025 at 8:32 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:

> The third comment is about test cases. I only see tests for indices
> following dot notation, like data.a[‘b’], but not the reverse. Let’s add a
> few test cases for dot notation following indices, like data[‘a’].b.
>

I believe both cases are already covered. Please take a look at
src/test/regress/jsonb.sql; there's a "-- mixed syntax" section.

Best,
Alex

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2025-09-03 02:22:37 Re: Update outdated references to SLRU ControlLock
Previous Message Alexandra Wang 2025-09-03 02:16:37 Re: SQL:2023 JSON simplified accessor support