From: | Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com> |
---|---|
To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
Cc: | Nikita Glukhov <glukhov(dot)n(dot)a(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Nikita Malakhov <hukutoc(at)gmail(dot)com>, Vik Fearing <vik(at)postgresfriends(dot)org>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "David E(dot) Wheeler" <david(at)justatheory(dot)com> |
Subject: | Re: SQL:2023 JSON simplified accessor support |
Date: | 2025-09-09 23:53:04 |
Message-ID: | CAK98qZ1GwW_dCHSi7wMV_H+eQ9qdWajUNXmntufHF4Md=xRSsw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Chao,
On Tue, Sep 2, 2025 at 11:57 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
> On Sep 3, 2025, at 10:20, Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>
> wrote:
>
> This change would give an incorrect result for an accessor like
> "[0].a" when the jsonb column data is a jsonb object instead of a
> jsonb array. I've added two new test cases to cover this scenario:
>
> In jsonb.sql, the newly added tests are:
> select (jb)[0].a from test_jsonb_dot_notation; -- returns same result as
> (jb).a
> select (jb)[1].a from test_jsonb_dot_notation; -- returns NULL
>
>
> I think the latest patch is still wrong. Ultimately, dot notation “.a",
> index “[0] and slice "[1:4]” rely on jsonpath, and subscript [“a”] relies
> on the rest logic in jsonb_subscript_transform() in “foreach”.
>
> Now “jsonb_check_jsonpath_needed()” checks only dot nation and slice, but
> not checking index case. So that reason why your case “select (jb)[0].a”
> works is because the second indirection is a dot nation. However, “select
> (jb)[0][‘a’]” will not work. See my test:
>
> ```
> evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0].name;
> name
> ---------
> "Alice"
> (1 row)
>
> evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['name'];
> jsonb
> -------
>
> (1 row)
> ```
>
> In my test, (jsonb)[0][’name’] should also return “Alice”.
>
> So, end up, “jsonb_check_jsonpath_needed()” only does incomplete and
> inaccurate checks, only jsonb_subscript_make_jsonpath() can make an
> accurate decision and return a null jsonpath upon subscript “[‘a’]”.
>
> As a result, “json_check_jsonpath_needed()” feels not needed at all. In
> jsonb_subscript_transform(), just go ahead to run
> jsonb_subscript_make_jsonpath() first, if returned jsonpath is NULL, then
> run rest of logic.
>
> With my dirty change of removing json_check_jsonpath_needed:
> ```
> chaol(at)ChaodeMacBook-Air postgresql % git diff
> diff --git a/src/backend/utils/adt/jsonbsubs.c
> b/src/backend/utils/adt/jsonbsubs.c
> index 374040b3b4e..d9faab5c799 100644
> --- a/src/backend/utils/adt/jsonbsubs.c
> +++ b/src/backend/utils/adt/jsonbsubs.c
> @@ -416,12 +416,12 @@ jsonb_subscript_transform(SubscriptingRef *sbsref,
> sbsref->refrestype = JSONBOID;
> sbsref->reftypmod = -1;
>
> - if (jsonb_check_jsonpath_needed(*indirection, isSlice))
> - {
> + //if (jsonb_check_jsonpath_needed(*indirection, isSlice))
> + //{
> jsonb_subscript_make_jsonpath(pstate, indirection, sbsref);
> if (sbsref->refjsonbpath)
> return;
> - }
> + //}
>
> /*
> * We reach here only in two cases: (a) the JSON simplified
> accessor is
> ```
>
> You can see:
> ```
> evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['name'];
> jsonb
> ---------
> "Alice"
> (1 row)
>
> evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0].name;
> name
> ---------
> "Alice"
> (1 row)
> ```
>
> Then I found “make check” failed. For example the first broken test case:
>
> ```
> @@ -4998,7 +4998,7 @@
> select ('123'::jsonb)[0];
> jsonb
> -------
> -
> + 123
> (1 row)
> ```
>
> The test expected an empty result, which implies “strict” mode.
>
> But the problem is, which mode should be the default? JSON_QUERY() uses
> “lax” as default mode. And from Peter Eisentraut’s blog:
> https://peter.eisentraut.org/blog/2023/04/04/sql-2023-is-finished-here-is-whats-new
>
> ```
> The semantics of this are defined in terms of JSON_QUERY and JSON_VALUE constructions
> (which have been available since SQL:2016), so this really just syntactic
> sugar.
> ```
>
> Also feels like “lax” should be the default mode. If that is true, then my
> dirty change of removing “json_check_jsonpath_need()” works properly.
>
> The current logic with this patch sounds strange. Because
> “json_check_jsonpath_need()” iterate through unprocessed indirections to
> decide if jsonpath is needed (lax mode). With this logic:
>
> 1) if index [0] directly following dot notation, like (data).a[0], it’s
> lax mode
> 2) if index [0] directly following subscript [‘a’], like (data)[‘a’][0],
> it’s strict mode
> 3) if index [0] directly following the data column, then if there is a dot
> nation in indirection list, use lax mode, otherwise strict mode. For the
> failed test case, as there is no more indirection following [0], so it
> expected strict mode.
>
> I wonder where this behavior is defined?
>
> With my change, 1) and 2) are the same, for 3), if index [0] directly
> following the data column, regardless what indirections are followed, it’s
> by default lax mode.
>
> So, I think this is a design decision. Maybe I missed something from your
> previous design, but I don’t find anything about that from the commit
> comments. I feel this would be better aligned with 1) and 2).
>
Thanks for investigating this, and for the great questions!
Here’s a bit of background. The pre-standard jsonb subscripting
feature [1] was introduced in Postgres 14. Specifically, support for
queries like (jb)[0] and (jb)[0]['a'] was added by this commit:
commit 676887a3b0b8e3c0348ac3f82ab0d16e9a24bd43 (HEAD)
Author: Alexander Korotkov <akorotkov(at)postgresql(dot)org>
Date: Sun Jan 31 23:50:40 2021 +0300
Implementation of subscripting for jsonb
Without my patch, from version 14 through the current master branch,
all supported versions of Postgres return the following results:
test=# select ('123'::jsonb)[0];
jsonb
-------
(1 row)
test=# select ('{"name": "Alice", "age": 30}'::jsonb)[0];
jsonb
-------
(1 row)
test=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['age'];
jsonb
-------
(1 row)
The simplified accessor syntax defined in the SQL standard includes
dot-notation access as well as integer-based subscripts. It does not
include text-based subscripts such as ['a']. Therefore, the only
overlap between the pre-standard jsonb subscripting and the standard
simplified accessor is non-slicing integer-based subscripts. (Since
the pre-standard jsonb subscripting never supported slicing, we don’t
need to consider that case either when comparing the two syntaxes.)
When we compare the two syntaxes for non-slicing integer-based
subscripts, the only discrepancy is that for a jsonb object jb,
(jb)[0] in the pre-standard syntax returns NULL, while (jb)[0] in the
standard syntax returns (jb) itself. As long as the integer subscript
is non-zero, both syntaxes return the same results.
I think this (jb)[0] case is rather trivial. However, since it's been
behaving the pre-standard way since PG14, I hesitate to change the
existing behavior as part of my patches, and I feel it could be a
bikeshedding kind of discussion in a separate thread.
The main goal of my patch is to add dot-notation. For now, my
intention is for cases like (jb)[0].a to work in the standard way,
because if a user chooses dot-notation instead of text-based
subscripting, they are likely expecting the standard behavior.
Thoughts?
[1]
https://www.postgresql.org/docs/current/datatype-json.html#JSONB-SUBSCRIPTING
From | Date | Subject | |
---|---|---|---|
Next Message | Quan Zongliang | 2025-09-10 00:31:33 | Re: [BUG] PostgreSQL crashes with ThreadSanitizer during early initialization |
Previous Message | Masahiko Sawada | 2025-09-09 23:52:33 | Re: Clear logical slot's 'synced' flag on promotion of standby |