Re: SQL:2023 JSON simplified accessor support

From: Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>
To: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Nikita Malakhov <hukutoc(at)gmail(dot)com>, Vik Fearing <vik(at)postgresfriends(dot)org>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Nikita Glukhov <glukhov(dot)n(dot)a(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Subject: Re: SQL:2023 JSON simplified accessor support
Date: 2025-08-30 00:53:06
Message-ID: CAK98qZ2E-uc10nf4SR9K7Z8VeDPLeTW-+xtdh5Zej54X=mgZcA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Chao,

Thanks for reviewing!

On Thu, Aug 28, 2025 at 8:29 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:

>
> On Aug 26, 2025, at 11:52, Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>
> wrote:
>
> Best,
> Alex
> <v14-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch>
> <v14-0003-Export-jsonPathFromParseResult.patch>
> <v14-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch>
> <v14-0005-Implement-read-only-dot-notation-for-jsonb.patch>
> <v14-0007-Implement-jsonb-wildcard-member-accessor.patch>
> <v14-0006-Implement-Jsonb-subscripting-with-slicing.patch>
> <v14-0004-Extract-coerce_jsonpath_subscript.patch>
>
>
>
> I found a bug.
>
```
> INSERT INTO test_jsonb_types (data) VALUES
> ('[1, 2, "three"]'),
> ('{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}’);
> ```
>
> If I use a index following a slice, it doesn’t work:
>
> ```
> evantest=# select data[0] from test_jsonb_types;
> data
> ------
>
> 1
>
> (2 rows)
>
> evantest=# select data[0:2][1] from test_jsonb_types; # This should return
> “2"
> data
> ------
>
>
> (2 rows)
>
> evantest=# select (t.data)['con']['a'][0:1] from test_jsonb_types t; #
> returned the slice properly
> data
> -----------------------------------------------------
>
> [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]
> (2 rows)
>
> evantest=# select (t.data)['con']['a'][0:1][0] from test_jsonb_types t; #
> also returned the slice, which is wrong
> data
> -----------------------------------------------------
>
> [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]
> (2 rows)
> ```
>
> We should consider a slice as a container, so the fix is simple. My quick
> unpolished fix is:
>
> ```
> chaol(at)ChaodeMacBook-Air postgresql % git diff
> diff --git a/src/backend/utils/adt/jsonbsubs.c
> b/src/backend/utils/adt/jsonbsubs.c
> index cb72d12ca3f..8845dcf239a 100644
> --- a/src/backend/utils/adt/jsonbsubs.c
> +++ b/src/backend/utils/adt/jsonbsubs.c
> @@ -247,6 +247,7 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List
> **indirection, Subscripti
> ListCell *lc;
> Datum jsp;
> int pathlen = 0;
> + bool isSlice = false;
>
> sbsref->refupperindexpr = NIL;
> sbsref->reflowerindexpr = NIL;
> @@ -285,6 +286,7 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List
> **indirection, Subscripti
>
> if (ai->is_slice)
> {
> + isSlice = true;
> while
> (list_length(sbsref->reflowerindexpr) <
> list_length(sbsref->refupperindexpr))
> sbsref->reflowerindexpr =
> lappend(sbsref->reflowerindexpr, NULL);
>
> @@ -369,6 +371,9 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List
> **indirection, Subscripti
> path->next = jpi;
> path = jpi;
> pathlen++;
> +
> + if (isSlice)
> + break;
> }
>
> if (pathlen == 0)
> ```
>
> After the fix, let’s test again:
>
> ```
> evantest=# select data[0:2][1] from test_jsonb_types; # good result
> data
> ------
> 2
>
> (2 rows)
>
> evantest=# select (t.data)['con']['a'][0:1][0] from test_jsonb_types t; #
> good result
> data
> -------------------------
>
> {"b": {"c": {"d": 99}}}
> (2 rows)
> ```
>

TL;DR: It is a feature, not a bug.

See longer explanation below:

This behavior aligns with the SQL:2023 standard. While the result you
expected is more intuitive in my opinion, it is incorrect according to
the spec.

As I mentioned in the commit message of patch v14-0005:

* The SQL standard states that simplified access is equivalent to:
JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY
NULL ON ERROR) where: VEP = <value expression primary> JC =
<JSON simplified accessor op chain>*

The most relevant detail that I want to highlight is "WITH CONDITIONAL
ARRAY WRAPPER". The documentation[1] says:

*> If the path expression may return multiple values, it might be>
necessary to wrap those values using the WITH WRAPPER clause to make> it a
valid JSON string, because the default behavior is to not wrap> them, as if
WITHOUT WRAPPER were specified. The WITH WRAPPER clause is> by default
taken to mean WITH UNCONDITIONAL WRAPPER, which means that> even a single
result value will be wrapped. To apply the wrapper only> when multiple
values are present, specify WITH CONDITIONAL WRAPPER.> Getting multiple
values in result will be treated as an error if> WITHOUT WRAPPER is
specified.*

So, for your test queries:

select data[0:2] from test_jsonb_types;
select data[0:2][1] from test_jsonb_types;
select (t.data)['con']['a'][0:1] from test_jsonb_types t;
select (t.data)['con']['a'][0:1][0] from test_jsonb_types t;

We have these equivalents using json_query():

select json_query(data, 'lax $[0 to 2]' WITH CONDITIONAL ARRAY WRAPPER NULL
ON EMPTY NULL ON ERROR) from test_jsonb_types;
select json_query(data, 'lax $[0 to 2][1]' WITH CONDITIONAL ARRAY WRAPPER
NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
select json_query(data, 'lax $.con.a[0 to 1]' WITH CONDITIONAL ARRAY
WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; -- **[NOTE]**
select json_query(data, 'lax $.con.a[0 to 1][0]' WITH CONDITIONAL ARRAY
WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; -- **[NOTE]**

**[NOTE]**: .a and ['a'] (as well as .con and ['con']) are not
syntactically equivalent, as the dot-notation .a is in "lax" mode,
whereas the pre-standard subscript ['a'] is in "strict" mode. I will
discuss this more in a separate reply to your other comment. However,
for the specific data we inserted in your example table, they happen
to return the same results. Since our focus here is not dot-notation,
we won’t go further into it here.

You can verify correctness with:

test=# select data[0:2] = json_query(data, 'lax $[0 to 2]' WITH CONDITIONAL
ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
?column?
----------
t
t
(2 rows)

test=# select (data[0:2][1] is NULL) AND (json_query(data, 'lax $[0 to
2][1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) is NULL)
from test_jsonb_types;
?column?
----------
t
t
(2 rows)

test=# select (((t.data)['con']['a'][0:1] IS NULL) AND (json_query(data,
'lax $.con.a[0 to 1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON
ERROR) IS NULL)) OR ((t.data)['con']['a'][0:1] = json_query(data, 'lax
$.con.a[0 to 1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON
ERROR)) from test_jsonb_types t;
?column?
----------
t
t
(2 rows)

test=# select (((t.data)['con']['a'][0:1][0] IS NULL) AND (json_query(data,
'lax $.con.a[0 to 1][0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL
ON ERROR) IS NULL)) OR ((t.data)['con']['a'][0:1][0] = json_query(data,
'lax $.con.a[0 to 1][0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL
ON ERROR)) from test_jsonb_types t;
?column?
----------
t
t
(2 rows)

All tests show "t", confirming the current results are all correct.

I think the root of your confusion is the meaning of CONDITIONAL ARRAY
WRAPPER. So let’s try more examples:

-- keep your previous setup
drop table test_jsonb_types;
create table test_jsonb_types (data jsonb);
INSERT INTO test_jsonb_types (data) VALUES ('[1, 2, "three"]'), ('{"con":
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}');

Let's start with:

select data[0:2] from test_jsonb_types;

It is equivalent to:

select json_query(data, 'lax $[0 to 2]' WITH CONDITIONAL ARRAY WRAPPER NULL
ON EMPTY NULL ON ERROR) from test_jsonb_types;

They all have the following output:

test=# select json_query(data, 'lax $[0 to 2]' WITH CONDITIONAL ARRAY
WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
json_query
---------------------------------------------------------------------
[1, 2, "three"]
{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}
(2 rows)

To find out what WITH CONDITIONAL ARRAY WRAPPER does, let's toggle it
to WITHOUT ARRAY WRAPPER:

test=# select json_query(data, 'lax $[0 to 2]' WITHOUT ARRAY WRAPPER NULL
ON EMPTY NULL ON ERROR) from test_jsonb_types;
json_query
---------------------------------------------------------------------

{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}
(2 rows)

The first row return NULL, because we've specified NULL ON ERROR, so
let's toggle that as well to ERROR ON ERROR:

test=# select json_query(data, 'lax $[0 to 2]' WITHOUT ARRAY WRAPPER NULL
ON EMPTY ERROR ON ERROR) from test_jsonb_types;
ERROR: 22034: JSON path expression in JSON_QUERY must return single item
when no wrapper is requested
HINT: Use the WITH WRAPPER clause to wrap SQL/JSON items into an array.
LOCATION: JsonPathQuery, jsonpath_exec.c:3987

As shown, without ARRAY WRAPPER, the query produces a sequence of JSON
value items, not a JSON array.

Note that array wrapping is only applied to the final result of a
jsonpath, not to each intermediate result in the chain. See the
following example:

-- new setup
truncate test_jsonb_types;
INSERT INTO test_jsonb_types (data) VALUES ('[1, 2, "three"]'), ('[1, [2,
22], "three"]');

test=# select json_query(data, 'lax $[0 to 2][1]' WITHOUT ARRAY WRAPPER
NULL ON EMPTY ERROR ON ERROR) from test_jsonb_types;
json_query
------------

22
(2 rows)

In this case, you can see more clearly that "[0 to 2]" fetches three
individual jsonb array elements, and "[1]" treats each of the three
jsonb values as independent jsonb arrays, reading the first element of
each. This is different from wrapping the intermediate result into an
array and accessing the first element of that wrapped array.

Another thing I want to point out is that there is a trivial case for
"lax" mode when accessing a jsonb object (not a jsonb array) using a
JSON array accessor "[0]". For example:

-- another setup, still use your data
truncate test_jsonb_types;
insert into test_jsonb_types VALUES ('{"con": {"a": [{"b": {"c": {"d":
99}}}, {"b": {"c": {"d": 100}}}]}}');

test=# select (data).con[0] from test_jsonb_types;
con
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)

This is equivalent to:

test=# select json_query(data, 'lax $.con[0]' WITH CONDITIONAL ARRAY
WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
json_query
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)

which is different from the result if we use "strict" mode:

test=# select json_query(data, 'strict $.con[0]' WITH CONDITIONAL ARRAY
WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
json_query
------------

(1 row)

According to SQL:2023:

*In lax mode:— If an operation requires an SQL/JSON array but the operand
is not an SQL/JSON array, then the operand is first “wrapped” in an
SQL/JSON array prior to performing the operation.— If an operation requires
something other than an SQL/JSON array, but the operand is an SQL/JSON
array, then the operand is “unwrapped” by converting its elements into an
SQL/JSON sequence prior to performing the operation.— After applying the
preceding resolutions to structural errors, if there is still a structural
error , the result is an empty SQL/JSON sequence.*

Please refer to the first point to understand the example queries.

Because "lax" mode wraps a jsonb object into an array of a single
element, accessing it with [0] will always return the same jsonb
object. In fact, you can access it with a chain of [0]s and still get
the same jsonb object:

test=# select json_query(data, 'lax $.con[0][0][0][0][0][0]' WITH
CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from
test_jsonb_types;
json_query
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)

test=# select (data).con[0][0][0][0][0][0] from test_jsonb_types;
con
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)

I hope this long explanation helps!

[1]
https://www.postgresql.org/docs/current/functions-json.html#SQLJSON-QUERY-FUNCTIONS

Best,
Alex

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2025-08-30 02:51:38 Re: Improve LWLock tranche name visibility across backends
Previous Message Tomas Vondra 2025-08-29 23:07:28 Re: Adding skip scan (including MDAM style range skip scan) to nbtree