From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
---|---|
To: | Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com> |
Cc: | Nikita Glukhov <glukhov(dot)n(dot)a(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Nikita Malakhov <hukutoc(at)gmail(dot)com>, Vik Fearing <vik(at)postgresfriends(dot)org>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "David E(dot) Wheeler" <david(at)justatheory(dot)com> |
Subject: | Re: SQL:2023 JSON simplified accessor support |
Date: | 2025-09-03 06:56:44 |
Message-ID: | 052E941B-C905-448F-AE3A-753C12AAC938@gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> On Sep 3, 2025, at 10:20, Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com> wrote:
>
> This change would give an incorrect result for an accessor like
> "[0].a" when the jsonb column data is a jsonb object instead of a
> jsonb array. I've added two new test cases to cover this scenario:
>
> In jsonb.sql, the newly added tests are:
> select (jb)[0].a from test_jsonb_dot_notation; -- returns same result as (jb).a
> select (jb)[1].a from test_jsonb_dot_notation; -- returns NULL
I think the latest patch is still wrong. Ultimately, dot notation “.a", index “[0] and slice "[1:4]” rely on jsonpath, and subscript [“a”] relies on the rest logic in jsonb_subscript_transform() in “foreach”.
Now “jsonb_check_jsonpath_needed()” checks only dot nation and slice, but not checking index case. So that reason why your case “select (jb)[0].a” works is because the second indirection is a dot nation. However, “select (jb)[0][‘a’]” will not work. See my test:
```
evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0].name;
name
---------
"Alice"
(1 row)
evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['name'];
jsonb
-------
(1 row)
```
In my test, (jsonb)[0][’name’] should also return “Alice”.
So, end up, “jsonb_check_jsonpath_needed()” only does incomplete and inaccurate checks, only jsonb_subscript_make_jsonpath() can make an accurate decision and return a null jsonpath upon subscript “[‘a’]”.
As a result, “json_check_jsonpath_needed()” feels not needed at all. In jsonb_subscript_transform(), just go ahead to run jsonb_subscript_make_jsonpath() first, if returned jsonpath is NULL, then run rest of logic.
With my dirty change of removing json_check_jsonpath_needed:
```
chaol(at)ChaodeMacBook-Air postgresql % git diff
diff --git a/src/backend/utils/adt/jsonbsubs.c b/src/backend/utils/adt/jsonbsubs.c
index 374040b3b4e..d9faab5c799 100644
--- a/src/backend/utils/adt/jsonbsubs.c
+++ b/src/backend/utils/adt/jsonbsubs.c
@@ -416,12 +416,12 @@ jsonb_subscript_transform(SubscriptingRef *sbsref,
sbsref->refrestype = JSONBOID;
sbsref->reftypmod = -1;
- if (jsonb_check_jsonpath_needed(*indirection, isSlice))
- {
+ //if (jsonb_check_jsonpath_needed(*indirection, isSlice))
+ //{
jsonb_subscript_make_jsonpath(pstate, indirection, sbsref);
if (sbsref->refjsonbpath)
return;
- }
+ //}
/*
* We reach here only in two cases: (a) the JSON simplified accessor is
```
You can see:
```
evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0]['name'];
jsonb
---------
"Alice"
(1 row)
evantest=# select ('{"name": "Alice", "age": 30}'::jsonb)[0].name;
name
---------
"Alice"
(1 row)
```
Then I found “make check” failed. For example the first broken test case:
```
@@ -4998,7 +4998,7 @@
select ('123'::jsonb)[0];
jsonb
-------
-
+ 123
(1 row)
```
The test expected an empty result, which implies “strict” mode.
But the problem is, which mode should be the default? JSON_QUERY() uses “lax” as default mode. And from Peter Eisentraut’s blog: https://peter.eisentraut.org/blog/2023/04/04/sql-2023-is-finished-here-is-whats-new
```
The semantics of this are defined in terms of JSON_QUERY and JSON_VALUE constructions (which have been available since SQL:2016), so this really just syntactic sugar.
```
Also feels like “lax” should be the default mode. If that is true, then my dirty change of removing “json_check_jsonpath_need()” works properly.
The current logic with this patch sounds strange. Because “json_check_jsonpath_need()” iterate through unprocessed indirections to decide if jsonpath is needed (lax mode). With this logic:
1) if index [0] directly following dot notation, like (data).a[0], it’s lax mode
2) if index [0] directly following subscript [‘a’], like (data)[‘a’][0], it’s strict mode
3) if index [0] directly following the data column, then if there is a dot nation in indirection list, use lax mode, otherwise strict mode. For the failed test case, as there is no more indirection following [0], so it expected strict mode.
I wonder where this behavior is defined?
With my change, 1) and 2) are the same, for 3), if index [0] directly following the data column, regardless what indirections are followed, it’s by default lax mode.
So, I think this is a design decision. Maybe I missed something from your previous design, but I don’t find anything about that from the commit comments. I feel this would be better aligned with 1) and 2).
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-09-03 07:10:01 | Re: Fix segfault while accessing half-initialized hash table in pgstat_shmem.c |
Previous Message | Jeff Davis | 2025-09-03 06:47:48 | Should io_method=worker remain the default? |