Re: SQL:2023 JSON simplified accessor support

From: Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Nikita Malakhov <hukutoc(at)gmail(dot)com>, Vik Fearing <vik(at)postgresfriends(dot)org>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Nikita Glukhov <glukhov(dot)n(dot)a(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Subject: Re: SQL:2023 JSON simplified accessor support
Date: 2025-07-09 08:01:45
Message-ID: CAK98qZ0whQ=c+JGXbGSEBxCtLgy6sf-YGYqsKTAGsS-wt0wj+A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Jian,

On Tue, Jun 24, 2025 at 3:30 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:

> hi.
>
> I have applied for 0001 to 0006.
>

Thank you so much for the very detailed and thorough review, and for
the patches!

I've attached v12 that addresses your feedback:

v12-0001 to v12-0004 are refactoring patches that prepare for the
implementation.

v12-0005 - implements jsonb dot-notation (e.g., (jb).a)
Changes from v11-0006 include:
a. subscripting with slicing and wildcard-related code has been
removed and will be added in the following commit
b. introduced a dedicated FieldAccessorExpr node to transform Strings
in dot-notation.
c. mixed syntax (such as (jb).a['b'].c is now allowed, with warnings.
d. miscellaneous bug fixes and code cleanup based on Jian’s feedback.

v12-0006 - implements jsonb subscripting with slicing.
This was split out from v11-0006 into a separate commit.

v12-0007 - implements wildcard member accessor.
I added the Star node that Jelte suggested for transforming the
wildcard member accessor.

Please find more detailed replies below:

On Tue, Jun 24, 2025 at 3:30 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:

> static void
> jsonb_subscript_transform(SubscriptingRef *sbsref,
> List **indirection,
> ParseState *pstate,
> bool isSlice,
> bool isAssignment)
> {
> List *upperIndexpr = NIL;
> ListCell *idx;
> sbsref->refrestype = JSONBOID;
> sbsref->reftypmod = -1;
> if (jsonb_check_jsonpath_needed(*indirection))
> {
> sbsref->refjsonbpath =
> jsonb_subscript_make_jsonpath(pstate, indirection,
> &sbsref->refupperindexpr,
> &sbsref->reflowerindexpr);
> return;
> }
> foreach(idx, *indirection)
> {
> Node *i = lfirst(idx);
> A_Indices *ai;
> Node *subExpr;
> Assert(IsA(i, A_Indices));
> ai = castNode(A_Indices, i);
> if (isSlice)
> {
> Node *expr = ai->uidx ? ai->uidx : ai->lidx;
> ereport(ERROR,
> (errcode(ERRCODE_DATATYPE_MISMATCH),
> errmsg("jsonb subscript does not support slices"),
> parser_errposition(pstate, exprLocation(expr))));
> }
>
> I am confused by the above error handling:
> errmsg("jsonb subscript does not support slices").

> we can do
> select (jsonb '[1,2,3]')[0:1];
> or
> SELECT (jb).a[2:3].b FROM test_jsonb_dot_notation;
>
> this is by definition, "slices"?
> Anyway, I doubt this error handling will ever be reachable.

> jsonb_check_jsonpath_needed checks whether the indirection contains
> is_slice,
> but jsonb_subscript_transform already takes isSlice as an argument.
> Maybe we can refactor it somehow.
>

Good catch! I believe this error handling was originally copied
because the existing JSONB subscripting in Postgres doesn’t support
slices.

In v12, I’ve separated the slicing implementation into its own commit,
distinct from the one that implements dot-notation. Hope that helps!

On Tue, Jun 24, 2025 at 3:30 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:

> T_String is a primitive node type with no subnodes.
> typedef struct String
> {
> pg_node_attr(special_read_write)
> NodeTag type;
> char *sval;
> } String;
> then in src/backend/nodes/nodeFuncs.c:
> if (expr && !IsA(expr, String) && WALK(expr))
> return true;
> we can change it to
> if (WALK(expr))
> return true;
> but in function expression_tree_walker_impl
> we have to change it as
>
> case T_MergeSupportFunc:
> case T_String:
> /* primitive node types with no expression subnodes */
> break;
>

You’re right, the changes to the walker-related code weren’t very
clean. In v12, I introduced a new node, FieldAccessorExpr, to serve as
the expression node for dot-notation. This helped minimize changes to
the walker code and, in my opinion, results in a cleaner design than
using a more general Const(text) node.

On Tue, Jun 24, 2025 at 10:57 PM jian he <jian(dot)universality(at)gmail(dot)com>
wrote:

> in src/backend/catalog/sql_features.txt
> should we mark any T860, T861, T862, T863, T864
> items as YES?
>

I’ve updated T860 and T861. I’m not entirely sure about the rest so
left them unchanged.

On Tue, Jun 24, 2025 at 10:57 PM jian he <jian(dot)universality(at)gmail(dot)com>
wrote:

> typedef struct SubscriptingRef
> {
> /* expressions that evaluate to upper container indexes */
> List *refupperindexpr;
> }
> SubscriptingRef.refupperindexpr meaning changed,
> So the above comments also need to be changed?
>

Thanks. Done.

On Tue, Jun 24, 2025 at 10:57 PM jian he <jian(dot)universality(at)gmail(dot)com>
wrote:

> v11-0004-Extract-coerce_jsonpath_subscript.patch
> + /*
> + * We known from can_coerce_type that coercion will succeed, so
> + * coerce_type could be used. Note the implicit coercion context, which is
> + * required to handle subscripts of different types, similar to overloaded
> + * functions.
> + */
> + subExpr = coerce_type(pstate,
> + subExpr, subExprType,
> + targetType, -1,
> + COERCION_IMPLICIT,
> + COERCE_IMPLICIT_CAST,
> + -1);
> + if (subExpr == NULL)
> + ereport(ERROR,
> + (errcode(ERRCODE_DATATYPE_MISMATCH),
> + errmsg("jsonb subscript must have text type"),
>
> the targetType can be "integer", then the error message
> errmsg("jsonb subscript must have text type") would be wrong?
> Also this error handling is not necessary.
> since we can_coerce_type already tell us that coercion will succeed.
> Also, do we really put v11-0004 as a separate patch?
>

Thanks for pointing this out! I’ve removed the unnecessary error
handling as you suggested. I kept v12-0004 as a separate patch to make
the review easier, but I’ll leave it up to the committer to decide
whether to squash it with the other commits.

On Tue, Jun 24, 2025 at 10:57 PM jian he <jian(dot)universality(at)gmail(dot)com>
wrote:

> in gram.y we have:
> if (!IsA($5, A_Const) ||
> castNode(A_Const, $5)->val.node.type != T_String)
> ereport(ERROR,
> errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("only string constants are
> supported in JSON_TABLE path specification"),
> so simply, in make_jsonpath_item_expr we can
>
> expr = transformExpr(pstate, expr, pstate->p_expr_kind);
> if (!IsA(expr, Const) || ((Const *) expr)->consttype != INT4OID)
> ereport(ERROR,
> errcode(ERRCODE_DATATYPE_MISMATCH),
> errmsg("only integer constants are supported in jsonb
> simplified accessor subscripting"),
> parser_errposition(pstate, exprLocation(expr)));
>
> because I think the current error message "got type: unknown" is not good.
> select ('123'::jsonb).a['1'];
> ERROR: jsonb simplified accessor supports subscripting in type: INT4,
> got type: unknown
> then we don't need two "ereport(ERROR" within make_jsonpath_item_expr
> we can also Assert (cnst->constisnull) is false.
> see gram.y:16976
>

Thanks for pointing this out! In v12, I added support for mixed
subscripting syntax, and updated the ERROR (or WARNING) messages as
follows:

postgres=# SELECT (jb)['a'].b FROM test_jsonb_dot_notation; -- returns an
array due to lax mode
b
------------
["c", "d"]
(1 row)

postgres=# SELECT (jb).a['b'] FROM test_jsonb_dot_notation; -- returns NULL
due to strict mode with warnings
WARNING: 01000: mixed usage of jsonb simplified accessor syntax and jsonb
subscripting.
LINE 1: SELECT (jb).a['b'] FROM test_jsonb_dot_notation;
^
HINT: use dot-notation for member access, or use non-null integer
constants subscripting for array access.
LOCATION: jsonb_subscript_make_jsonpath, jsonbsubs.c:366
a
---

(1 row)

postgres=# select ('{"a": 1}'::jsonb)['a':'b']; -- fails
ERROR: 42804: only non-null integer constants are supported for jsonb
simplified accessor subscripting
LINE 1: select ('{"a": 1}'::jsonb)['a':'b'];
^
HINT: use int data type for subscripting with slicing.
LOCATION: make_jsonpath_item_expr, jsonbsubs.c:218

So some of the ERRORs now disappear or downgrade to WARNINGs.

On Tue, Jun 24, 2025 at 10:57 PM jian he <jian(dot)universality(at)gmail(dot)com>
wrote:

> I saw you introduce the word "AST", for example
> "Converts jsonpath AST into jsonpath value in binary."
> I am not sure that is fine.
>
Fixed.

On Tue, Jun 24, 2025 at 10:57 PM jian he <jian(dot)universality(at)gmail(dot)com>
wrote:

> in jsonb_subscript_make_jsonpath we have:
> + foreach(lc, *indirection)
> + {
> +
> + if (IsA(accessor, String))
> + ....
> + else if (IsA(accessor, A_Star))
> + ....
> + else if (IsA(accessor, A_Indices))
> + ....
> + else
> + break;
>
> Is the last else branch unreliable? since indirection only support for
> String, A_Star, A_Indices, we already have Assert in
> jsonb_check_jsonpath_needed
> to ensure that.
> + *indirection = list_delete_first_n(*indirection, pathlen);
> also this is not necessary,
> because pathlen will be the same length as list *indirection in current
> design.
>

I kept this logic in v12, because in order to support the mixed usage
of dot-notation and existing jsonb subscripting (e.g., (jb).a['b'].c),
we need to switch between making jsonpath and transforming the upper
indexes for evaluation. So now, *indirection =
list_delete_first_n(*indirection, pathlen); is necessary, and pathlen
can differ.

The else break; can be removed, but I choose to keep it for now and
added comments to clarify the behavior.

On Wed, Jun 25, 2025 at 12:41 AM jian he <jian(dot)universality(at)gmail(dot)com>
wrote:

> On Wed, Jun 25, 2025 at 1:56 PM jian he <jian(dot)universality(at)gmail(dot)com>
> wrote:
> >
> > hi.
>
> CREATE TABLE test_jsonb_dot_notation AS
> SELECT '{"a": [1, 2, {"b": "c"}, {"b": "d", "e": "f", "x": {"y":
> "yyy", "z": "zzz"}}], "b": [3, 4, {"b": "g", "x": {"y": "YYY", "z":
> "ZZZ"}}]}'::jsonb jb;
> CREATE VIEW v1 AS SELECT (jb).a[3].x.y FROM test_jsonb_dot_notation;
> CREATE VIEW v2 AS SELECT (jb).a[3:].x.y[:-1] FROM test_jsonb_dot_notation;
> CREATE VIEW v3 AS SELECT (jb).a[:3].x.y[-1:] FROM test_jsonb_dot_notation;
>
> \sv v2
> \sv v3
> will trigger segment fault.
>

Great catch, thanks! Fixed and added your tests.

On Wed, Jun 25, 2025 at 12:41 AM jian he <jian(dot)universality(at)gmail(dot)com>
wrote:

> also do we need ban subscript slicing, when upper bound is less than
> lower bound,
> for example:
> SELECT (jb).a[3:].x.y[0:'-1'::integer] AS y FROM test_jsonb_dot_notation;
>

I don't think we need to ban this case, since the SQL standard doesn't
ban it, and the (empty) result seems reasonable.

On Thu, Jun 26, 2025 at 7:10 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:

> in gram.y we have:
> indirection_el:
> '.' attr_name
> {
> $$ = (Node *) makeString($2);
> }
>
> we can be sure that dot notation, following dot is a plain string.
> then in jsonb_subscript_transform, we can transform the String Node to
> a TEXTOID Const.
> with that, most of the
> v11-0005-Enable-String-node-as-field-accessors-in-generic.patch
> would be unnecessary.
> Also in v11-0006-Implement-read-only-dot-notation-for-jsonb.patch
> all these with pattern
> ``if (IsA(expr, String)``
> can be removed.
>

Thanks for giving so much thought into this, I really appreciate it!
As I mentioned earlier, instead of using the Const(text) node, I added
a dedicated FieldAccessorExpr node for dot-notation. I think this
addresses the same concern.

On Thu, Jun 26, 2025 at 7:10 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:

> in transformContainerSubscripts we have:
> sbsref = makeNode(SubscriptingRef);
> sbsref->refcontainertype = containerType;
> sbsref->refelemtype = elementType;
> sbsref->reftypmod = containerTypMod;
> sbsref->refexpr = (Expr *) containerBase;
> sbsref->refassgnexpr = NULL; /* caller will fill if it's an
> assignment */
> sbsroutines->transform(sbsref, indirection, pstate,
> isSlice, isAssignment);
>
> then jsonb_subscript_transform we have
> sbsref->refjsonbpath =
> jsonb_subscript_make_jsonpath(pstate, indirection,
> &sbsref->refupperindexpr,
> &sbsref->reflowerindexpr);
> of course sbsref->refupperindexpr, sbsref->reflowerindexpr is NIL
> since we first called jsonb_subscript_make_jsonpath.
>
> so we can simplify the function signature as
> static void jsonb_subscript_make_jsonpath(pstate, indirection, sbsref)
>
> Within jsonb_subscript_make_jsonpath we are going to populate
> refupperindexpr, reflowerindexpr, refjsonbpath.
>

You are right. I applied your suggestion.

On Thu, Jun 26, 2025 at 7:10 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:

> The attached patch addresses both of these issues, along with additional
> related
> refactoring. It consolidates all the changes I think are appropriate,
> based on
> patches v1-0001 to v1-0006. This will include patches previously I sent in
> the earlier thread.
>

Thanks again for the patch! It was really helpful! I didn't directly
apply it as I made a few different choices, but I think I have
addressed all the points you covered in it.

Let me know your thoughts!

Best,
Alex

Attachment Content-Type Size
v12-0006-Implement-Jsonb-subscripting-with-slicing.patch application/octet-stream 17.4 KB
v12-0004-Extract-coerce_jsonpath_subscript.patch application/octet-stream 5.7 KB
v12-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch application/octet-stream 9.0 KB
v12-0007-Implement-jsonb-wildcard-member-accessor.patch application/octet-stream 31.4 KB
v12-0005-Implement-read-only-dot-notation-for-jsonb.patch application/octet-stream 66.3 KB
v12-0003-Export-jsonPathFromParseResult.patch application/octet-stream 2.7 KB
v12-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch application/octet-stream 10.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2025-07-09 08:06:39 Re: Problem with transition tables on partitioned tables with foreign-table partitions
Previous Message Yura Sokolov 2025-07-09 07:56:34 Re: Optimize shared LWLock acquisition for high-core-count systems