Re: remaining sql/json patches

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Himanshu Upadhyaya <upadhyaya(dot)himanshu(at)gmail(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: remaining sql/json patches
Date: 2024-04-02 13:57:21
Message-ID: CA+HiwqGOKvjSD1soND26-eRqJyGsUWPc5mG2L4D7x85aOJsJ0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Jian,

Thanks for your time on this.

On Mon, Apr 1, 2024 at 9:00 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> SELECT * FROM JSON_TABLE('[]', 'strict $.a' COLUMNS (js2 text PATH
> '$' error on empty error on error) EMPTY ON ERROR);
> should i expect it return one row?
> is there any example to make it return one row from top level "EMPTY ON ERROR"?

I think that's expected. You get 0 rows instead of a single row with
one column containing an empty array, because the NULL returned by the
error-handling part of JSON_TABLE's top-level path is not returned
directly to the user, but instead passed as an input document for the
TableFunc.

I think it suffices to add a note to the documentation of table-level
(that is, not column-level) ON ERROR clause that EMPTY means an empty
"table", not empty array, which is what you get with JSON_QUERY().

> + {
> + JsonTablePlan *scan = (JsonTablePlan *) plan;
> +
> + JsonTableInitPathScan(cxt, planstate, args, mcxt);
> +
> + planstate->nested = scan->child ?
> + JsonTableInitPlan(cxt, scan->child, planstate, args, mcxt) : NULL;
> + }
> first line seems strange, do we just simply change from "plan" to "scan"?

Mostly to complement the "join" variable in the other block.

Anyway, I've reworked this to make JsonTablePlan an abstract struct
and make JsonTablePathScan and JsonTableSiblingJoin "inherit" from it.

> + case JTC_REGULAR:
> + typenameTypeIdAndMod(pstate, rawc->typeName, &typid, &typmod);
> +
> + /*
> + * Use implicit FORMAT JSON for composite types (arrays and
> + * records) or if a non-default WRAPPER / QUOTES behavior is
> + * specified.
> + */
> + if (typeIsComposite(typid) ||
> + rawc->quotes != JS_QUOTES_UNSPEC ||
> + rawc->wrapper != JSW_UNSPEC)
> + rawc->coltype = JTC_FORMATTED;
> per previous discussion, should we refactor the above comment?

Done. Instead of saying "use implicit FORMAT JSON" I've reworked the
comment to mention instead that we do this so that the column uses
JSON_QUERY() as implementation for these cases.

> +/* Recursively set 'reset' flag of planstate and its child nodes */
> +static void
> +JsonTablePlanReset(JsonTablePlanState *planstate)
> +{
> + if (IsA(planstate->plan, JsonTableSiblingJoin))
> + {
> + JsonTablePlanReset(planstate->left);
> + JsonTablePlanReset(planstate->right);
> + planstate->advanceRight = false;
> + }
> + else
> + {
> + planstate->reset = true;
> + planstate->advanceNested = false;
> +
> + if (planstate->nested)
> + JsonTablePlanReset(planstate->nested);
> + }
> per coverage, the first part of the IF branch never executed.
> i also found out that JsonTablePlanReset is quite similar to JsonTableRescan,
> i don't fully understand these two functions though.

Working on improving the documentation of the recursive algorithm,
though I want to focus on finishing 0001 first.

> SELECT * FROM JSON_TABLE(jsonb'{"a": {"z":[1111]}, "b": 1,"c": 2, "d":
> 91}', '$' COLUMNS (
> c int path '$.c',
> d int path '$.d',
> id1 for ordinality,
> NESTED PATH '$.a.z[*]' columns (z int path '$', id for ordinality)
> ));
> doc seems to say that duplicated ordinality columns in different nest
> levels are not allowed?

Both the documentation and the code in JsonTableGetValue() to
calculate a FOR ORDINALITY column were wrong. A nested path's columns
should be able to have its own ordinal counter that runs separately
from the other paths, including the parent path, all the way up to the
root path.

I've fixed both. Added a test case too.

> "currentRow" naming seems misleading, generally, when we think of "row",
> we think of several (not one) datums, or several columns.
> but here, we only have one datum.
> I don't have good optional naming though.

Yeah, I can see the confusion. I've created a new struct called
JsonTablePlanRowSource and different places now use a variable named
just 'current' to refer to the currently active row source. It's
hopefully clear from the context that the datum containing the JSON
object is acting as a source of values for evaluating column paths.

> + case JTC_FORMATTED:
> + case JTC_EXISTS:
> + {
> + Node *je;
> + CaseTestExpr *param = makeNode(CaseTestExpr);
> +
> + param->collation = InvalidOid;
> + param->typeId = contextItemTypid;
> + param->typeMod = -1;
> +
> + je = transformJsonTableColumn(rawc, (Node *) param,
> + NIL, errorOnError);
> +
> + colexpr = transformExpr(pstate, je, EXPR_KIND_FROM_FUNCTION);
> + assign_expr_collations(pstate, colexpr);
> +
> + typid = exprType(colexpr);
> + typmod = exprTypmod(colexpr);
> + break;
> + }
> +
> + default:
> + elog(ERROR, "unknown JSON_TABLE column type: %d", rawc->coltype);
> + break;
> + }
> +
> + tf->coltypes = lappend_oid(tf->coltypes, typid);
> + tf->coltypmods = lappend_int(tf->coltypmods, typmod);
> + tf->colcollations = lappend_oid(tf->colcollations, get_typcollation(typid));
> + tf->colvalexprs = lappend(tf->colvalexprs, colexpr);
>
> why not use exprCollation(colexpr) for tf->colcollations, similar to
> exprType(colexpr)?

Yes, maybe.

On Tue, Apr 2, 2024 at 3:54 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> +/*
> + * Recursively transform child JSON_TABLE plan.
> + *
> + * Default plan is transformed into a cross/union join of its nested columns.
> + * Simple and outer/inner plans are transformed into a JsonTablePlan by
> + * finding and transforming corresponding nested column.
> + * Sibling plans are recursively transformed into a JsonTableSibling.
> + */
> +static Node *
> +transformJsonTableChildPlan(JsonTableParseContext *cxt,
> + List *columns)
> this comment is not the same as the function intention for now.
> maybe we need to refactor it.

Fixed.

> /*
> * Each call to fetch a new set of rows - of which there may be very many
> * if XMLTABLE is being used in a lateral join - will allocate a possibly
> * substantial amount of memory, so we cannot use the per-query context
> * here. perTableCxt now serves the same function as "argcontext" does in
> * FunctionScan - a place to store per-one-call (i.e. one result table)
> * lifetime data (as opposed to per-query or per-result-tuple).
> */
> MemoryContextSwitchTo(tstate->perTableCxt);
>
> maybe we can replace "XMLTABLE" to "XMLTABLE or JSON_TABLE"?

Good catch, done.

>
> /* Transform and coerce the PASSING arguments to to jsonb. */
> there should be only one "to"?

Will need to fix that separately.

> -----------------------------------------------------------------------------------------------------------------------
> json_table_column clause doesn't have a passing clause.
> we can only have one passing clause in json_table.
> but during JsonTableInitPathScan, for each output columns associated
> JsonTablePlanState
> we already initialized the PASSING arguments via `planstate->args = args;`
> also transformJsonTableColumn already has a passingArgs argument.
> technically we can use the jsonpath variable for every output column
> regardless of whether it's nested or not.
>
> JsonTable already has the "passing" clause,
> we just need to pass it to function transformJsonTableColumns and it's callees.
> based on that, I implemented it. seems quite straightforward.
> I also wrote several contrived, slightly complicated tests.
> It seems to work just fine.
>
> simple explanation:
> previously the following sql will fail, error message is that "could
> not find jsonpath variable %s".
> now it will work.
>
> SELECT sub.* FROM
> JSON_TABLE(jsonb '{"a":{"za":[{"z1": [11,2222]},{"z21": [22,
> 234,2345]}]},"c": 3}',
> '$' PASSING 22 AS x, 234 AS y
> COLUMNS(
> xx int path '$.c',
> NESTED PATH '$.a.za[1]' as n1 columns
> (NESTED PATH '$.z21[*]' as n2
> COLUMNS (z21 int path '$?(@ == $"x" || @ == $"y" )' default 0 on empty)),
> NESTED PATH '$.a.za[0]' as n4 columns
> (NESTED PATH '$.z1[*]' as n3
> COLUMNS (z1 int path '$?(@ > $"y" + 1988)' default 0 on empty)))
> )sub;

Thanks for the patch. Yeah, not allowing column paths (including
nested ones) to use top-level PASSING args seems odd, so I wanted to
fix it too.

Please let me know if you have further comments on 0001. I'd like to
get that in before spending more energy on 0002.

--
Thanks, Amit Langote

Attachment Content-Type Size
v48-0001-Add-JSON_TABLE-function.patch application/octet-stream 132.3 KB
v48-0002-JSON_TABLE-Add-support-for-NESTED-columns.patch application/octet-stream 49.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Sabino Mullane 2024-04-02 14:03:57 Re: On disable_cost
Previous Message Daniel Gustafsson 2024-04-02 13:56:13 Re: Confusing #if nesting in hmac_openssl.c