Re: SQL/JSON revisited

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, e(dot)indrupskaya(at)postgrespro(dot)ru
Subject: Re: SQL/JSON revisited
Date: 2023-02-27 07:45:49
Message-ID: CA+HiwqGemKf_CjExy+sfNYv8eu53h_zX=F_ia63DNOvAJrGiGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks a lot for taking a look at this and sorry about the delay in response.

On Tue, Feb 21, 2023 at 2:25 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2023-02-20 16:35:52 +0900, Amit Langote wrote:
> > Subject: [PATCH v4 03/10] SQL/JSON query functions
> > +/*
> > + * Evaluate a JSON error/empty behavior result.
> > + */
> > +static Datum
> > +ExecEvalJsonBehavior(JsonBehavior *behavior, bool *is_null)
> > +{
> > + *is_null = false;
> > +
> > + switch (behavior->btype)
> > + {
> > + case JSON_BEHAVIOR_EMPTY_ARRAY:
> > + return JsonbPGetDatum(JsonbMakeEmptyArray());
> > +
> > + case JSON_BEHAVIOR_EMPTY_OBJECT:
> > + return JsonbPGetDatum(JsonbMakeEmptyObject());
> > +
> > + case JSON_BEHAVIOR_TRUE:
> > + return BoolGetDatum(true);
> > +
> > + case JSON_BEHAVIOR_FALSE:
> > + return BoolGetDatum(false);
> > +
> > + case JSON_BEHAVIOR_NULL:
> > + case JSON_BEHAVIOR_UNKNOWN:
> > + *is_null = true;
> > + return (Datum) 0;
> > +
> > + case JSON_BEHAVIOR_DEFAULT:
> > + /* Always handled in the caller. */
> > + Assert(false);
> > + return (Datum) 0;
> > +
> > + default:
> > + elog(ERROR, "unrecognized SQL/JSON behavior %d", behavior->btype);
> > + return (Datum) 0;
> > + }
> > +}
>
> Does this actually need to be evaluated at expression eavluation time?
> Couldn't we just emit the proper constants in execExpr.c?

Yes, done that way in the updated patch.

> > +/* ----------------------------------------------------------------
> > + * ExecEvalJson
> > + * ----------------------------------------------------------------
> > + */
> > +void
> > +ExecEvalJson(ExprState *state, ExprEvalStep *op, ExprContext *econtext)
>
> Pointless comment.

Removed this function altogether in favor of merging the body with
ExecEvalJsonExpr(), which does have a more sensible comment

> > +{
> > + JsonExprState *jsestate = op->d.jsonexpr.jsestate;
> > + JsonExprPreEvalState *pre_eval = &jsestate->pre_eval;
> > + JsonExprPostEvalState *post_eval = &jsestate->post_eval;
> > + JsonExpr *jexpr = jsestate->jsexpr;
> > + Datum item;
> > + Datum res = (Datum) 0;
> > + JsonPath *path;
> > + bool throwErrors = jexpr->on_error->btype == JSON_BEHAVIOR_ERROR;
> > +
> > + *op->resnull = true; /* until we get a result */
> > + *op->resvalue = (Datum) 0;
> > +
> > + item = pre_eval->formatted_expr.value;
> > + path = DatumGetJsonPathP(pre_eval->pathspec.value);
> > +
> > + /* Reset JsonExprPostEvalState for this evaluation. */
> > + memset(post_eval, 0, sizeof(*post_eval));
> > +
> > + res = ExecEvalJsonExpr(op, econtext, path, item, op->resnull,
> > + !throwErrors ? &post_eval->error : NULL);
> > +
> > + *op->resvalue = res;
> > +}
>
> I really don't like having both ExecEvalJson() and ExecEvalJsonExpr(). There's
> really no way to know what which version does, just based on the name.

Yes, having two functions is no longer necessary.

> > --- a/src/backend/parser/gram.y
> > +++ b/src/backend/parser/gram.y
>
> This stuff adds quite a bit of complexity to the parser. Do we realy need like
> a dozen new rules here?
>>
> > +json_behavior_empty_array:
> > + EMPTY_P ARRAY { $$ = makeJsonBehavior(JSON_BEHAVIOR_EMPTY_ARRAY, NULL); }
> > + /* non-standard, for Oracle compatibility only */
> > + | EMPTY_P { $$ = makeJsonBehavior(JSON_BEHAVIOR_EMPTY_ARRAY, NULL); }
> > + ;
>
> Do we really want to add random oracle compat crud here?

Hmm, sorry, but I haven't familiarized myself with the grammar side of
things as much as I perhaps should have, so I am not sure whether a
more simplified grammar would suffice for offering a
standard-compliant functionality.

Maybe we could take out the oracle-compatibility bit, but I'd
appreciate it if someone who has been involved with SQL/JSON from the
beginning can comment on the above 2 points.

> > +/*
> > + * Evaluate a JSON path variable caching computed value.
> > + */
> > +int
> > +EvalJsonPathVar(void *cxt, char *varName, int varNameLen,
> > + JsonbValue *val, JsonbValue *baseObject)
>
> Missing static?

Fixed.

> > +{
> > + JsonPathVariableEvalContext *var = NULL;
> > + List *vars = cxt;
> > + ListCell *lc;
> > + int id = 1;
> > +
> > + if (!varName)
> > + return list_length(vars);
> > +
> > + foreach(lc, vars)
> > + {
> > + var = lfirst(lc);
> > +
> > + if (!strncmp(var->name, varName, varNameLen))
> > + break;
> > +
> > + var = NULL;
> > + id++;
> > + }
> > +
> > + if (!var)
> > + return -1;
> > +
> > + /*
> > + * When belonging to a JsonExpr, path variables are computed with the
> > + * JsonExpr's ExprState (var->estate is NULL), so don't need to be computed
> > + * here. In some other cases, such as when the path variables belonging
> > + * to a JsonTable instead, those variables must be evaluated on their own,
> > + * without the enclosing JsonExpr itself needing to be evaluated, so must
> > + * be handled here.
> > + */
> > + if (var->estate && !var->evaluated)
> > + {
> > + Assert(var->econtext != NULL);
> > + var->value = ExecEvalExpr(var->estate, var->econtext, &var->isnull);
> > + var->evaluated = true;
>
> Uh, so this continues to do recursive expression evaluation, as
> ExecEvalJsonExpr()->JsonPathQuery()->executeJsonPath(EvalJsonPathVar)
>
> I'm getting grumpy here. This is wrong, has been pointed out many times. The
> only thing that changes is that the point of recursion is moved around.

Actually, these JSON path vars, along with other sub-expressions of
JsonExpr, *are* computed non-recursively as ExprEvalSteps of the
JsonExprState, at least in the cases where the vars are to be computed
as part of evaluating the JsonExpr itself. So, the code path you've
shown above perhaps as a hypothetical doesn't really exist, though
there *is* an instance where these path vars are computed *outside*
the context of evaluating the parent JsonExpr, such as in
JsonTableResetContextItem(). Maybe there's a cleaner way of doing
that though...

> > +
> > +/*
> > + * ExecEvalExprSafe
> > + *
> > + * Like ExecEvalExpr(), though this allows the caller to pass an
> > + * ErrorSaveContext to declare its intenion to catch any errors that occur when
> > + * executing the expression, such as when calling type input functions that may
> > + * be present in it.
> > + */
> > +static inline Datum
> > +ExecEvalExprSafe(ExprState *state,
> > + ExprContext *econtext,
> > + bool *isNull,
> > + Node *escontext,
> > + bool *error)
>
> Afaict there's no caller of this?

Oops, removed. This was used in a previous version of the patch that
still had nested ExprStates inside JsonExprState.

> > +/*
> > + * ExecInitExprWithCaseValue
> > + *
> > + * This is the same as ExecInitExpr, except the caller passes the Datum and
> > + * bool pointers that it would like the ExprState.innermost_caseval
> > + * and ExprState.innermost_casenull, respectively, to be set to. That way,
> > + * it can pass an input value to evaluate the expression via a CaseTestExpr.
> > + */
> > +ExprState *
> > +ExecInitExprWithCaseValue(Expr *node, PlanState *parent,
> > + Datum *caseval, bool *casenull)
> > +{
> > + ExprState *state;
> > + ExprEvalStep scratch = {0};
> > +
> > + /* Special case: NULL expression produces a NULL ExprState pointer */
> > + if (node == NULL)
> > + return NULL;
> > +
> > + /* Initialize ExprState with empty step list */
> > + state = makeNode(ExprState);
> > + state->expr = node;
> > + state->parent = parent;
> > + state->ext_params = NULL;
> > + state->innermost_caseval = caseval;
> > + state->innermost_casenull = casenull;
> > +
> > + /* Insert EEOP_*_FETCHSOME steps as needed */
> > + ExecInitExprSlots(state, (Node *) node);
> > +
> > + /* Compile the expression proper */
> > + ExecInitExprRec(node, state, &state->resvalue, &state->resnull);
> > +
> > + /* Finally, append a DONE step */
> > + scratch.opcode = EEOP_DONE;
> > + ExprEvalPushStep(state, &scratch);
> > +
> > + ExecReadyExpr(state);
> > +
> > + return state;
>
> > +struct JsonTableJoinState
> > +{
> > + union
> > + {
> > + struct
> > + {
> > + JsonTableJoinState *left;
> > + JsonTableJoinState *right;
> > + bool advanceRight;
> > + } join;
> > + JsonTableScanState scan;
> > + } u;
> > + bool is_join;
> > +};
>
> A join state that unions the join member with a scan, and has a is_join field?

Yeah, I agree that's not the best form for what it is. I've replaced
that with the following:

+/* Structures for JSON_TABLE execution */
+
+typedef enum JsonTablePlanStateType
+{
+ JSON_TABLE_SCAN_STATE = 0,
+ JSON_TABLE_JOIN_STATE
+} JsonTablePlanStateType;
+
+typedef struct JsonTablePlanState
+{
+ JsonTablePlanStateType type;
+
+ struct JsonTablePlanState *parent;
+ struct JsonTablePlanState *nested;
+} JsonTablePlanState;
+
+typedef struct JsonTableScanState
+{
+ JsonTablePlanState plan;
+
+ MemoryContext mcxt;
+ JsonPath *path;
+ List *args;
+ JsonValueList found;
+ JsonValueListIterator iter;
+ Datum current;
+ int ordinal;
+ bool currentIsNull;
+ bool outerJoin;
+ bool errorOnError;
+ bool advanceNested;
+ bool reset;
+} JsonTableScanState;
+
+typedef struct JsonTableJoinState
+{
+ JsonTablePlanState plan;
+
+ JsonTablePlanState *left;
+ JsonTablePlanState *right;
+ bool advanceRight;
+} JsonTableJoinState;

I considered using NodeTag but decided not to, because this stuff is
local to jsonpath_exec.c.

> > + * JsonTableInitOpaque
> > + * Fill in TableFuncScanState->opaque for JsonTable processor
> > + */
> > +static void
> > +JsonTableInitOpaque(TableFuncScanState *state, int natts)
> > +{
> > + JsonTableContext *cxt;
> > + PlanState *ps = &state->ss.ps;
> > + TableFuncScan *tfs = castNode(TableFuncScan, ps->plan);
> > + TableFunc *tf = tfs->tablefunc;
> > + JsonExpr *ci = castNode(JsonExpr, tf->docexpr);
> > + JsonTableParent *root = castNode(JsonTableParent, tf->plan);
> > + List *args = NIL;
> > + ListCell *lc;
> > + int i;
> > +
> > + cxt = palloc0(sizeof(JsonTableContext));
> > + cxt->magic = JSON_TABLE_CONTEXT_MAGIC;
> > +
> > + if (ci->passing_values)
> > + {
> > + ListCell *exprlc;
> > + ListCell *namelc;
> > +
> > + forboth(exprlc, ci->passing_values,
> > + namelc, ci->passing_names)
> > + {
> > + Expr *expr = (Expr *) lfirst(exprlc);
> > + String *name = lfirst_node(String, namelc);
> > + JsonPathVariableEvalContext *var = palloc(sizeof(*var));
> > +
> > + var->name = pstrdup(name->sval);
> > + var->typid = exprType((Node *) expr);
> > + var->typmod = exprTypmod((Node *) expr);
> > + var->estate = ExecInitExpr(expr, ps);
> > + var->econtext = ps->ps_ExprContext;
> > + var->mcxt = CurrentMemoryContext;
> > + var->evaluated = false;
> > + var->value = (Datum) 0;
> > + var->isnull = true;
> > +
> > + args = lappend(args, var);
> > + }
> > + }
> > +
> > + cxt->colexprs = palloc(sizeof(*cxt->colexprs) *
> > + list_length(tf->colvalexprs));
> > +
> > + JsonTableInitScanState(cxt, &cxt->root, root, NULL, args,
> > + CurrentMemoryContext);
> > +
> > + i = 0;
> > +
> > + foreach(lc, tf->colvalexprs)
> > + {
> > + Expr *expr = lfirst(lc);
> > +
> > + cxt->colexprs[i].expr =
> > + ExecInitExprWithCaseValue(expr, ps,
> > + &cxt->colexprs[i].scan->current,
> > + &cxt->colexprs[i].scan->currentIsNull);
> > +
> > + i++;
> > + }
> > +
> > + state->opaque = cxt;
> > +}
>
> Why don't you just emit the proper expression directly, insted of the
> CaseTestExpr stuff, that you then separately evaluate?

I suppose you mean emitting the expression that supplies the value
given by scan->current and scan->currentIsNull into the same ExprState
that holds the steps for a given colvalexpr. If so, I don't really
see a way of doing that given the current model of JSON_TABLE
execution. The former is computed as part of
TableFuncRoutine.FetchRow(scan), which sets scan.current (and
currentIsNull) and the letter is computer as part of
TableFuncRoutine.GetValue(scan, colnum).

> Evaluating N expressions for a json table isn't a good approach, both memory
> and CPU efficiency wise.

Are you referring to JsonTableInitOpaque() initializing all these
sub-expressions of JsonTableParent, especially colvalexprs, using N
*independent* ExprStates? That could perhaps be made to work by
making JsonTableParent be an expression recognized by execExpr.c, so
that a single ExprState can store the steps for all its
sub-expressions, much like JsonExpr is. I'll give that a try, though
I wonder if the semantics of making this work in a single
ExecEvalExpr() call will mismatch that of the current way, because
different sub-expressions are currently evaluated under different APIs
of TableFuncRoutine.

In the meantime, I'm attaching a version of the patchset with a few
things fixed as mentioned above.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v6-0009-Claim-SQL-standard-compliance-for-SQL-JSON-featur.patch application/octet-stream 2.8 KB
v6-0007-PLAN-clauses-for-JSON_TABLE.patch application/octet-stream 70.6 KB
v6-0005-RETURNING-clause-for-JSON-and-JSON_SCALAR.patch application/octet-stream 10.2 KB
v6-0008-Documentation-for-SQL-JSON-features.patch application/octet-stream 48.2 KB
v6-0006-JSON_TABLE.patch application/octet-stream 99.5 KB
v6-0004-SQL-JSON-functions.patch application/octet-stream 51.9 KB
v6-0003-SQL-JSON-query-functions.patch application/octet-stream 195.5 KB
v6-0001-SQL-JSON-constructors.patch application/octet-stream 157.4 KB
v6-0002-IS-JSON-predicate.patch application/octet-stream 42.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2023-02-27 08:04:21 Re: PGDOCS - sgml linkend using single-quotes
Previous Message gamefunc 2023-02-27 07:33:50 RE: [PATCH] fix msvc build libpq error LNK2019 when link openssl;