Re: SQL/JSON revisited

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, e(dot)indrupskaya(at)postgrespro(dot)ru
Subject: Re: SQL/JSON revisited
Date: 2023-02-20 17:24:56
Message-ID: 20230220172456.q3oshnvfk3wyhm5l@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-02-20 16:35:52 +0900, Amit Langote wrote:
> Subject: [PATCH v4 03/10] SQL/JSON query functions
> +/*
> + * Evaluate a JSON error/empty behavior result.
> + */
> +static Datum
> +ExecEvalJsonBehavior(JsonBehavior *behavior, bool *is_null)
> +{
> + *is_null = false;
> +
> + switch (behavior->btype)
> + {
> + case JSON_BEHAVIOR_EMPTY_ARRAY:
> + return JsonbPGetDatum(JsonbMakeEmptyArray());
> +
> + case JSON_BEHAVIOR_EMPTY_OBJECT:
> + return JsonbPGetDatum(JsonbMakeEmptyObject());
> +
> + case JSON_BEHAVIOR_TRUE:
> + return BoolGetDatum(true);
> +
> + case JSON_BEHAVIOR_FALSE:
> + return BoolGetDatum(false);
> +
> + case JSON_BEHAVIOR_NULL:
> + case JSON_BEHAVIOR_UNKNOWN:
> + *is_null = true;
> + return (Datum) 0;
> +
> + case JSON_BEHAVIOR_DEFAULT:
> + /* Always handled in the caller. */
> + Assert(false);
> + return (Datum) 0;
> +
> + default:
> + elog(ERROR, "unrecognized SQL/JSON behavior %d", behavior->btype);
> + return (Datum) 0;
> + }
> +}

Does this actually need to be evaluated at expression eavluation time?
Couldn't we just emit the proper constants in execExpr.c?

> +/* ----------------------------------------------------------------
> + * ExecEvalJson
> + * ----------------------------------------------------------------
> + */
> +void
> +ExecEvalJson(ExprState *state, ExprEvalStep *op, ExprContext *econtext)

Pointless comment.

> +{
> + JsonExprState *jsestate = op->d.jsonexpr.jsestate;
> + JsonExprPreEvalState *pre_eval = &jsestate->pre_eval;
> + JsonExprPostEvalState *post_eval = &jsestate->post_eval;
> + JsonExpr *jexpr = jsestate->jsexpr;
> + Datum item;
> + Datum res = (Datum) 0;
> + JsonPath *path;
> + bool throwErrors = jexpr->on_error->btype == JSON_BEHAVIOR_ERROR;
> +
> + *op->resnull = true; /* until we get a result */
> + *op->resvalue = (Datum) 0;
> +
> + item = pre_eval->formatted_expr.value;
> + path = DatumGetJsonPathP(pre_eval->pathspec.value);
> +
> + /* Reset JsonExprPostEvalState for this evaluation. */
> + memset(post_eval, 0, sizeof(*post_eval));
> +
> + res = ExecEvalJsonExpr(op, econtext, path, item, op->resnull,
> + !throwErrors ? &post_eval->error : NULL);
> +
> + *op->resvalue = res;
> +}

I really don't like having both ExecEvalJson() and ExecEvalJsonExpr(). There's
really no way to know what which version does, just based on the name.

> --- a/src/backend/parser/gram.y
> +++ b/src/backend/parser/gram.y

This stuff adds quite a bit of complexity to the parser. Do we realy need like
a dozen new rules here?

> +json_behavior_empty_array:
> + EMPTY_P ARRAY { $$ = makeJsonBehavior(JSON_BEHAVIOR_EMPTY_ARRAY, NULL); }
> + /* non-standard, for Oracle compatibility only */
> + | EMPTY_P { $$ = makeJsonBehavior(JSON_BEHAVIOR_EMPTY_ARRAY, NULL); }
> + ;

Do we really want to add random oracle compat crud here?

> +/*
> + * Evaluate a JSON path variable caching computed value.
> + */
> +int
> +EvalJsonPathVar(void *cxt, char *varName, int varNameLen,
> + JsonbValue *val, JsonbValue *baseObject)

Missing static?

> +{
> + JsonPathVariableEvalContext *var = NULL;
> + List *vars = cxt;
> + ListCell *lc;
> + int id = 1;
> +
> + if (!varName)
> + return list_length(vars);
> +
> + foreach(lc, vars)
> + {
> + var = lfirst(lc);
> +
> + if (!strncmp(var->name, varName, varNameLen))
> + break;
> +
> + var = NULL;
> + id++;
> + }
> +
> + if (!var)
> + return -1;
> +
> + /*
> + * When belonging to a JsonExpr, path variables are computed with the
> + * JsonExpr's ExprState (var->estate is NULL), so don't need to be computed
> + * here. In some other cases, such as when the path variables belonging
> + * to a JsonTable instead, those variables must be evaluated on their own,
> + * without the enclosing JsonExpr itself needing to be evaluated, so must
> + * be handled here.
> + */
> + if (var->estate && !var->evaluated)
> + {
> + Assert(var->econtext != NULL);
> + var->value = ExecEvalExpr(var->estate, var->econtext, &var->isnull);
> + var->evaluated = true;

Uh, so this continues to do recursive expression evaluation, as
ExecEvalJsonExpr()->JsonPathQuery()->executeJsonPath(EvalJsonPathVar)

I'm getting grumpy here. This is wrong, has been pointed out many times. The
only thing that changes is that the point of recursion is moved around.

> +
> +/*
> + * ExecEvalExprSafe
> + *
> + * Like ExecEvalExpr(), though this allows the caller to pass an
> + * ErrorSaveContext to declare its intenion to catch any errors that occur when
> + * executing the expression, such as when calling type input functions that may
> + * be present in it.
> + */
> +static inline Datum
> +ExecEvalExprSafe(ExprState *state,
> + ExprContext *econtext,
> + bool *isNull,
> + Node *escontext,
> + bool *error)

Afaict there's no caller of this?

>
> +/*
> + * ExecInitExprWithCaseValue
> + *
> + * This is the same as ExecInitExpr, except the caller passes the Datum and
> + * bool pointers that it would like the ExprState.innermost_caseval
> + * and ExprState.innermost_casenull, respectively, to be set to. That way,
> + * it can pass an input value to evaluate the expression via a CaseTestExpr.
> + */
> +ExprState *
> +ExecInitExprWithCaseValue(Expr *node, PlanState *parent,
> + Datum *caseval, bool *casenull)
> +{
> + ExprState *state;
> + ExprEvalStep scratch = {0};
> +
> + /* Special case: NULL expression produces a NULL ExprState pointer */
> + if (node == NULL)
> + return NULL;
> +
> + /* Initialize ExprState with empty step list */
> + state = makeNode(ExprState);
> + state->expr = node;
> + state->parent = parent;
> + state->ext_params = NULL;
> + state->innermost_caseval = caseval;
> + state->innermost_casenull = casenull;
> +
> + /* Insert EEOP_*_FETCHSOME steps as needed */
> + ExecInitExprSlots(state, (Node *) node);
> +
> + /* Compile the expression proper */
> + ExecInitExprRec(node, state, &state->resvalue, &state->resnull);
> +
> + /* Finally, append a DONE step */
> + scratch.opcode = EEOP_DONE;
> + ExprEvalPushStep(state, &scratch);
> +
> + ExecReadyExpr(state);
> +
> + return state;

> +struct JsonTableJoinState
> +{
> + union
> + {
> + struct
> + {
> + JsonTableJoinState *left;
> + JsonTableJoinState *right;
> + bool advanceRight;
> + } join;
> + JsonTableScanState scan;
> + } u;
> + bool is_join;
> +};

A join state that unions the join member with a scan, and has a is_join field?

> +/*
> + * JsonTableInitOpaque
> + * Fill in TableFuncScanState->opaque for JsonTable processor
> + */
> +static void
> +JsonTableInitOpaque(TableFuncScanState *state, int natts)
> +{
> + JsonTableContext *cxt;
> + PlanState *ps = &state->ss.ps;
> + TableFuncScan *tfs = castNode(TableFuncScan, ps->plan);
> + TableFunc *tf = tfs->tablefunc;
> + JsonExpr *ci = castNode(JsonExpr, tf->docexpr);
> + JsonTableParent *root = castNode(JsonTableParent, tf->plan);
> + List *args = NIL;
> + ListCell *lc;
> + int i;
> +
> + cxt = palloc0(sizeof(JsonTableContext));
> + cxt->magic = JSON_TABLE_CONTEXT_MAGIC;
> +
> + if (ci->passing_values)
> + {
> + ListCell *exprlc;
> + ListCell *namelc;
> +
> + forboth(exprlc, ci->passing_values,
> + namelc, ci->passing_names)
> + {
> + Expr *expr = (Expr *) lfirst(exprlc);
> + String *name = lfirst_node(String, namelc);
> + JsonPathVariableEvalContext *var = palloc(sizeof(*var));
> +
> + var->name = pstrdup(name->sval);
> + var->typid = exprType((Node *) expr);
> + var->typmod = exprTypmod((Node *) expr);
> + var->estate = ExecInitExpr(expr, ps);
> + var->econtext = ps->ps_ExprContext;
> + var->mcxt = CurrentMemoryContext;
> + var->evaluated = false;
> + var->value = (Datum) 0;
> + var->isnull = true;
> +
> + args = lappend(args, var);
> + }
> + }
> +
> + cxt->colexprs = palloc(sizeof(*cxt->colexprs) *
> + list_length(tf->colvalexprs));
> +
> + JsonTableInitScanState(cxt, &cxt->root, root, NULL, args,
> + CurrentMemoryContext);
> +
> + i = 0;
> +
> + foreach(lc, tf->colvalexprs)
> + {
> + Expr *expr = lfirst(lc);
> +
> + cxt->colexprs[i].expr =
> + ExecInitExprWithCaseValue(expr, ps,
> + &cxt->colexprs[i].scan->current,
> + &cxt->colexprs[i].scan->currentIsNull);
> +
> + i++;
> + }
> +
> + state->opaque = cxt;
> +}

Evaluating N expressions for a json table isn't a good approach, both memory
and CPU efficiency wise.

Why don't you just emit the proper expression directly, insted of the
CaseTestExpr stuff, that you then separately evaluate?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-02-20 18:11:43 Re: Killing off removed rels properly
Previous Message Nathan Bossart 2023-02-20 17:14:19 Re: add PROCESS_MAIN to VACUUM