Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
Subject: Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
Date: 2022-08-05 20:36:56
Message-ID: 20220805203656.mdgmajvsahi5srcy@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I tried to look into some of the questions from Amit, but I have e.g. no idea
what exactly the use of subtransactions tries to achieve - afaics 1a36bc9dba8
is the first patch to introduce needing to evaluate parts expressions in a
subtransaction - but there's not a single comment explaining why.

It's really hard to understand the new json code. It's a substantial amount of
new infrastructure, without any design documentation that I can find. And it's
not like it's just code that's equivalent to nearby stuff. To me this falls
way below our standards and I think it's *months* of work to fix.

Even just the expresion evaluation code: EvalJsonPathVar(),
ExecEvalJsonConstructor(), ExecEvalJsonExpr(), ExecEvalJson(). There's one
layer of subtransactions in one of the paths in ExecEvalJsonExpr(), another in
ExecEvalJson(). Some paths of ExecEvalJsonExpr() go through subtransactions,
others don't.

It's one thing for a small set of changes to be of this quality. But this is
pretty large - a quick summing of diffstat ends up with about 17k lines added,
of which ~2.5k are docs, ~4.8k are tests.

On 2022-08-04 17:01:48 +0900, Amit Langote wrote:
> On Wed, Aug 3, 2022 at 12:00 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > > The whole coercion stuff just seems incredibly clunky (in a slightly
> > > > different shape before this patch). ExecEvalJsonExprItemCoercion() calls
> > > > ExecPrepareJsonItemCoercion(), which gets a pointer to one of the per-type
> > > > elements in JsonItemCoercionsState, dispatching on the type of the json
> > > > object. Then we later call ExecGetJsonItemCoercion() (via a convoluted
> > > > path), which again will dispatch on the type (extracting the json object
> > > > again afaics!), to then somehow eventually get the coerced value.
> > >
> > > I think it might be possible to make this a bit simpler, by not
> > > leaving anything coercion-related in ExecEvalJsonExpr().
> >
> > Honestly, this code seems like it should just be rewritten from scratch.
>
> Based on what I wrote above, please let me know if I've misunderstood
> your concerns about over-allocation of coercion state. I can try to
> rewrite one more time if I know what this should look like instead.

I don't know. I don't understand the design of what needs to have error
handling, and what not.

> > > > I first was confused why the code tries to load the jump target
> > > > dynamically. But then I saw that the interpreted code sets it dynamically -
> > > > why? That's completely unnecessary overhead afaics? There's just two
> > > > possible jump targets, no?
> > >
> > > Hmm, I looked at the code for other expressions that jump, especially
> > > CASE WHEN, but they use ready-made EEOP_JUMP_IF_* steps, which can be
> > > added statically. I thought we can't use them in this case, because
> > > the conditions are very ad-hoc, like if the JSON path computation
> > > returned an "empty" item or if the "error" flag was set during that
> > > computation, etc.
> >
> > The minimal fix would be to return the jump target from the function, and then
> > jump to that. That at least avoids the roundtrip to memory you have right now.
>
> You mean like this:
>
> LLVMValueRef v_args[2];
> LLVMValueRef v_ret;
>
> /*
> * Call ExecEvalJsonExprSkip() to decide if JSON path
> * evaluation can be skipped. This returns the step
> * address to jump to.
> */
> v_args[0] = v_state;
> v_args[1] = l_ptr_const(op, l_ptr(StructExprEvalStep));
> v_ret = LLVMBuildCall(b,
> llvm_pg_func(mod,
> "ExecEvalJsonExprSkip"),
> params, lengthof(params), "");
>
> Actually, this is how I had started, but never figured out how to jump
> to the address in v_ret. As in, how to extract the plain C int32
> value that is the jump address from v_ret, an LLVMValueRef, to use in
> the following statement:
>
> LLVMBuildBr(b, opblocks[<int32-in-v_ret>]);

We could make that work, but even keeping it more similar to your current
code, you're already dealing with a variable jump target. Only that you load
it from a variable, instead of the function return type. So you could just
v_ret instead of v_jumpaddr, and your code would be simpler and faster.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-08-05 20:41:47 Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Previous Message Alvaro Herrera 2022-08-05 20:29:40 Re: standby recovery fails (tablespace related) (tentative patch and discussion)