Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
Subject: Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
Date: 2022-08-02 03:05:55
Message-ID: CA+HiwqHU+wWGNjzk=5eV15Nth3jySbQeeUFnVfbZ1F3M6QUjDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for looking into this.

On Tue, Aug 2, 2022 at 9:39 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2022-07-27 17:01:13 +0900, Amit Langote wrote:
> > Here's an updated version of the patch, with mostly cosmetic changes.
> > In particular, I added comments describing the new llvm_compile_expr()
> > blobs.
>
> - I've asked a couple times before: Why do we need space for every possible
> datatype at once in JsonItemCoercions? Can there be multiple "concurrent"
> coercions in process?

This topic has been a head-scratcher for me too from the beginning,
but I've since come to understand (convince myself) that we do need
the coercions for all possible types, because we don't know the type
of the JSON item that's going to pop out of the main JSON path
expression until we've run it through the JSON path executor that
ExecEvalJson() invokes. So, it's not possible to statically assign
the coercion. I am not really sure if different coercions may be used
in the same query over multiple evaluations of the same JSON path
expression, but maybe that's also possible.

> The whole coercion stuff just seems incredibly clunky (in a slightly
> different shape before this patch). ExecEvalJsonExprItemCoercion() calls
> ExecPrepareJsonItemCoercion(), which gets a pointer to one of the per-type
> elements in JsonItemCoercionsState, dispatching on the type of the json
> object. Then we later call ExecGetJsonItemCoercion() (via a convoluted
> path), which again will dispatch on the type (extracting the json object
> again afaics!), to then somehow eventually get the coerced value.

I think it might be possible to make this a bit simpler, by not
leaving anything coercion-related in ExecEvalJsonExpr(). I left some
pieces there, because I thought the error of not finding an
appropriate coercion must be thrown right away as the code in
ExecEvalJsonExpr() does after calling ExecGetJsonItemCoercion().

ExecPrepareJsonItemCoercion() is called later when it's time to
actually evaluate the coercion. If we move the error path to
ExecPrepareJsonItemCoercion(), both ExecGetJsonItemCoercion() and the
error path code in ExecEvalJsonExpr() will be unnecessary. I will
give that a try.

> - Looks like there's still some recursive expression states, namely
> JsonExprState->{result_coercion, coercions}?

So, the problem with inlining coercion evaluation into the main parent
JsonExpr's is that it needs to be wrapped in a sub-transaction to
catch any errors and return NULL instead. I don't know a way to wrap
ExprEvalStep evaluation in a sub-transaction to achieve that effect.

> - Looks like the JsonExpr code in ExecInitExprRec() is big enough to
> potentially benefit from splitting out into a separate function?

Thought about it too, so will do.

> - looks like JsonExprPostEvalState could be moved to execExprInterp.c?

OK, will give that a try.

> - I ran the patch against LLVM 14 built with assertions enabled, and it
> triggers an assertion failure:
>
> #3 0x00007f75d165c242 in __GI___assert_fail (
> assertion=0x7f75c278d511 "getOperand(0)->getType() == getOperand(1)->getType() && \"Both operands to ICmp instruction are not of the same type!\"",
> file=0x7f75c2780366 "/home/andres/src/llvm-project-14/llvm/include/llvm/IR/Instructions.h", line=1192,
> function=0x7f75c27d9dcc "void llvm::ICmpInst::AssertOK()") at assert.c:101
> #4 0x00007f75c2b9b25c in llvm::ICmpInst::AssertOK (this=0x55e019290ca0) at /home/andres/src/llvm-project-14/llvm/include/llvm/IR/Instructions.h:1191
> #5 0x00007f75c2b9b0ea in llvm::ICmpInst::ICmpInst (this=0x55e019290ca0, pred=llvm::CmpInst::ICMP_EQ, LHS=0x55e019290c10, RHS=0x55e01928ce80, NameStr="")
> at /home/andres/src/llvm-project-14/llvm/include/llvm/IR/Instructions.h:1246
> #6 0x00007f75c2b93c99 in llvm::IRBuilderBase::CreateICmp (this=0x55e0192894f0, P=llvm::CmpInst::ICMP_EQ, LHS=0x55e019290c10, RHS=0x55e01928ce80, Name="")
> at /home/andres/src/llvm-project-14/llvm/include/llvm/IR/IRBuilder.h:2202
> #7 0x00007f75c2c1bc5d in LLVMBuildICmp (B=0x55e0192894f0, Op=LLVMIntEQ, LHS=0x55e019290c10, RHS=0x55e01928ce80, Name=0x7f75d0d24cbc "")
> at /home/andres/src/llvm-project-14/llvm/lib/IR/Core.cpp:3927
> #8 0x00007f75d0d20b1f in llvm_compile_expr (state=0x55e019201380) at /home/andres/src/postgresql/src/backend/jit/llvm/llvmjit_expr.c:2392
> ...
> #19 0x000055e0184c16d4 in exec_simple_query (query_string=0x55e01912f6e0 "SELECT JSON_EXISTS(NULL::jsonb, '$');") at /home/andres/src/postgresql/src/backend/tcop/postgres.c:1204
>
> this triggers easily interactively - which is nice because that allows to
> dump the types:
>
> p getOperand(0)->getType()->dump() -> prints i64
> p getOperand(1)->getType()->dump() -> prints i32
>
> The immediate issue is that you're setting v_jumpaddrp up as a pointer to a
> pointer to size_t - but then compare it to i32.

Ooh, thanks for letting me know. So maybe I am missing some
llvmjist_emit.h/type.c infrastructure to read an int32 value
(jumpdone) out of an int32 pointer (&jumpdone)?

> I first was confused why the code tries to load the jump target
> dynamically. But then I saw that the interpreted code sets it dynamically -
> why? That's completely unnecessary overhead afaics? There's just two
> possible jump targets, no?

Hmm, I looked at the code for other expressions that jump, especially
CASE WHEN, but they use ready-made EEOP_JUMP_IF_* steps, which can be
added statically. I thought we can't use them in this case, because
the conditions are very ad-hoc, like if the JSON path computation
returned an "empty" item or if the "error" flag was set during that
computation, etc.

> - why is EvalJsonPathVar() in execExprInterp.c, when it's only ever called
> from within jsonpath_exec.c?

Hadn't noticed that because the patch didn't really have to touch it,
but yes, maybe it makes sense to move it there.

> - s/JsobbValue/JsonbValue/

Oops, will fix.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2022-08-02 03:06:01 Re: Typo in pg_db_role_setting.h
Previous Message Robins Tharakan 2022-08-02 02:41:10 Missing CFI in iterate_word_similarity()