Re: Extract numeric filed in JSONB more effectively

From: Chapman Flack <chap(at)anastigmatix(dot)net>
To: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extract numeric filed in JSONB more effectively
Date: 2023-08-30 13:47:53
Message-ID: f6185e3b75c85794e312e46fe6a74837@anastigmatix.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2023-08-30 00:47, Andy Fan wrote:
> see what it is. Suppose the original query is:
>
> numeric(jsonb_object_field(v_jsonb, text)) -> numeric.
> ...
> However the declared type of jsonb_object_field_type is:
>
> jsonb_object_field_type(internal, jsonb, text) -> internal.
>
> So the situation is: b). We return a NUMERIC type which matches
> the original query ...
> case b) shouldn't be treat as an issue.

*We* may know we are returning a NUMERIC type which matches the
original query, but nothing else knows that. Anything that
examined the complete tree after our rewriting would see some
expression that wants a numeric type, but supplied with a
subexpression that returns internal. Without a relabel node
there to promise that we know this internal is really numeric,
any type checker would reject the tree.

The fact that it even works at all without a relabel node there
seems to indicate that all of PostgreSQL's type checking was
done before calling the support function, and that there is not
much sanity checking of what the support function returns,
which I guess is efficient, if a little scary. Seems like
writing a support function is a bit like trapeze performing
without a net.

> So your new method is:
> 1. jsonb_{op}_start() -> internal (internal actually JsonbValue).
> 2. jsonb_finish_{type}(internal, ..) -> type. (internal is JsonbValue
> ).
>
> This avoids the case a) at the very beginning. I'd like to provides
> patches for both solutions for comparison.

I think, unavoidably, there is still a case a) at the very beginning,
just because of the rule that if json_{op}_start is going to have an
internal return type, it needs to have at least one internal parameter
to prevent casual calls from SQL, even if that parameter is not used
for anything.

It would be ok to write in a Const for that parameter, just zero or
42 or anything besides null (in case the function is strict), but
again if the Const has type internal then EXPLAIN will be sad, so
it has to be some type that makes EXPLAIN cheerful, and relabeled
internal.

But with this approach there is no longer a type mismatch of the
end result.

> fcinfo initialization in DirectFunctionCall1 is an interesting point!
> so I am persuaded the extra steps in ExprState may not be
> worse than the current way due to the "every-time-through
> fcinfo initialization" (in which case the memory is allocated
> once in heap rather than every time in stack).

Stack allocation is super cheap, just by emitting the function
entry to reserve n+m bytes instead of just m, so it there's any
measurable cost to the DirectFunctionCall I would think it more
likely to be in the initialization after allocation ... but I
haven't looked at that code closely to see how much there is.
I just wanted to make the point that another step or two in
ExprState might not be a priori worse. We might be talking
about negligible effects in either direction.

> I can do a
> comparison at last to see if we can find some other interesting
> findings.

That would be the way to find out. I think I would still lean
toward the approach with less code duplication, unless there
is a strong timing benefit the other way.

> True, reusing the casting system should be better than hard-code
> the casting function manually. I'd apply this on both methods.

I noticed there is another patch registered in this CF: [1]
It adds new operations within jsonpath like .bigint .time
and so on.

I was wondering whether that work would be conflicting or
complementary with this. It looks to be complementary. The
operations being added there are within jsonpath evaluation.
Here we are working on faster ways to get those results out.

It does not seem that [1] will add any new choices in
JsonbValue. All of its (.bigint .integer .number) seem to
verify the requested form and then put the result as a
numeric in ->val.numeric. So that doesn't add any new
cases for this patch to handle. (Too bad, in a way: if that
other patch added ->val.bigint, this patch could add a case
to retrieve that value without going through the work of
making a numeric. But that would complicate other things
touching JsonbValue, and be a matter for that other patch.)

It may be expanding the choices for what we might one day
find in ->val.datetime though.

Regards,
-Chap

[1] https://commitfest.postgresql.org/44/4526/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-08-30 13:49:45 Re: [17] CREATE SUBSCRIPTION ... SERVER
Previous Message Ashutosh Bapat 2023-08-30 13:41:59 Re: [17] CREATE SUBSCRIPTION ... SERVER