Re: Extract numeric filed in JSONB more effectively

From: Haibo Yan <tristan(dot)yim(at)gmail(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Andy Fan <zhihuifan1213(at)163(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Extract numeric filed in JSONB more effectively
Date: 2026-04-03 04:46:22
Message-ID: 7487A577-7BDC-4B94-9FFA-741E95439BFC@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Nov 22, 2024, at 10:14 AM, Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
>
>> On Mon, Nov 18, 2024 at 08:23:52AM GMT, Andy Fan wrote:
>>
>>>>> I imagined you'd the patch should create a SupportRequestSimplify
>>>>> support function for jsonb_numeric() that checks if the input
>>>>> expression is an OpExpr with funcid of jsonb_object_field(). All you
>>>>> do then is ditch the cast and change the OpExpr to call a new function
>>>>> named jsonb_object_field_numeric() which returns the val.numeric
>>>>> directly. Likely the same support function could handle jsonb casts
>>>>> to other types too, in which case you'd just call some other function,
>>>>> e.g jsonb_object_field_timestamp() or jsonb_object_field_boolean().
>>>>
>>>> Basically yes. The reason complexity comes when we many operators we
>>>> want to optimize AND my patch I want to reduce the number of function
>>>> created.
>>>>
>>>> [...]
>>>>
>>>> Within the start / finish function, we need to create *7* functions.
>>>
>>> Any particular reason you want to keep number of functions minimal? Is
>>> it just to make the patch smaller? I might be missing something without
>>> looking at the implementation in details, but the difference between 10
>>> and 7 functions doesn't seem to be significant.
>>
>> Another reason is for reducing code duplication, writting too many
>> similar function looks not good to me. Chapman expressed this idea
>> first at [1]. Search "it would make me happy to further reduce some
>> of the code" in the message.
>>
>> Acutally this doesn't make the patch complexer too much.
>>
>> [1]
>> https://www.postgresql.org/message-id/5138c6b5fd239e7ce4e1a4e63826ac27%40anastigmatix.net
>
> It might not make everything too much complex, but e.g. relabeling of
> the first argument for a "finish" function into an internal one sounds
> strange to me. Maybe there is a way to avoid duplication of the code,
> but keep all needed functions in pg_proc?
>
> Btw, sorry to complain about small details, but I find start / finish
> naming pattern not quite fitting here. Their main purpose is to extract
> / convert a value, the order in which they are happening is less
> relevant.
>
>
>
>

Hi all,
I’d like to continue pushing this patch forward.
Based on the earlier discussion, I reworked the patch into a smaller stage-1 version with a narrower scope and a simpler rewrite strategy. The current patch keeps the normal SQL syntax unchanged and uses support-function simplification to rewrite only the following patterns:
(jsonb_object_field(...))::numeric
jsonb_object_field(...))::bool
into explicit typed extractor calls.
So at this stage it intentionally covers only:
jsonb_object_field(...) / ->
casts to numeric
casts to bool
and does not yet try to cover array/path extraction or integer/float typed extractors.
I also ran a small microbenchmark to isolate the cast-over-object-field path. On my setup, the current patch shows the following gains:
Query Before After Speedup
--------------------------------------------------------------------------------
SELECT sum((j->'n')::numeric) FROM t 118.028 ms 56.082 ms 2.10x
SELECT count(*) FROM t WHERE (j->'b')::bool 115.665 ms 51.945 ms 2.23x
--------------------------------------------------------------------------------
These are microbenchmark numbers rather than end-to-end workload results, but they suggest that the simplified rewrite path is worth pursuing.
My goal with this version is not to solve the full matrix at once, but to first land a reviewer-friendly subset that:
1. does not introduce new user-visible operators,
2. keeps ordinary cast syntax unchanged,
3. avoids the more abstract internal/start-finish style machinery,
4. and uses explicit rewrite targets that are easier to review.
If this direction looks reasonable, I’d appreciate another round of review on the updated patch. If people think the stage-1 scope is acceptable, I can continue with follow-up patches for additional extractor families and target types.
Thanks,
Haibo

Attachment Content-Type Size
0001-jsonb-optimize-object-field-casts-to-numeric-and-boo.patch application/octet-stream 19.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2026-04-03 04:50:27 Re: Extract numeric filed in JSONB more effectively
Previous Message shveta malik 2026-04-03 04:16:49 Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication