Re: Extract numeric filed in JSONB more effectively

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Haibo Yan <tristan(dot)yim(at)gmail(dot)com>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Andy Fan <zhihuifan1213(at)163(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Extract numeric filed in JSONB more effectively
Date: 2026-04-03 04:50:27
Message-ID: CAFj8pRBpV9e+709D-qf=g5hgtXVN_wzKipRC0c_ZmdHws8rWtw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

pá 3. 4. 2026 v 6:46 odesílatel Haibo Yan <tristan(dot)yim(at)gmail(dot)com> napsal:

> On Nov 22, 2024, at 10:14 AM, Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> >
> >> On Mon, Nov 18, 2024 at 08:23:52AM GMT, Andy Fan wrote:
> >>
> >>>>> I imagined you'd the patch should create a SupportRequestSimplify
> >>>>> support function for jsonb_numeric() that checks if the input
> >>>>> expression is an OpExpr with funcid of jsonb_object_field(). All you
> >>>>> do then is ditch the cast and change the OpExpr to call a new
> function
> >>>>> named jsonb_object_field_numeric() which returns the val.numeric
> >>>>> directly. Likely the same support function could handle jsonb casts
> >>>>> to other types too, in which case you'd just call some other
> function,
> >>>>> e.g jsonb_object_field_timestamp() or jsonb_object_field_boolean().
> >>>>
> >>>> Basically yes. The reason complexity comes when we many operators we
> >>>> want to optimize AND my patch I want to reduce the number of function
> >>>> created.
> >>>>
> >>>> [...]
> >>>>
> >>>> Within the start / finish function, we need to create *7* functions.
> >>>
> >>> Any particular reason you want to keep number of functions minimal? Is
> >>> it just to make the patch smaller? I might be missing something without
> >>> looking at the implementation in details, but the difference between 10
> >>> and 7 functions doesn't seem to be significant.
> >>
> >> Another reason is for reducing code duplication, writting too many
> >> similar function looks not good to me. Chapman expressed this idea
> >> first at [1]. Search "it would make me happy to further reduce some
> >> of the code" in the message.
> >>
> >> Acutally this doesn't make the patch complexer too much.
> >>
> >> [1]
> >>
> https://www.postgresql.org/message-id/5138c6b5fd239e7ce4e1a4e63826ac27%40anastigmatix.net
> >
> > It might not make everything too much complex, but e.g. relabeling of
> > the first argument for a "finish" function into an internal one sounds
> > strange to me. Maybe there is a way to avoid duplication of the code,
> > but keep all needed functions in pg_proc?
> >
> > Btw, sorry to complain about small details, but I find start / finish
> > naming pattern not quite fitting here. Their main purpose is to extract
> > / convert a value, the order in which they are happening is less
> > relevant.
> >
> >
> >
> >
>
> Hi all,
> I’d like to continue pushing this patch forward.
> Based on the earlier discussion, I reworked the patch into a smaller
> stage-1 version with a narrower scope and a simpler rewrite strategy. The
> current patch keeps the normal SQL syntax unchanged and uses
> support-function simplification to rewrite only the following patterns:
> (jsonb_object_field(...))::numeric
> jsonb_object_field(...))::bool
> into explicit typed extractor calls.
> So at this stage it intentionally covers only:
> jsonb_object_field(...) / ->
> casts to numeric
> casts to bool
> and does not yet try to cover array/path extraction or integer/float typed
> extractors.
> I also ran a small microbenchmark to isolate the cast-over-object-field
> path. On my setup, the current patch shows the following gains:
> Query Before After
> Speedup
>
> --------------------------------------------------------------------------------
> SELECT sum((j->'n')::numeric) FROM t 118.028 ms 56.082 ms 2.10x
> SELECT count(*) FROM t WHERE (j->'b')::bool 115.665 ms 51.945 ms 2.23x
>
> --------------------------------------------------------------------------------
> These are microbenchmark numbers rather than end-to-end workload results,
> but they suggest that the simplified rewrite path is worth pursuing.
> My goal with this version is not to solve the full matrix at once, but to
> first land a reviewer-friendly subset that:
> 1. does not introduce new user-visible operators,
> 2. keeps ordinary cast syntax unchanged,
> 3. avoids the more abstract internal/start-finish style machinery,
> 4. and uses explicit rewrite targets that are easier to review.
> If this direction looks reasonable, I’d appreciate another round of review
> on the updated patch. If people think the stage-1 scope is acceptable, I
> can continue with follow-up patches for additional extractor families and
> target types.
> Thanks,
> Haibo
>

+1

Pavel

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2026-04-03 05:00:00 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Previous Message Haibo Yan 2026-04-03 04:46:22 Re: Extract numeric filed in JSONB more effectively