Re: Extract numeric filed in JSONB more effectively

From: Haibo Yan <tristan(dot)yim(at)gmail(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Extract numeric filed in JSONB more effectively
Date: 2026-04-27 05:01:58
Message-ID: CABXr29EZNEgjJVNRX3gLOjRZVjXT6tda2_m8DdxQTfr8s8mkEQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 10, 2026 at 2:48 PM Haibo Yan <tristan(dot)yim(at)gmail(dot)com> wrote:

> On Wed, Apr 8, 2026 at 12:50 PM Haibo Yan <tristan(dot)yim(at)gmail(dot)com> wrote:
>
>> On Tue, Apr 7, 2026 at 6:21 PM Haibo Yan <tristan(dot)yim(at)gmail(dot)com> wrote:
>>
>>> On Tue, Apr 7, 2026 at 5:00 PM Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
>>>
>>>> Haibo Yan <tristan(dot)yim(at)gmail(dot)com> writes:
>>>>
>>>> Hi Haibo,
>>>>
>>>> > I agree that if this approach is extended to the full matrix naively,
>>>> > duplication will become a real issue.
>>>>
>>>> Could you summary how it would be? I think it would be helpful for
>>>> others to review. Otherwise every reviewer needs to count them many
>>>> times.
>>>>
>>>> --
>>>> Best Regards
>>>> Andy Fan
>>>>
>>> Hi Andy,
>>> Sure.
>>>
>>> My current thought is to extend it in stages, rather than trying to
>>> solve the full matrix in a single patch.
>>>
>>> A rough plan would be:
>>>
>>> 1. Keep the current stage-1 patch small and validate the basic approach
>>> first
>>>
>>>
>>> -
>>>
>>> jsonb_object_field / -> / equivalent subscripting form
>>> -
>>>
>>> casts to numeric and bool
>>> -
>>>
>>> support-function rewrite directly to explicit typed extractor
>>> functions
>>>
>>> 2. Extend target types before extending extractor families
>>>
>>>
>>> -
>>>
>>> add int4 / int8 / float8 for the same object-field family first
>>> -
>>>
>>> keep the SQL-visible rewrite targets explicit, e.g.
>>>
>>> -
>>>
>>> jsonb_object_field_int4
>>> -
>>>
>>> jsonb_object_field_int8
>>> -
>>>
>>> jsonb_object_field_float8
>>>
>>> -
>>>
>>> avoid the previous numeric-intermediate rewrite shape
>>>
>>> 3. Then extend to other extractor families with the same overall pattern
>>>
>>>
>>> -
>>>
>>> likely starting with jsonb_array_element and jsonb_extract_path
>>> -
>>>
>>> and possibly jsonb_path_query_first later
>>> -
>>>
>>> each family would still rewrite to explicit typed extractor entry
>>> points, e.g.
>>>
>>> -
>>>
>>> jsonb_array_element_numeric
>>> -
>>>
>>> jsonb_extract_path_bool
>>> -
>>>
>>> jsonb_path_query_first_int4
>>>
>>>
>>> 4. Keep duplication manageable by sharing the implementation underneath
>>>
>>>
>>> -
>>>
>>> keep the SQL/catalog-level rewrite targets explicit for readability
>>> and reviewability
>>> -
>>>
>>> but factor the C implementation into:
>>>
>>> -
>>>
>>> extractor-family lookup helpers
>>> -
>>>
>>> target-type conversion helpers
>>> -
>>>
>>> thin wrappers, possibly generated with small macros
>>>
>>> So the idea would be: explicit rewrite targets at the SQL/catalog level,
>>> but shared lookup/conversion code underneath, instead of going back to the
>>> earlier start/finish/internal pipeline.
>>>
>>> I agree that if this is extended naively across the full matrix,
>>> duplication will become a real issue. My reason for keeping the current
>>> patch narrow is that I wanted to first validate this simpler rewrite shape
>>> on a small subset before deciding how best to scale it further.
>>>
>>> Regards,
>>>
>>> Haibo
>>>
>>> Hi all,
>>
>> Following up on our previous discussion, I want to clarify the current
>> patch plan together with the updated first patch.
>>
>> Earlier I described this work roughly as a 4-patch line. After iterating
>> on the implementation and trying to keep each step reviewable, I now think
>> the cleaner split is a 5-patch series:
>>
>>
>> 1.
>>
>> object-field casts to scalar types
>> 2.
>>
>> array-element casts to scalar types
>> 3.
>>
>> extract-path casts to scalar types
>> 4.
>>
>> multi-subscript casts via extract-path lowering
>> 5.
>>
>> jsonpath-first casts to scalar types (jsonb_path_query_first and _tz)
>>
>> The overall design is unchanged: use the cast function’s support hook to
>> recognize cast(extract(...)) over scalar-returning jsonb extraction
>> functions, and rewrite that directly to explicit typed extractor calls.
>>
>> Supported target types remain:
>>
>>
>> - numeric
>> - bool
>> - int4
>> - int8
>> - float8
>>
>> One point I also want to make explicit is that I do not plan to include
>> jsonb_path_query in this series.
>>
>> After looking at it more carefully, I do not think it fits the same model
>> as the rest of the series. The patches here are all about scalar-returning
>> extraction functions, where the cast prosupport hook can see and rewrite a
>> scalar expression pair. jsonb_path_query is set-returning, so optimizing
>> casts over it would likely need a different mechanism, probably at planner
>> or executor level, rather than one more patch in this prosupport-based
>> series.
>>
>> Attached here is the updated first patch in the current plan.
>>
>> This patch covers object-field extraction only:
>>
>>
>> - jsonb_object_field
>> - -> with text key
>> - key subscripting
>>
>> and rewrites casts to:
>>
>>
>> - numeric
>> - bool
>> - int4
>> - int8
>> - float8
>>
>> to direct typed extractor calls.
>>
>> Thanks again for the earlier comments. I plan to send the remaining
>> patches in follow-up emails in the order above.
>>
>> Regards,
>> Haibo
>>
>
> Hi all,
> Following up on the earlier thread, I am continuing with the same overall
> plan and sending the next two patches in the series together.
> The design is still the same as before: use the cast function’s support
> hook to recognize cast(extract(...)) over scalar-returning jsonb extraction
> functions, and rewrite that directly to explicit typed extractor calls,
> without changing normal SQL syntax.
> At this point, I have also folded int2 and float4 into the per-family
> patches, so each patch now carries the full target-type coverage for the
> functionality it introduces.
> The supported target types are now:
> numeric
>
> - bool
> - int2
> - int4
> - int8
> - float4
> - float8
>
> I am still keeping jsonb_path_query out of scope for this series. The
> series is focused on scalar-returning extraction functions, where the cast
> prosupport hook can see and rewrite a scalar expression pair.
> jsonb_path_query is set-returning, so optimizing casts over it looks like a
> different planner/executor problem rather than one more patch in this
> prosupport-based line.
>
> With that in mind, the current 5-patch plan is:
>
> 1. object-field casts to scalar types
> 2. array-element casts to scalar types
> 3. extract-path casts to scalar types
> 4. multi-subscript casts via extract-path lowering
> 5. jsonpath-first casts to scalar types (jsonb_path_query_first and
> _tz)
>
> In this email I am sending the first two patches:
>
> - patch 1: object-field casts to scalar types
> - patch 2: array-element casts to scalar types
>
> Patch 1 covers:
>
> - jsonb_object_field
> - -> with text key
> - key subscripting
>
> Patch 2 covers:
>
> - jsonb_array_element
> - -> with integer RHS
> - single-index array subscripting
>
> Both patches now support the full target-type set listed above.
> Thanks again for the earlier comments. I plan to continue with the
> remaining patches in follow-up emails in the same order.
>
> Regards,
> Haibo
>
Hi all,

I spent some time reworking this patch series into a shape that I think is
easier to review and easier to reason about patch-by-patch.

The goal of the series is still the same: optimize casts over
scalar-returning jsonb extraction functions by using the cast function’s
support hook to recognize:

- cast(extract(...))

and rewrite that directly to explicit typed extractor calls.

That keeps ordinary SQL syntax unchanged, but avoids the extra jsonb scalar
wrapping/unwrapping on the optimized path.

At this point the series is organized as 5 patches:

1. object-field casts to scalar types
2. array-element casts to scalar types
3. extract-path casts to scalar types
4. multi-subscript casts via extract-path lowering
5. jsonpath-first casts to scalar types

The supported target types are:

- numeric
- bool
- int2
- int4
- int8
- float4
- float8

The covered scalar-returning extraction families/forms are:

- jsonb_object_field
- -> with text key
- equivalent key subscripting
- jsonb_array_element
- -> with integer RHS
- single-index array subscripting
- jsonb_extract_path
- #>
- direct jsonb_extract_path(...)
- multi-subscript jsonb subscripting (lowered to extract-path)
- jsonb_path_query_first
- jsonb_path_query_first_tz

A few points about the current shape of the series:

- I intentionally stayed with the support-function rewrite model and did
not introduce new user-visible operators.
- I also did not go back to the earlier start/finish/internal pipeline
approach. The current version rewrites directly to explicit typed extractor
functions.
- I folded int2 and float4 into the same family-based design, since they
are the remaining natural numeric-family targets that fit the same
conversion model.
- I kept the multi-subscript patch conservative. It lowers through the
existing extract-path family, but it does not try to turn this into a
broader executor-side subscripting redesign.

One explicit boundary of the series is that it does *not* try to optimize
jsonb_path_query.

After looking at that more carefully, I do not think it belongs in the same
patch line. This series is about scalar-returning extraction functions,
where the cast prosupport hook can see and rewrite a scalar expression
pair. jsonb_path_query is set-returning, so optimizing casts over it would
likely require a different mechanism, probably at planner or executor
level, rather than one more patch in this prosupport-based series.

Another explicit boundary is that the series stops at the current
numeric/bool scalar targets. I did not try to extend it to
text/date/uuid-like conversions, because those would require different
semantics such as full jsonb serialization or string parsing, rather than
the direct scalar conversion pattern used here.

I expect one likely question will be the catalog footprint, since this
approach adds a noticeable number of typed extractor builtins. I think that
is the cleanest tradeoff in this case: SQL is statically typed, and
explicit typed extractors keep the rewrite path simple, predictable, and
reviewable. The implementation-side boilerplate is kept under control with
shared helpers and thin family-specific macros.

Thanks in advance for any review and feedback.

Regards,
Haibo

Attachment Content-Type Size
v5-0002-jsonb-optimize-array-element-casts-to-scalar-type.patch application/octet-stream 27.4 KB
v5-0003-jsonb-optimize-extract-path-casts-to-scalar-types.patch application/octet-stream 24.4 KB
v5-0001-jsonb-optimize-object-field-casts-to-scalar-types.patch application/octet-stream 43.6 KB
v5-0005-jsonb-optimize-jsonpath-first-casts-to-scalar-typ.patch application/octet-stream 82.6 KB
v5-0004-jsonb-optimize-multi-subscript-casts-via-extract-.patch application/octet-stream 14.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Lakshmi N 2026-04-27 05:33:00 [PATCH] Fix Int32GetDatum used for bool column in CREATE SUBSCRIPTION
Previous Message SATYANARAYANA NARLAPURAM 2026-04-27 04:56:42 Re: Fix a server crash problem from pg_get_database_ddl