Re: Extract numeric filed in JSONB more effectively

From: Haibo Yan <tristan(dot)yim(at)gmail(dot)com>
To: Andy Fan <zhihuifan1213(at)163(dot)com>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Extract numeric filed in JSONB more effectively
Date: 2026-04-27 20:35:57
Message-ID: CABXr29Hozrf25ty07Agwgi1N7DNK3NiKgPDOF4AXesW7GJ=DFg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Apr 26, 2026 at 10:01 PM Haibo Yan <tristan(dot)yim(at)gmail(dot)com> wrote:

> On Fri, Apr 10, 2026 at 2:48 PM Haibo Yan <tristan(dot)yim(at)gmail(dot)com> wrote:
>
>> On Wed, Apr 8, 2026 at 12:50 PM Haibo Yan <tristan(dot)yim(at)gmail(dot)com> wrote:
>>
>>> On Tue, Apr 7, 2026 at 6:21 PM Haibo Yan <tristan(dot)yim(at)gmail(dot)com> wrote:
>>>
>>>> On Tue, Apr 7, 2026 at 5:00 PM Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
>>>>
>>>>> Haibo Yan <tristan(dot)yim(at)gmail(dot)com> writes:
>>>>>
>>>>> Hi Haibo,
>>>>>
>>>>> > I agree that if this approach is extended to the full matrix naively,
>>>>> > duplication will become a real issue.
>>>>>
>>>>> Could you summary how it would be? I think it would be helpful for
>>>>> others to review. Otherwise every reviewer needs to count them many
>>>>> times.
>>>>>
>>>>> --
>>>>> Best Regards
>>>>> Andy Fan
>>>>>
>>>> Hi Andy,
>>>> Sure.
>>>>
>>>> My current thought is to extend it in stages, rather than trying to
>>>> solve the full matrix in a single patch.
>>>>
>>>> A rough plan would be:
>>>>
>>>> 1. Keep the current stage-1 patch small and validate the basic approach
>>>> first
>>>>
>>>>
>>>> -
>>>>
>>>> jsonb_object_field / -> / equivalent subscripting form
>>>> -
>>>>
>>>> casts to numeric and bool
>>>> -
>>>>
>>>> support-function rewrite directly to explicit typed extractor
>>>> functions
>>>>
>>>> 2. Extend target types before extending extractor families
>>>>
>>>>
>>>> -
>>>>
>>>> add int4 / int8 / float8 for the same object-field family first
>>>> -
>>>>
>>>> keep the SQL-visible rewrite targets explicit, e.g.
>>>>
>>>> -
>>>>
>>>> jsonb_object_field_int4
>>>> -
>>>>
>>>> jsonb_object_field_int8
>>>> -
>>>>
>>>> jsonb_object_field_float8
>>>>
>>>> -
>>>>
>>>> avoid the previous numeric-intermediate rewrite shape
>>>>
>>>> 3. Then extend to other extractor families with the same overall pattern
>>>>
>>>>
>>>> -
>>>>
>>>> likely starting with jsonb_array_element and jsonb_extract_path
>>>> -
>>>>
>>>> and possibly jsonb_path_query_first later
>>>> -
>>>>
>>>> each family would still rewrite to explicit typed extractor entry
>>>> points, e.g.
>>>>
>>>> -
>>>>
>>>> jsonb_array_element_numeric
>>>> -
>>>>
>>>> jsonb_extract_path_bool
>>>> -
>>>>
>>>> jsonb_path_query_first_int4
>>>>
>>>>
>>>> 4. Keep duplication manageable by sharing the implementation underneath
>>>>
>>>>
>>>> -
>>>>
>>>> keep the SQL/catalog-level rewrite targets explicit for readability
>>>> and reviewability
>>>> -
>>>>
>>>> but factor the C implementation into:
>>>>
>>>> -
>>>>
>>>> extractor-family lookup helpers
>>>> -
>>>>
>>>> target-type conversion helpers
>>>> -
>>>>
>>>> thin wrappers, possibly generated with small macros
>>>>
>>>> So the idea would be: explicit rewrite targets at the SQL/catalog
>>>> level, but shared lookup/conversion code underneath, instead of going back
>>>> to the earlier start/finish/internal pipeline.
>>>>
>>>> I agree that if this is extended naively across the full matrix,
>>>> duplication will become a real issue. My reason for keeping the current
>>>> patch narrow is that I wanted to first validate this simpler rewrite shape
>>>> on a small subset before deciding how best to scale it further.
>>>>
>>>> Regards,
>>>>
>>>> Haibo
>>>>
>>>> Hi all,
>>>
>>> Following up on our previous discussion, I want to clarify the current
>>> patch plan together with the updated first patch.
>>>
>>> Earlier I described this work roughly as a 4-patch line. After iterating
>>> on the implementation and trying to keep each step reviewable, I now think
>>> the cleaner split is a 5-patch series:
>>>
>>>
>>> 1.
>>>
>>> object-field casts to scalar types
>>> 2.
>>>
>>> array-element casts to scalar types
>>> 3.
>>>
>>> extract-path casts to scalar types
>>> 4.
>>>
>>> multi-subscript casts via extract-path lowering
>>> 5.
>>>
>>> jsonpath-first casts to scalar types (jsonb_path_query_first and _tz)
>>>
>>> The overall design is unchanged: use the cast function’s support hook to
>>> recognize cast(extract(...)) over scalar-returning jsonb extraction
>>> functions, and rewrite that directly to explicit typed extractor calls.
>>>
>>> Supported target types remain:
>>>
>>>
>>> - numeric
>>> - bool
>>> - int4
>>> - int8
>>> - float8
>>>
>>> One point I also want to make explicit is that I do not plan to include
>>> jsonb_path_query in this series.
>>>
>>> After looking at it more carefully, I do not think it fits the same
>>> model as the rest of the series. The patches here are all about
>>> scalar-returning extraction functions, where the cast prosupport hook can
>>> see and rewrite a scalar expression pair. jsonb_path_query is
>>> set-returning, so optimizing casts over it would likely need a different
>>> mechanism, probably at planner or executor level, rather than one more
>>> patch in this prosupport-based series.
>>>
>>> Attached here is the updated first patch in the current plan.
>>>
>>> This patch covers object-field extraction only:
>>>
>>>
>>> - jsonb_object_field
>>> - -> with text key
>>> - key subscripting
>>>
>>> and rewrites casts to:
>>>
>>>
>>> - numeric
>>> - bool
>>> - int4
>>> - int8
>>> - float8
>>>
>>> to direct typed extractor calls.
>>>
>>> Thanks again for the earlier comments. I plan to send the remaining
>>> patches in follow-up emails in the order above.
>>>
>>> Regards,
>>> Haibo
>>>
>>
>> Hi all,
>> Following up on the earlier thread, I am continuing with the same overall
>> plan and sending the next two patches in the series together.
>> The design is still the same as before: use the cast function’s support
>> hook to recognize cast(extract(...)) over scalar-returning jsonb extraction
>> functions, and rewrite that directly to explicit typed extractor calls,
>> without changing normal SQL syntax.
>> At this point, I have also folded int2 and float4 into the per-family
>> patches, so each patch now carries the full target-type coverage for the
>> functionality it introduces.
>> The supported target types are now:
>> numeric
>>
>> - bool
>> - int2
>> - int4
>> - int8
>> - float4
>> - float8
>>
>> I am still keeping jsonb_path_query out of scope for this series. The
>> series is focused on scalar-returning extraction functions, where the cast
>> prosupport hook can see and rewrite a scalar expression pair.
>> jsonb_path_query is set-returning, so optimizing casts over it looks like a
>> different planner/executor problem rather than one more patch in this
>> prosupport-based line.
>>
>> With that in mind, the current 5-patch plan is:
>>
>> 1. object-field casts to scalar types
>> 2. array-element casts to scalar types
>> 3. extract-path casts to scalar types
>> 4. multi-subscript casts via extract-path lowering
>> 5. jsonpath-first casts to scalar types (jsonb_path_query_first and
>> _tz)
>>
>> In this email I am sending the first two patches:
>>
>> - patch 1: object-field casts to scalar types
>> - patch 2: array-element casts to scalar types
>>
>> Patch 1 covers:
>>
>> - jsonb_object_field
>> - -> with text key
>> - key subscripting
>>
>> Patch 2 covers:
>>
>> - jsonb_array_element
>> - -> with integer RHS
>> - single-index array subscripting
>>
>> Both patches now support the full target-type set listed above.
>> Thanks again for the earlier comments. I plan to continue with the
>> remaining patches in follow-up emails in the same order.
>>
>> Regards,
>> Haibo
>>
> Hi all,
>
> I spent some time reworking this patch series into a shape that I think is
> easier to review and easier to reason about patch-by-patch.
>
> The goal of the series is still the same: optimize casts over
> scalar-returning jsonb extraction functions by using the cast function’s
> support hook to recognize:
>
> - cast(extract(...))
>
> and rewrite that directly to explicit typed extractor calls.
>
> That keeps ordinary SQL syntax unchanged, but avoids the extra jsonb
> scalar wrapping/unwrapping on the optimized path.
>
> At this point the series is organized as 5 patches:
>
> 1. object-field casts to scalar types
> 2. array-element casts to scalar types
> 3. extract-path casts to scalar types
> 4. multi-subscript casts via extract-path lowering
> 5. jsonpath-first casts to scalar types
>
> The supported target types are:
>
> - numeric
> - bool
> - int2
> - int4
> - int8
> - float4
> - float8
>
> The covered scalar-returning extraction families/forms are:
>
> - jsonb_object_field
> - -> with text key
> - equivalent key subscripting
> - jsonb_array_element
> - -> with integer RHS
> - single-index array subscripting
> - jsonb_extract_path
> - #>
> - direct jsonb_extract_path(...)
> - multi-subscript jsonb subscripting (lowered to extract-path)
> - jsonb_path_query_first
> - jsonb_path_query_first_tz
>
> A few points about the current shape of the series:
>
> - I intentionally stayed with the support-function rewrite model and
> did not introduce new user-visible operators.
> - I also did not go back to the earlier start/finish/internal pipeline
> approach. The current version rewrites directly to explicit typed extractor
> functions.
> - I folded int2 and float4 into the same family-based design, since
> they are the remaining natural numeric-family targets that fit the same
> conversion model.
> - I kept the multi-subscript patch conservative. It lowers through the
> existing extract-path family, but it does not try to turn this into a
> broader executor-side subscripting redesign.
>
> One explicit boundary of the series is that it does *not* try to optimize
> jsonb_path_query.
>
> After looking at that more carefully, I do not think it belongs in the
> same patch line. This series is about scalar-returning extraction
> functions, where the cast prosupport hook can see and rewrite a scalar
> expression pair. jsonb_path_query is set-returning, so optimizing casts
> over it would likely require a different mechanism, probably at planner or
> executor level, rather than one more patch in this prosupport-based series.
>
> Another explicit boundary is that the series stops at the current
> numeric/bool scalar targets. I did not try to extend it to
> text/date/uuid-like conversions, because those would require different
> semantics such as full jsonb serialization or string parsing, rather than
> the direct scalar conversion pattern used here.
>
> I expect one likely question will be the catalog footprint, since this
> approach adds a noticeable number of typed extractor builtins. I think that
> is the cleanest tradeoff in this case: SQL is statically typed, and
> explicit typed extractors keep the rewrite path simple, predictable, and
> reviewable. The implementation-side boilerplate is kept under control with
> shared helpers and thin family-specific macros.
>
> Thanks in advance for any review and feedback.
>
> Regards,
> Haibo
>
Hi all,

I saw that CI failed on the latest version of the series, in the
jsonb_path_query_first / jsonb_path_query_first_tz typed-extractor path.

The failure does not look like an expected-output issue. The more likely
problem is in how the typed jsonpath-first helper is obtaining or returning
the first JsonbValue from the jsonpath execution result.

At the moment my working suspicion is one of these:

1. the code is returning a dangling / no-longer-valid JsonbValue *, for
example a pointer into a temporary JsonValueList or other local
container whose contents are no longer stable by the time the conversion
helper inspects it; or
2. the code is not actually passing the correct result JsonbValue to the
conversion helper, and is instead interpreting some other internal
structure as a JsonbValue, which would explain the bogus unknown jsonb
type failure seen in CI.

Sending the new patches.

Regards,
Haibo

Attachment Content-Type Size
v6-0002-jsonb-optimize-array-element-casts-to-scalar-type.patch application/octet-stream 27.4 KB
v6-0003-jsonb-optimize-extract-path-casts-to-scalar-types.patch application/octet-stream 24.4 KB
v6-0004-jsonb-optimize-multi-subscript-casts-via-extract-.patch application/octet-stream 14.6 KB
v6-0001-jsonb-optimize-object-field-casts-to-scalar-types.patch application/octet-stream 43.6 KB
v6-0005-jsonb-optimize-jsonpath-first-casts-to-scalar-typ.patch application/octet-stream 82.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Christophe Pettus 2026-04-27 22:51:12 Re: uuidv7 improperly accepts dates before 1970-01-01
Previous Message Nathan Bossart 2026-04-27 20:03:37 Re: [BUG?] macOS (Intel) build warnings: "ranlib: file … has no symbols" for aarch64 objects