| From: | Haibo Yan <tristan(dot)yim(at)gmail(dot)com> |
|---|---|
| To: | Andy Fan <zhihuifan1213(at)163(dot)com> |
| Cc: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Extract numeric filed in JSONB more effectively |
| Date: | 2026-04-10 21:48:32 |
| Message-ID: | CABXr29G35c6Qo-Ab4EpH24gHS+ejjJGfhmW3nQ1eEK+Rgt_9Wg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Apr 8, 2026 at 12:50 PM Haibo Yan <tristan(dot)yim(at)gmail(dot)com> wrote:
> On Tue, Apr 7, 2026 at 6:21 PM Haibo Yan <tristan(dot)yim(at)gmail(dot)com> wrote:
>
>> On Tue, Apr 7, 2026 at 5:00 PM Andy Fan <zhihuifan1213(at)163(dot)com> wrote:
>>
>>> Haibo Yan <tristan(dot)yim(at)gmail(dot)com> writes:
>>>
>>> Hi Haibo,
>>>
>>> > I agree that if this approach is extended to the full matrix naively,
>>> > duplication will become a real issue.
>>>
>>> Could you summary how it would be? I think it would be helpful for
>>> others to review. Otherwise every reviewer needs to count them many
>>> times.
>>>
>>> --
>>> Best Regards
>>> Andy Fan
>>>
>> Hi Andy,
>> Sure.
>>
>> My current thought is to extend it in stages, rather than trying to solve
>> the full matrix in a single patch.
>>
>> A rough plan would be:
>>
>> 1. Keep the current stage-1 patch small and validate the basic approach
>> first
>>
>>
>> -
>>
>> jsonb_object_field / -> / equivalent subscripting form
>> -
>>
>> casts to numeric and bool
>> -
>>
>> support-function rewrite directly to explicit typed extractor
>> functions
>>
>> 2. Extend target types before extending extractor families
>>
>>
>> -
>>
>> add int4 / int8 / float8 for the same object-field family first
>> -
>>
>> keep the SQL-visible rewrite targets explicit, e.g.
>>
>> -
>>
>> jsonb_object_field_int4
>> -
>>
>> jsonb_object_field_int8
>> -
>>
>> jsonb_object_field_float8
>>
>> -
>>
>> avoid the previous numeric-intermediate rewrite shape
>>
>> 3. Then extend to other extractor families with the same overall pattern
>>
>>
>> -
>>
>> likely starting with jsonb_array_element and jsonb_extract_path
>> -
>>
>> and possibly jsonb_path_query_first later
>> -
>>
>> each family would still rewrite to explicit typed extractor entry
>> points, e.g.
>>
>> -
>>
>> jsonb_array_element_numeric
>> -
>>
>> jsonb_extract_path_bool
>> -
>>
>> jsonb_path_query_first_int4
>>
>>
>> 4. Keep duplication manageable by sharing the implementation underneath
>>
>>
>> -
>>
>> keep the SQL/catalog-level rewrite targets explicit for readability
>> and reviewability
>> -
>>
>> but factor the C implementation into:
>>
>> -
>>
>> extractor-family lookup helpers
>> -
>>
>> target-type conversion helpers
>> -
>>
>> thin wrappers, possibly generated with small macros
>>
>> So the idea would be: explicit rewrite targets at the SQL/catalog level,
>> but shared lookup/conversion code underneath, instead of going back to the
>> earlier start/finish/internal pipeline.
>>
>> I agree that if this is extended naively across the full matrix,
>> duplication will become a real issue. My reason for keeping the current
>> patch narrow is that I wanted to first validate this simpler rewrite shape
>> on a small subset before deciding how best to scale it further.
>>
>> Regards,
>>
>> Haibo
>>
>> Hi all,
>
> Following up on our previous discussion, I want to clarify the current
> patch plan together with the updated first patch.
>
> Earlier I described this work roughly as a 4-patch line. After iterating
> on the implementation and trying to keep each step reviewable, I now think
> the cleaner split is a 5-patch series:
>
>
> 1.
>
> object-field casts to scalar types
> 2.
>
> array-element casts to scalar types
> 3.
>
> extract-path casts to scalar types
> 4.
>
> multi-subscript casts via extract-path lowering
> 5.
>
> jsonpath-first casts to scalar types (jsonb_path_query_first and _tz)
>
> The overall design is unchanged: use the cast function’s support hook to
> recognize cast(extract(...)) over scalar-returning jsonb extraction
> functions, and rewrite that directly to explicit typed extractor calls.
>
> Supported target types remain:
>
>
> - numeric
> - bool
> - int4
> - int8
> - float8
>
> One point I also want to make explicit is that I do not plan to include
> jsonb_path_query in this series.
>
> After looking at it more carefully, I do not think it fits the same model
> as the rest of the series. The patches here are all about scalar-returning
> extraction functions, where the cast prosupport hook can see and rewrite a
> scalar expression pair. jsonb_path_query is set-returning, so optimizing
> casts over it would likely need a different mechanism, probably at planner
> or executor level, rather than one more patch in this prosupport-based
> series.
>
> Attached here is the updated first patch in the current plan.
>
> This patch covers object-field extraction only:
>
>
> - jsonb_object_field
> - -> with text key
> - key subscripting
>
> and rewrites casts to:
>
>
> - numeric
> - bool
> - int4
> - int8
> - float8
>
> to direct typed extractor calls.
>
> Thanks again for the earlier comments. I plan to send the remaining
> patches in follow-up emails in the order above.
>
> Regards,
> Haibo
>
Hi all,
Following up on the earlier thread, I am continuing with the same overall
plan and sending the next two patches in the series together.
The design is still the same as before: use the cast function’s support
hook to recognize cast(extract(...)) over scalar-returning jsonb extraction
functions, and rewrite that directly to explicit typed extractor calls,
without changing normal SQL syntax.
At this point, I have also folded int2 and float4 into the per-family
patches, so each patch now carries the full target-type coverage for the
functionality it introduces.
The supported target types are now:
numeric
- bool
- int2
- int4
- int8
- float4
- float8
I am still keeping jsonb_path_query out of scope for this series. The
series is focused on scalar-returning extraction functions, where the cast
prosupport hook can see and rewrite a scalar expression pair.
jsonb_path_query is set-returning, so optimizing casts over it looks like a
different planner/executor problem rather than one more patch in this
prosupport-based line.
With that in mind, the current 5-patch plan is:
1. object-field casts to scalar types
2. array-element casts to scalar types
3. extract-path casts to scalar types
4. multi-subscript casts via extract-path lowering
5. jsonpath-first casts to scalar types (jsonb_path_query_first and _tz)
In this email I am sending the first two patches:
- patch 1: object-field casts to scalar types
- patch 2: array-element casts to scalar types
Patch 1 covers:
- jsonb_object_field
- -> with text key
- key subscripting
Patch 2 covers:
- jsonb_array_element
- -> with integer RHS
- single-index array subscripting
Both patches now support the full target-type set listed above.
Thanks again for the earlier comments. I plan to continue with the
remaining patches in follow-up emails in the same order.
Regards,
Haibo
| Attachment | Content-Type | Size |
|---|---|---|
| v5-0001-jsonb-optimize-object-field-casts-to-scalar-types.patch | application/octet-stream | 43.6 KB |
| v5-0002-jsonb-optimize-array-element-casts-to-scalar-type.patch | application/octet-stream | 27.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | SATYANARAYANA NARLAPURAM | 2026-04-10 22:00:57 | Re: FOR PORTION OF does not recompute GENERATED STORED columns that depend on the range column |
| Previous Message | Michael Paquier | 2026-04-10 20:34:34 | Re: pg17: XX000: no relation entry for relid 0 |