Re: More jsonpath methods: translate, split, join

From: Florents Tselai <florents(dot)tselai(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: More jsonpath methods: translate, split, join
Date: 2026-06-24 17:17:00
Message-ID: CA+v5N42VZadOoxqaiyrsXNGVTySRxVV=BkFFRJR03+z3a+gsZA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 23, 2026 at 6:30 PM Florents Tselai <florents(dot)tselai(at)gmail(dot)com>
wrote:

>
>
>
> On Tue, Jun 23, 2026 at 6:16 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
> wrote:
>
>> On Fri, Jun 19, 2026 at 3:19 AM Florents Tselai <
>> florents(dot)tselai(at)gmail(dot)com> wrote:
>>
>>>
>>> On Tue, Jun 16, 2026 at 10:53 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
>>> wrote:
>>>
>>>> On Mon, Apr 13, 2026 at 5:57 AM Florents Tselai <
>>>> florents(dot)tselai(at)gmail(dot)com> wrote:
>>>>
>>>>> Hello hackers,
>>>>>
>>>>> This is a follow-up to the work recently merged in bd4f879.
>>>>> In hindsight, I regret not pushing these through for the previous
>>>>> cycle,
>>>>> as they represent the "missing pieces" for users trying to perform
>>>>> data cleaning entirely within the JSONPath engine.
>>>>> With these we can significantly reduce the need for users to "drop
>>>>> out" of JSONPath
>>>>> into standard SQL for basic string-to-string-or-array-and-back
>>>>> workflows.
>>>>>
>>>>>
>>>> Seems this one got overlooked, so I took a look. The first patch
>>>> applies, but the second needs a rebase.
>>>>
>>>
>>> Thanks for having a look Corey,
>>>
>>> here's a consolidated v2 .
>>>
>>
>> My familiarity with the the internal json stuff is limited to my
>> experience in reworking the I/O format of pg_ndistinct and pg_dependencies,
>> plus some other misadventures in statistics export, so I apologize if some
>> of these questions are naive.
>>
>> jsonpath.h:
>>
>> jpiStrJoin is added in the middle of the JsonPathItemType, but
>> jpiStrSplit and jpiStrTranslate were added to the end of the enum. The
>> comment specifically states that "the order must not be changed and new
>> values must be added to the end" - is there some exception to this rule? If
>> there is an exception should we modify the comment and have TAP test to
>> show that it doesn't break pg_upgrade?
>>
>
>
>
>>
>> jsonpath.c:
>>
>> In jspInitBuffer(), you add 3 new cases, but they aren't in a row, aren't
>> organized alphabetically, or any other guiding organization that I can
>> derive.
>>
>> The order of new types in the Assert() in jspGetLeftArg() is different
>> than the order in jspGetRightArg(). Is there a reason they're different?
>>
>> jsonpath_exec.c:
>>
>> Everything here checks out. There were some patterns that initially
>> concerned me, but they all have precedent elsewhere in the file.
>>
>> jsonpath_gram.y and jsonpath_scan.l:
>> LGTM
>>
>> jsonb_jsonpath.out/sql
>>
>> +-- Empty string delimiter (splits into individual characters)
>> +select jsonb_path_query('"abc"', '$.split("")');
>> + jsonb_path_query
>> +------------------
>> + ["abc"]
>>
>> Like Zsolt, I am confused by this test. The description does not match
>> the result. Based on the description, I would expect ["a", "b", "c"], but
>> the output here looks like it was from a '$.split(",")' or any delimiter
>> that isn't one of the letters in the string.
>>
>
> I have this planned, that comment is wrong and should read ;
>
> +-- Empty string delimiter (returns the original string as a single
> element)
>
>
>>
>> the underlying function text_to_array is accessible via SQL
>> string_to_array, and there we get:
>>
>> corey=# select string_to_array('foo', '');
>> string_to_array
>> -----------------
>> {foo}
>> (1 row)
>>
>> corey=# select string_to_array('foo', NULL);
>> string_to_array
>> -----------------
>> {f,o,o}
>> (1 row)
>>
>> Assuming we don't find that behavior to be a bug, we should change the
>> description of the test, or change the second parameter from '' to NULL.
>>
>> Moving along, the first "silent => true" query could use a comment. In
>> general a comment above every test is nice for telling future developers
>> what specifically is being tested.
>>
>> func-json.sgml:
>>
>> The entry for split() makes no mention of the proper behavior when the
>> delimiter is empty
>>
>> Summary:
>>
>> The things I'm noticing seem like very random coding choices, which makes
>> me think that they weren't choices, they're just randomness introduced by
>> squashing and rebasing on top of bd4f879a9cdd. Could you check into that?
>>
>
> Thanks for your thoroughness.
>
> From a quick review I think most of your comments are spot on, and the
> enum ordering especially should be considered a bit critical as it could
> break backwards compatibility and pg_upgrade.
>
> I'll incorporate these in an upcoming v4.
>

Here's that v4 that addresses your points.

Attachment Content-Type Size
v4-0001-Add-more-jsonpath-string-methods-.translate-.spli.patch application/octet-stream 39.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul A Jungwirth 2026-06-24 17:21:26 Re: FOR PORTION OF should reject GENERATED columns
Previous Message Masahiko Sawada 2026-06-24 17:09:10 Re: DDL deparse