| From: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
|---|---|
| To: | Florents Tselai <florents(dot)tselai(at)gmail(dot)com> |
| Cc: | pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: More jsonpath methods: translate, split, join |
| Date: | 2026-06-25 18:00:02 |
| Message-ID: | CADkLM=c9o0HbmGQ9ibd+pRMhFkPDfN8q-QWPYgy_y5TNVz+PFA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Jun 24, 2026 at 1:17 PM Florents Tselai <florents(dot)tselai(at)gmail(dot)com>
wrote:
>
>
> On Tue, Jun 23, 2026 at 6:30 PM Florents Tselai <florents(dot)tselai(at)gmail(dot)com>
> wrote:
>
>>
>>
>>
>> On Tue, Jun 23, 2026 at 6:16 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
>> wrote:
>>
>>> On Fri, Jun 19, 2026 at 3:19 AM Florents Tselai <
>>> florents(dot)tselai(at)gmail(dot)com> wrote:
>>>
>>>>
>>>> On Tue, Jun 16, 2026 at 10:53 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
>>>> wrote:
>>>>
>>>>> On Mon, Apr 13, 2026 at 5:57 AM Florents Tselai <
>>>>> florents(dot)tselai(at)gmail(dot)com> wrote:
>>>>>
>>>>>> Hello hackers,
>>>>>>
>>>>>> This is a follow-up to the work recently merged in bd4f879.
>>>>>> In hindsight, I regret not pushing these through for the previous
>>>>>> cycle,
>>>>>> as they represent the "missing pieces" for users trying to perform
>>>>>> data cleaning entirely within the JSONPath engine.
>>>>>> With these we can significantly reduce the need for users to "drop
>>>>>> out" of JSONPath
>>>>>> into standard SQL for basic string-to-string-or-array-and-back
>>>>>> workflows.
>>>>>>
>>>>>>
>>>>> Seems this one got overlooked, so I took a look. The first patch
>>>>> applies, but the second needs a rebase.
>>>>>
>>>>
>>>> Thanks for having a look Corey,
>>>>
>>>> here's a consolidated v2 .
>>>>
>>>
>>> My familiarity with the the internal json stuff is limited to my
>>> experience in reworking the I/O format of pg_ndistinct and pg_dependencies,
>>> plus some other misadventures in statistics export, so I apologize if some
>>> of these questions are naive.
>>>
>>> jsonpath.h:
>>>
>>> jpiStrJoin is added in the middle of the JsonPathItemType, but
>>> jpiStrSplit and jpiStrTranslate were added to the end of the enum. The
>>> comment specifically states that "the order must not be changed and new
>>> values must be added to the end" - is there some exception to this rule? If
>>> there is an exception should we modify the comment and have TAP test to
>>> show that it doesn't break pg_upgrade?
>>>
>>
>>
>>
>>>
>>> jsonpath.c:
>>>
>>> In jspInitBuffer(), you add 3 new cases, but they aren't in a row,
>>> aren't organized alphabetically, or any other guiding organization that I
>>> can derive.
>>>
>>> The order of new types in the Assert() in jspGetLeftArg() is different
>>> than the order in jspGetRightArg(). Is there a reason they're different?
>>>
>>> jsonpath_exec.c:
>>>
>>> Everything here checks out. There were some patterns that initially
>>> concerned me, but they all have precedent elsewhere in the file.
>>>
>>> jsonpath_gram.y and jsonpath_scan.l:
>>> LGTM
>>>
>>> jsonb_jsonpath.out/sql
>>>
>>> +-- Empty string delimiter (splits into individual characters)
>>> +select jsonb_path_query('"abc"', '$.split("")');
>>> + jsonb_path_query
>>> +------------------
>>> + ["abc"]
>>>
>>> Like Zsolt, I am confused by this test. The description does not match
>>> the result. Based on the description, I would expect ["a", "b", "c"], but
>>> the output here looks like it was from a '$.split(",")' or any delimiter
>>> that isn't one of the letters in the string.
>>>
>>
>> I have this planned, that comment is wrong and should read ;
>>
>> +-- Empty string delimiter (returns the original string as a single
>> element)
>>
>>
>>>
>>> the underlying function text_to_array is accessible via SQL
>>> string_to_array, and there we get:
>>>
>>> corey=# select string_to_array('foo', '');
>>> string_to_array
>>> -----------------
>>> {foo}
>>> (1 row)
>>>
>>> corey=# select string_to_array('foo', NULL);
>>> string_to_array
>>> -----------------
>>> {f,o,o}
>>> (1 row)
>>>
>>> Assuming we don't find that behavior to be a bug, we should change the
>>> description of the test, or change the second parameter from '' to NULL.
>>>
>>> Moving along, the first "silent => true" query could use a comment. In
>>> general a comment above every test is nice for telling future developers
>>> what specifically is being tested.
>>>
>>> func-json.sgml:
>>>
>>> The entry for split() makes no mention of the proper behavior when the
>>> delimiter is empty
>>>
>>> Summary:
>>>
>>> The things I'm noticing seem like very random coding choices, which
>>> makes me think that they weren't choices, they're just randomness
>>> introduced by squashing and rebasing on top of bd4f879a9cdd. Could you
>>> check into that?
>>>
>>
>> Thanks for your thoroughness.
>>
>> From a quick review I think most of your comments are spot on, and the
>> enum ordering especially should be considered a bit critical as it could
>> break backwards compatibility and pg_upgrade.
>>
>> I'll incorporate these in an upcoming v4.
>>
>
> Here's that v4 that addresses your points.
>
I think you may have posted the wrong file. The v4 patch only seems to
address my concern about the test case comment on split. Of particular
concern is the change to the enum in jsonpath.h, which will break binary
upgrades.
diff
~/Downloads/v4-0001-Add-more-jsonpath-string-methods-.translate-.spli.patch
~/Downloads/v3-0001-Add-more-jsonpath-string-methods-.translate-.spli.patch
1c1
< From 2a85e7b8833f0d683d988ceb5a4ddd87a4042c86 Mon Sep 17 00:00:00 2001
---
> From a3d0abf3f43d71128c0fef031b074438ca0c5394 Mon Sep 17 00:00:00 2001
3,4c3,4
< Date: Tue, 23 Jun 2026 12:51:26 +0300
< Subject: [PATCH v4] Add more jsonpath string methods (.translate, .split,
---
> Date: Sat, 20 Jun 2026 10:36:56 +0300
> Subject: [PATCH v3] Add more jsonpath string methods (.translate, .split,
610c610
< index 81efebc3d0f..26a8325f8f6 100644
---
> index 81efebc3d0f..5430fc083fa 100644
707c707
< +-- Empty string delimiter (returns the original string as a single
element)
---
> +-- Empty string delimiter (splits into individual characters)
974c974
< index c1f4ab5422e..3c4d775ee44 100644
---
> index c1f4ab5422e..a492e22f8b1 100644
1010c1010
< +-- Empty string delimiter (returns the original string as a single
element)
---
> +-- Empty string delimiter (splits into individual characters)
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Bharath Rupireddy | 2026-06-25 18:04:55 | Re: Handle concurrent drop when doing whole database vacuum |
| Previous Message | m.litsarev | 2026-06-25 17:56:04 | Re: pg_stat_statements: Remove (errcode...) framing parentheses in erport(...) |