| From: | Florents Tselai <florents(dot)tselai(at)gmail(dot)com> |
|---|---|
| To: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
| Cc: | pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: More jsonpath methods: translate, split, join |
| Date: | 2026-06-26 18:28:42 |
| Message-ID: | CA+v5N40qpJAcCV4QniQnHLP_gqZD5oK30m2jFsdxW07z+bTHDg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Thu, Jun 25, 2026 at 9:00 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
wrote:
> On Wed, Jun 24, 2026 at 1:17 PM Florents Tselai <florents(dot)tselai(at)gmail(dot)com>
> wrote:
>
>>
>>
>> On Tue, Jun 23, 2026 at 6:30 PM Florents Tselai <
>> florents(dot)tselai(at)gmail(dot)com> wrote:
>>
>>>
>>>
>>>
>>> On Tue, Jun 23, 2026 at 6:16 PM Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
>>> wrote:
>>>
>>>> On Fri, Jun 19, 2026 at 3:19 AM Florents Tselai <
>>>> florents(dot)tselai(at)gmail(dot)com> wrote:
>>>>
>>>>>
>>>>> On Tue, Jun 16, 2026 at 10:53 PM Corey Huinker <
>>>>> corey(dot)huinker(at)gmail(dot)com> wrote:
>>>>>
>>>>>> On Mon, Apr 13, 2026 at 5:57 AM Florents Tselai <
>>>>>> florents(dot)tselai(at)gmail(dot)com> wrote:
>>>>>>
>>>>>>> Hello hackers,
>>>>>>>
>>>>>>> This is a follow-up to the work recently merged in bd4f879.
>>>>>>> In hindsight, I regret not pushing these through for the previous
>>>>>>> cycle,
>>>>>>> as they represent the "missing pieces" for users trying to perform
>>>>>>> data cleaning entirely within the JSONPath engine.
>>>>>>> With these we can significantly reduce the need for users to "drop
>>>>>>> out" of JSONPath
>>>>>>> into standard SQL for basic string-to-string-or-array-and-back
>>>>>>> workflows.
>>>>>>>
>>>>>>>
>>>>>> Seems this one got overlooked, so I took a look. The first patch
>>>>>> applies, but the second needs a rebase.
>>>>>>
>>>>>
>>>>> Thanks for having a look Corey,
>>>>>
>>>>> here's a consolidated v2 .
>>>>>
>>>>
>>>> My familiarity with the the internal json stuff is limited to my
>>>> experience in reworking the I/O format of pg_ndistinct and pg_dependencies,
>>>> plus some other misadventures in statistics export, so I apologize if some
>>>> of these questions are naive.
>>>>
>>>> jsonpath.h:
>>>>
>>>> jpiStrJoin is added in the middle of the JsonPathItemType, but
>>>> jpiStrSplit and jpiStrTranslate were added to the end of the enum. The
>>>> comment specifically states that "the order must not be changed and new
>>>> values must be added to the end" - is there some exception to this rule? If
>>>> there is an exception should we modify the comment and have TAP test to
>>>> show that it doesn't break pg_upgrade?
>>>>
>>>
>>>
>>>
>>>>
>>>> jsonpath.c:
>>>>
>>>> In jspInitBuffer(), you add 3 new cases, but they aren't in a row,
>>>> aren't organized alphabetically, or any other guiding organization that I
>>>> can derive.
>>>>
>>>> The order of new types in the Assert() in jspGetLeftArg() is different
>>>> than the order in jspGetRightArg(). Is there a reason they're different?
>>>>
>>>> jsonpath_exec.c:
>>>>
>>>> Everything here checks out. There were some patterns that initially
>>>> concerned me, but they all have precedent elsewhere in the file.
>>>>
>>>> jsonpath_gram.y and jsonpath_scan.l:
>>>> LGTM
>>>>
>>>> jsonb_jsonpath.out/sql
>>>>
>>>> +-- Empty string delimiter (splits into individual characters)
>>>> +select jsonb_path_query('"abc"', '$.split("")');
>>>> + jsonb_path_query
>>>> +------------------
>>>> + ["abc"]
>>>>
>>>> Like Zsolt, I am confused by this test. The description does not match
>>>> the result. Based on the description, I would expect ["a", "b", "c"], but
>>>> the output here looks like it was from a '$.split(",")' or any delimiter
>>>> that isn't one of the letters in the string.
>>>>
>>>
>>> I have this planned, that comment is wrong and should read ;
>>>
>>> +-- Empty string delimiter (returns the original string as a single
>>> element)
>>>
>>>
>>>>
>>>> the underlying function text_to_array is accessible via SQL
>>>> string_to_array, and there we get:
>>>>
>>>> corey=# select string_to_array('foo', '');
>>>> string_to_array
>>>> -----------------
>>>> {foo}
>>>> (1 row)
>>>>
>>>> corey=# select string_to_array('foo', NULL);
>>>> string_to_array
>>>> -----------------
>>>> {f,o,o}
>>>> (1 row)
>>>>
>>>> Assuming we don't find that behavior to be a bug, we should change the
>>>> description of the test, or change the second parameter from '' to NULL.
>>>>
>>>> Moving along, the first "silent => true" query could use a comment. In
>>>> general a comment above every test is nice for telling future developers
>>>> what specifically is being tested.
>>>>
>>>> func-json.sgml:
>>>>
>>>> The entry for split() makes no mention of the proper behavior when the
>>>> delimiter is empty
>>>>
>>>> Summary:
>>>>
>>>> The things I'm noticing seem like very random coding choices, which
>>>> makes me think that they weren't choices, they're just randomness
>>>> introduced by squashing and rebasing on top of bd4f879a9cdd. Could you
>>>> check into that?
>>>>
>>>
>>> Thanks for your thoroughness.
>>>
>>> From a quick review I think most of your comments are spot on, and the
>>> enum ordering especially should be considered a bit critical as it could
>>> break backwards compatibility and pg_upgrade.
>>>
>>> I'll incorporate these in an upcoming v4.
>>>
>>
>> Here's that v4 that addresses your points.
>>
>
> I think you may have posted the wrong file. The v4 patch only seems to
> address my concern about the test case comment on split. Of particular
> concern is the change to the enum in jsonpath.h, which will break binary
> upgrades.
>
Argh, you're right . Apologies for the noise.
| Attachment | Content-Type | Size |
|---|---|---|
| v4-0001-Add-more-jsonpath-string-methods-.translate-.spli.patch | application/octet-stream | 40.2 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Dean Rasheed | 2026-06-26 19:37:07 | Re: Global temporary tables |
| Previous Message | Nathan Bossart | 2026-06-26 18:26:20 | Re: doc: fix pg_stat_autovacuum_scores threshold wording |