Re: Small patch to improve safety of utf8_to_unicode().

From: Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Small patch to improve safety of utf8_to_unicode().
Date: 2026-06-24 09:29:02
Message-ID: CAJTYsWUoYh84OugcnkghK24pLFsqxTy2Ajq7druCYVne_Dj8gw@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Wed, 24 Jun 2026 at 11:15, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> On Mon, 2026-06-22 at 19:02 -0700, Jeff Davis wrote:
> > v4 attached.
>
> v5 attached.
>
> There's an extra patch 0002 to fix a logic bug when handling final
> sigma (only affects the builtin pg_unicode_fast locale), which I think
> should be backported to 18.
>
> Also added tests.
>

Thanks for the patch!

I took a look at the v5 series and tried it locally. The split between the
backpatchable defensive change, the final-sigma fix, and the newer
utf8encode/utf8decode API work for master makes sense to me.

On my machine, case-check fails before reaching the new invalid-UTF8
assertions. I think it's because PostgreSQL's Unicode tables are 17.0 while my
system ICU is 15.1, so the exhaustive ICU comparison in test_icu() hits a
changed mapping and exits first.

Would it be worth running test_convert_case() before test_icu()? It wouldn't
make case-check pass on a mismatched-ICU system, but it would at least let the
non-ICU conversion and the new invalid-UTF8 cases run before the ICU
comparison aborts.
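
Roughly what I have in mind, as a standalone sketch rather than a patch. The
sketch_* names and the icu_matches flag are made up for illustration; they
are stand-ins, not the actual functions in the test program:

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how many of the stand-in test groups actually ran. */
static int	tests_run = 0;

/*
 * Hypothetical stand-in for the non-ICU conversion tests, including the
 * new invalid-UTF8 cases. It does not depend on the system ICU at all.
 */
static bool
sketch_test_convert_case(void)
{
	tests_run++;
	printf("convert_case tests: ok\n");
	return true;
}

/*
 * Hypothetical stand-in for the exhaustive ICU comparison. The flag
 * simulates whether the system ICU agrees with the built-in tables.
 */
static bool
sketch_test_icu(bool icu_matches)
{
	tests_run++;
	if (!icu_matches)
	{
		printf("ICU comparison: mapping mismatch\n");
		return false;
	}
	printf("ICU comparison: ok\n");
	return true;
}

/*
 * Run the ICU-independent tests first, so they still execute when the
 * ICU comparison fails. Returns 0 on success, 1 on any failure.
 */
static int
sketch_run_case_checks(bool icu_matches)
{
	if (!sketch_test_convert_case())
		return 1;
	if (!sketch_test_icu(icu_matches))
		return 1;
	return 0;
}
```

With this ordering, sketch_run_case_checks(false) still reports failure, but
only after the ICU-independent group has run.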

Regards,
Ayush
