Re: Small patch to improve safety of utf8_to_unicode().

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Small patch to improve safety of utf8_to_unicode().
Date: 2026-06-24 23:43:28
Message-ID: e620dc508987e27dc542657d39da02f65bcaedcb.camel@j-davis.com
Lists: pgsql-hackers

On Wed, 2026-06-24 at 14:59 +0530, Ayush Tiwari wrote:
> I took a look at the v5 series and tried it locally. The split between the
> backpatchable defensive change, the final-sigma fix, and the newer
> utf8encode/utf8decode API work for master makes sense to me.

Thank you for taking a look.

> On my machine, case-check fails before reaching the new invalid-UTF8
> assertions. I think it's because PostgreSQL's Unicode tables are 17.0
> while my system ICU is 15.1, so the exhaustive ICU comparison in
> test_icu() hits a changed mapping and exits first.

Yes, that test is a bit awkward because most of it is a comparison to
results from ICU, so if ICU is unavailable or based on an older version
of Unicode, then the test doesn't work.

> Would it be worth running test_convert_case() before test_icu()?

If we want to make those tests independent of ICU, I think we'd move them
to a regular test suite that runs everywhere (not just as part of the
'update-unicode' target). But if we did so, that would be a very small
test suite, because most of the results are already checked in the
normal SQL tests. What makes these tests different is that they exercise
invalid UTF8 behavior, which we don't expect to reach through ordinary
SQL.
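To illustrate the kind of invalid-UTF8 input such tests exercise, here is a minimal sketch of a bounds-checked decoder. It is not PostgreSQL's utf8_to_unicode() and does not reflect the actual v5 patches; the function name safe_utf8_decode() and its signature are hypothetical. It assumes the caller passes the number of bytes remaining in the buffer, and it returns 0 instead of reading past the end when a sequence is truncated or a continuation byte is malformed. Overlong forms and surrogates are deliberately not rejected, to keep it short.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not PostgreSQL's utf8_to_unicode()): decode one
 * UTF-8 sequence starting at s, reading at most len bytes.  On success,
 * store the code point in *cp and return the number of bytes consumed.
 * Return 0 if the sequence is truncated or malformed.  Overlong forms
 * and surrogate code points are NOT rejected in this sketch.
 */
static size_t
safe_utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t      need;
    uint32_t    c;

    if (len == 0)
        return 0;

    /* Determine the sequence length from the lead byte. */
    if (s[0] < 0x80)
    {
        *cp = s[0];
        return 1;
    }
    else if ((s[0] & 0xE0) == 0xC0)
    {
        need = 2;
        c = s[0] & 0x1F;
    }
    else if ((s[0] & 0xF0) == 0xE0)
    {
        need = 3;
        c = s[0] & 0x0F;
    }
    else if ((s[0] & 0xF8) == 0xF0)
    {
        need = 4;
        c = s[0] & 0x07;
    }
    else
        return 0;               /* stray continuation byte or invalid lead byte */

    /* The defensive check: never read past the end of the input. */
    if (len < need)
        return 0;

    /* Each following byte must be a continuation byte (10xxxxxx). */
    for (size_t i = 1; i < need; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }

    *cp = c;
    return need;
}
```

For example, the bytes 0xCF 0x82 (U+03C2, GREEK SMALL LETTER FINAL SIGMA) decode to a two-byte code point when len is 2, but the same buffer with len 1 returns 0 rather than reading the missing continuation byte. Ordinary SQL input is validated before it reaches functions like this, which is why these cases need targeted tests.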

Regards,
Jeff Davis
