Re: Small patch to improve safety of utf8_to_unicode().

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Small patch to improve safety of utf8_to_unicode().
Date: 2026-06-23 02:02:25
Message-ID: c82358d3b1bbd250a03937534bf76de9f4250d0c.camel@j-davis.com
Lists: pgsql-hackers

On Fri, 2026-06-19 at 16:22 -0700, Jeff Davis wrote:
> On Wed, 2025-12-17 at 11:37 -0800, Jeff Davis wrote:
> > On Tue, 2025-12-16 at 07:34 +0800, Chao Li wrote:
> > > > <v2-0001-Make-utf8_to_unicode-safer.patch>
> > >
> > > V2 LGTM.
> >
> > On second thought, if we're going to change something here, we should
> > probably have a more flexible API for both utf8_to_unicode() and
> > unicode_to_utf8().

v4 attached.

The main difference is that the first patch is more backportable. For
backbranches, I think the safest thing if we encounter invalid UTF8 is
to just terminate and return early. In master, we can change the API to
properly return the error upward.
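
To make the two behaviors concrete, here is a minimal sketch, not code from the attached patches. All function names (seq_len, copy_until_invalid, copy_checked) are hypothetical, and the well-formedness check is deliberately coarse. The first routine stops at the first invalid sequence and returns what it has so far, which is the shape I have in mind for back branches; the second reports the failure to the caller, which is roughly what an API change in master would allow.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: length of the UTF-8 sequence starting at s, or 0 if
 * the lead byte is invalid, the sequence is truncated, or a continuation
 * byte is malformed. Coarse on purpose: no overlong or surrogate checks.
 */
static size_t
seq_len(const unsigned char *s, size_t remaining)
{
    size_t      len;

    if (remaining == 0)
        return 0;
    if (s[0] < 0x80)
        len = 1;
    else if ((s[0] & 0xE0) == 0xC0)
        len = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        len = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        len = 4;
    else
        return 0;               /* stray continuation or invalid lead byte */

    if (len > remaining)
        return 0;               /* truncated sequence */
    for (size_t i = 1; i < len; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* malformed continuation byte */
    }
    return len;
}

/*
 * Back-branch style: copy whole sequences until the first invalid one, then
 * terminate and return early. Returns the number of bytes copied; dst must
 * have room for srclen bytes.
 */
static size_t
copy_until_invalid(unsigned char *dst, const unsigned char *src, size_t srclen)
{
    size_t      off = 0;

    while (off < srclen)
    {
        size_t      len = seq_len(src + off, srclen - off);

        if (len == 0)
            break;              /* invalid input: stop here, no error */
        for (size_t i = 0; i < len; i++)
            dst[off + i] = src[off + i];
        off += len;
    }
    return off;
}

/*
 * Master style: same loop, but the caller learns whether the whole input
 * was consumed. *written is set in both cases.
 */
static bool
copy_checked(unsigned char *dst, const unsigned char *src, size_t srclen,
             size_t *written)
{
    *written = copy_until_invalid(dst, src, srclen);
    return *written == srclen;
}
```

The early-return variant keeps the existing signature, so callers in back branches need no changes; the cost is that a caller cannot tell a short result from invalid input.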

Performance is not affected much, though in my brief tests it appeared
that 0002 lost a bit and then 0004 gained it back. But we gain full
UTF8 validation and safer UTF8 iterator APIs.
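
For readers who have not opened the patches, the following is a rough sketch of what a validating, iterator-friendly decoder can look like. It is an illustration under my own assumptions, not the API in 0003: the name sketch_utf8_decode and its signature are hypothetical. It takes an explicit length, so it never reads past the buffer, and it rejects overlong encodings, surrogates, and values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical validating decoder (not the function from the patch).
 *
 * Decodes one code point from s[0..len). On success, stores it in *cp and
 * returns the number of bytes consumed (1 to 4). Returns 0 on invalid
 * input: bad lead byte, truncation, bad continuation byte, overlong
 * encoding, surrogate, or a value above U+10FFFF.
 */
static size_t
sketch_utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* smallest code point that legitimately needs n bytes, indexed by n */
    static const uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    size_t      n;
    uint32_t    c;

    if (len == 0)
        return 0;

    if (s[0] < 0x80)
    {
        *cp = s[0];
        return 1;
    }
    else if ((s[0] & 0xE0) == 0xC0)
    {
        n = 2;
        c = s[0] & 0x1F;
    }
    else if ((s[0] & 0xF0) == 0xE0)
    {
        n = 3;
        c = s[0] & 0x0F;
    }
    else if ((s[0] & 0xF8) == 0xF0)
    {
        n = 4;
        c = s[0] & 0x07;
    }
    else
        return 0;               /* stray continuation or invalid lead byte */

    if (n > len)
        return 0;               /* truncated sequence */

    for (size_t i = 1; i < n; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* malformed continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }

    if (c < min_cp[n])
        return 0;               /* overlong encoding */
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0;               /* UTF-16 surrogate */
    if (c > 0x10FFFF)
        return 0;               /* beyond the Unicode range */

    *cp = c;
    return n;
}
```

A caller iterates by advancing its pointer by the return value and treating 0 as the error signal, so the validation and the bounds check happen in the same place as the decode.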

Regards,
Jeff Davis

Attachment                                                       Content-Type  Size
v4-0001-unicode_case.c-defend-against-invalid-UTF8.patch         text/x-patch  4.6 KB
v4-0002-unicode_case.c-change-API-to-signal-UTF8-decoding.patch  text/x-patch  11.5 KB
v4-0003-Validating-iterator-friendly-UTF8-encoder-decoder.patch  text/x-patch  5.3 KB
v4-0004-unicode_case.c-use-new-utf8encode-utf8decode-APIs.patch  text/x-patch  6.2 KB
