| From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Small patch to improve safety of utf8_to_unicode(). |
| Date: | 2026-06-23 02:02:25 |
| Message-ID: | c82358d3b1bbd250a03937534bf76de9f4250d0c.camel@j-davis.com |
| Lists: | pgsql-hackers |
On Fri, 2026-06-19 at 16:22 -0700, Jeff Davis wrote:
> On Wed, 2025-12-17 at 11:37 -0800, Jeff Davis wrote:
> > On Tue, 2025-12-16 at 07:34 +0800, Chao Li wrote:
> > > > <v2-0001-Make-utf8_to_unicode-safer.patch>
> > >
> > > V2 LGTM.
> >
> > On second thought, if we're going to change something here, we
> > should probably have a more flexible API for both utf8_to_unicode()
> > and unicode_to_utf8().

v4 attached.

The main difference from the previous version is that the first patch
(0001) is more backportable. For back branches, I think the safest thing
to do on encountering invalid UTF8 is to just stop processing and return
early. In master, we can change the API to properly return the error
upward.
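The two error-handling strategies described above can be illustrated with a minimal C sketch. Everything here is hypothetical: `sketch_utf8_decode()`, `utf8_valid_prefix()` and `utf8_count_chars()` are illustrative names, not the functions in the attached patches or in PostgreSQL. The decoder assumes the strict UTF-8 rules of RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical validating decoder (illustrative only).
 * Decodes one code point from s[0..len-1] into *cp and returns the number
 * of bytes consumed (1..4), or 0 if the input is empty or not valid UTF-8.
 */
static int
sketch_utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    uint32_t    c;
    uint32_t    min;
    int         n;

    if (len == 0)
        return 0;
    c = s[0];
    if (c < 0x80)
    {
        *cp = c;
        return 1;               /* ASCII */
    }
    if ((c & 0xE0) == 0xC0)
    {
        n = 2; c &= 0x1F; min = 0x80;
    }
    else if ((c & 0xF0) == 0xE0)
    {
        n = 3; c &= 0x0F; min = 0x800;
    }
    else if ((c & 0xF8) == 0xF0)
    {
        n = 4; c &= 0x07; min = 0x10000;
    }
    else
        return 0;               /* stray continuation byte or 0xF8..0xFF */

    if (len < (size_t) n)
        return 0;               /* truncated sequence */
    for (int i = 1; i < n; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* missing continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;               /* overlong, out of range, or surrogate */
    *cp = c;
    return n;
}

/* Back-branch style: stop quietly at the first invalid sequence and
 * return the length of the valid prefix that was processed. */
static size_t
utf8_valid_prefix(const unsigned char *s, size_t len)
{
    size_t      off = 0;
    uint32_t    cp;

    while (off < len)
    {
        int         n = sketch_utf8_decode(s + off, len - off, &cp);

        if (n == 0)
            break;
        off += (size_t) n;
    }
    return off;
}

/* Master style: report invalid input to the caller instead of hiding it. */
static bool
utf8_count_chars(const unsigned char *s, size_t len, size_t *nchars)
{
    size_t      off = 0;
    uint32_t    cp;

    *nchars = 0;
    while (off < len)
    {
        int         n = sketch_utf8_decode(s + off, len - off, &cp);

        if (n == 0)
            return false;
        off += (size_t) n;
        (*nchars)++;
    }
    return true;
}
```

Returning 0 as the error signal is unambiguous because every valid sequence consumes at least one byte. `utf8_valid_prefix()` silently drops everything from the first bad byte onward, roughly like the early-return approach for back branches, while `utf8_count_chars()` returns `false` so the caller can raise a proper error.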
Performance is not affected much: in my brief tests, 0002 appeared to
lose a little and 0004 gained it back. In exchange, we get full UTF8
validation and safer UTF8 iterator APIs.
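One way to picture an iterator-friendly, validating encoder is a function that writes a single code point into a bounded buffer and reports how many bytes it wrote, so the caller can advance an output cursor. This is an assumption-laden sketch: `sketch_utf8_encode()` and `utf8_encode_all()` are hypothetical names, not the encoder introduced in 0003, whose signature and behavior may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical bounded encoder (illustrative only).
 * Writes the UTF-8 encoding of cp into buf (capacity cap bytes) and returns
 * the number of bytes written (1..4). Returns 0, writing nothing, if cp is
 * a surrogate or above U+10FFFF, or if the buffer is too small.
 */
static int
sketch_utf8_encode(uint32_t cp, unsigned char *buf, size_t cap)
{
    int         n;

    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (cap < (size_t) n)
        return 0;

    switch (n)
    {
        case 1:
            buf[0] = (unsigned char) cp;
            break;
        case 2:
            buf[0] = (unsigned char) (0xC0 | (cp >> 6));
            buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
            break;
        case 3:
            buf[0] = (unsigned char) (0xE0 | (cp >> 12));
            buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
            break;
        default:
            buf[0] = (unsigned char) (0xF0 | (cp >> 18));
            buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
            break;
    }
    return n;
}

/* Iterator-style use: encode code points one at a time, advancing the
 * output offset; stop and set *ok = false at the first failure. */
static size_t
utf8_encode_all(const uint32_t *cps, size_t ncps,
                unsigned char *out, size_t cap, bool *ok)
{
    size_t      off = 0;

    for (size_t i = 0; i < ncps; i++)
    {
        int         n = sketch_utf8_encode(cps[i], out + off, cap - off);

        if (n == 0)
        {
            *ok = false;
            return off;
        }
        off += (size_t) n;
    }
    *ok = true;
    return off;
}
```

Because the encoder checks the remaining capacity before writing, it never emits a partial multibyte sequence, so output that stops early still ends on a character boundary.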
Regards,
Jeff Davis
| Attachment | Content-Type | Size |
|---|---|---|
| v4-0001-unicode_case.c-defend-against-invalid-UTF8.patch | text/x-patch | 4.6 KB |
| v4-0002-unicode_case.c-change-API-to-signal-UTF8-decoding.patch | text/x-patch | 11.5 KB |
| v4-0003-Validating-iterator-friendly-UTF8-encoder-decoder.patch | text/x-patch | 5.3 KB |
| v4-0004-unicode_case.c-use-new-utf8encode-utf8decode-APIs.patch | text/x-patch | 6.2 KB |