| From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Small patch to improve safety of utf8_to_unicode(). |
| Date: | 2026-06-19 23:22:08 |
| Message-ID: | fbcb039a9ab3ba834f34174915254732fdcfae86.camel@j-davis.com |
| Lists: | pgsql-hackers |
On Wed, 2025-12-17 at 11:37 -0800, Jeff Davis wrote:
> On Tue, 2025-12-16 at 07:34 +0800, Chao Li wrote:
> > > <v2-0001-Make-utf8_to_unicode-safer.patch>
> >
> > V2 LGTM.
>
> On second thought, if we're going to change something here, we should
> probably have a more flexible API for both utf8_to_unicode() and
> unicode_to_utf8().

New series:

0001: Validates UTF8 before calling into unicode_case.c. This is extra
defense and simple to backport, but it regresses the performance of
those functions. It may also raise new errors if invalid UTF8 somehow
reaches them.

0002: Refactors to create an error path from unicode_case.c into
pg_locale_builtin.c, where a proper error can be thrown. This wins back
the performance lost in 0001. It is perhaps backportable, but it
technically changes an exported function signature, so it carries some
(very low) risk.

0003: Adds utf8encode() and utf8decode(), which are iteration-friendly
and inlinable, and which fully validate UTF8 (e.g. they reject surrogate
halves). This is an enhancement, so it should not be backported.

0004: Makes unicode_case.c use the new utf8encode()/utf8decode() API.

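A rough sketch of a validating, iteration-friendly decoder/encoder pair like the one described for 0003, plus a 0004-style caller, may help readers follow the series. This is not the code in the attached patches: the names utf8decode_sketch(), utf8encode_sketch() and upper_ascii_sketch(), their signatures, and the "0 means invalid" return convention are assumptions made for illustration only. The checks (truncated input, stray continuation bytes, overlong forms, surrogate halves, values above U+10FFFF) follow the standard UTF-8 well-formedness rules.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not the v3-0003 API: decode one code point from
 * s[0..len). Returns the number of bytes consumed (1..4), or 0 if the
 * input is truncated or not well-formed UTF8 (stray continuation byte,
 * overlong form, surrogate half, or value above U+10FFFF).
 */
static inline size_t
utf8decode_sketch(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* smallest code point that legitimately needs n bytes */
    static const uint32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
    uint32_t    c;
    size_t      n;

    if (len == 0)
        return 0;
    if (s[0] < 0x80)            /* ASCII */
    {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0)
        n = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        n = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        n = 4;
    else
        return 0;               /* continuation byte or 0xF8..0xFF as lead */
    if (len < n)
        return 0;               /* truncated sequence */
    c = s[0] & (0x7F >> n);     /* payload bits of the lead byte */
    for (size_t i = 1; i < n; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* expected a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_for_len[n] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;               /* overlong, out of range, or surrogate */
    *cp = c;
    return n;
}

/*
 * Hypothetical sketch, not the v3-0003 API: encode cp into buf. Returns
 * the number of bytes written (1..4), or 0 if cp is not a Unicode scalar
 * value or buf is too small (this sketch does not distinguish the two).
 */
static inline size_t
utf8encode_sketch(uint32_t cp, unsigned char *buf, size_t buflen)
{
    /* lead-byte length marker for an n-byte sequence */
    static const unsigned char lead[5] = {0, 0x00, 0xC0, 0xE0, 0xF0};
    size_t      n;

    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (buflen < n)
        return 0;
    for (size_t i = n - 1; i > 0; i--)  /* continuation bytes, 6 bits each */
    {
        buf[i] = (unsigned char) (0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    buf[0] = (unsigned char) (lead[n] | cp);   /* marker + remaining bits */
    return n;
}

/*
 * Hypothetical caller in the spirit of 0002/0004: walk the input one code
 * point at a time and return -1 on invalid input so the caller can raise
 * a proper error. ASCII a-z -> A-Z stands in for real case mapping.
 */
static long
upper_ascii_sketch(const unsigned char *src, size_t srclen,
                   unsigned char *dst, size_t dstlen)
{
    size_t      si = 0, di = 0;

    while (si < srclen)
    {
        uint32_t    cp;
        size_t      n = utf8decode_sketch(src + si, srclen - si, &cp);

        if (n == 0)
            return -1;          /* invalid or truncated UTF8 */
        si += n;
        if (cp >= 'a' && cp <= 'z')
            cp -= 'a' - 'A';
        n = utf8encode_sketch(cp, dst + di, dstlen - di);
        if (n == 0)
            return -1;          /* output buffer too small */
        di += n;
    }
    return (long) di;
}
```

Returning the byte count lets a caller advance through a buffer without a separate length lookup, and returning 0 gives it one place to turn bad input into an error. For example, the bytes ED A0 80 (an encoded surrogate, U+D800) decode to 0 here. The actual patches may make different choices.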
Regards,
Jeff Davis
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-unicode_case.c-ensure-valid-UTF8.patch | text/x-patch | 1.8 KB |
| v3-0002-Move-UTF8-checks-into-unicode_case.c.patch | text/x-patch | 15.0 KB |
| v3-0003-Validating-iterator-friendly-UTF8-encoder-decoder.patch | text/x-patch | 5.3 KB |
| v3-0004-unicode_case.c-use-new-utf8encode-utf8decode-APIs.patch | text/x-patch | 6.4 KB |