| From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Small patch to improve safety of utf8_to_unicode(). |
| Date: | 2025-12-15 20:23:44 |
| Message-ID: | f11b6c89612f89b76b758807a32a1d3769652d90.camel@j-davis.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Sun, 2025-12-14 at 07:22 +0800, Chao Li wrote:
> This patch adds a length check to utf8_to_unicode(), I think which is
> where “safety” comes from. Can you please add an a little bit more to
> the commit message instead of only saying “improve safety”.
Right: it does not read past pg_mblen(), nor past the supplied length.
We could go further and check for valid continuation bytes, but the
other routines don't do that.
> It also deleted two redundant function declarations from pg_wchar.h,
> which may also worth a quick note in the commit message.
I committed that as an independent change and removed it from this
patch.
> + /* Assume byte sequence has not been broken. */
> + c = utf8_to_unicode(str, MAX_MULTIBYTE_CHAR_LEN);
> ```
>
> Here we need an empty line above the new comment.
Done, and I expanded the comment to explain why it's safe.
> pg_utf_dsplen(const unsigned char *s)
> {
> - return ucs_wcwidth(utf8_to_unicode(s));
> + /* trust that input is not a truncated byte sequence */
> + return ucs_wcwidth(utf8_to_unicode(s,
> MAX_MULTIBYTE_CHAR_LEN));
> }
> ```
>
> For the new comment, as a code reader, I wonder why we “trust” that?
We could use strlen(), but I was concerned that it might be used for
string fragments that aren't NUL-terminated, because it's intended for
a single character. A caller might reasonably assume that it wouldn't
read past pg_mblen().
So I changed the comment slightly to just say that it requires the
input is a valid UTF-8 sequence. Let me know if you have another
suggestion.
Regards,
Jeff Davis
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-Make-utf8_to_unicode-safer.patch | text/x-patch | 9.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Nathan Bossart | 2025-12-15 20:28:42 | Re: Proposal: Add a callback data parameter to GetNamedDSMSegment |
| Previous Message | Tom Lane | 2025-12-15 20:19:26 | Re: PRI?64 vs Visual Studio (2022) |