Re: Tighten pg_uhc_verifychar() to enforce CP949 lead/trail byte ranges

From: dogeon yoo <ehrjs023(at)gmail(dot)com>
To: assam258(at)gmail(dot)com
Cc: ishii(at)postgresql(dot)org, thomas(dot)munro(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Tighten pg_uhc_verifychar() to enforce CP949 lead/trail byte ranges
Date: 2026-07-02 00:10:17
Message-ID: CAFVBZ_EXEVgzw+EL-x7XK=N-XzeEfz7MRO0HCBsxLff=nE=Rkg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 1, 2026 at 4:49 PM Henson Choi <assam258(at)gmail(dot)com> wrote:
> I checked this exhaustively rather than by spot check. Scanning the
> full two-byte space against PostgreSQL's own uhc_to_utf8.map (17,237
> mapped sequences), the tightened accept set (lead 0x81-0xFE; trail
> 0x41-0x5A, 0x61-0x7A, 0x81-0xFE) is a strict superset of every
> mapped sequence -- zero real mappings fall in the newly-rejected
> ranges.

Thanks for the exhaustive review -- that is exactly the check that
matters here. I reproduced it on live builds as well: scanning the
full two-byte space through convert_from() on both the old and the
new verifier gives the same 17,237 decodable sequences with
byte-identical outputs, and none of them falls outside the
tightened ranges.
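
For readers following the thread, the accept set quoted above can be sketched as a standalone predicate plus a full two-byte scan. This is only an illustration of the ranges, not the code in 0002: the function names uhc_pair_in_tightened_set() and uhc_count_tightened_set() are hypothetical, and the sketch checks range membership only, without consulting uhc_to_utf8.map.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the PostgreSQL function): returns true if the
 * two-byte sequence (lead, trail) lies in the tightened accept set quoted
 * in the thread: lead 0x81-0xFE; trail 0x41-0x5A, 0x61-0x7A, 0x81-0xFE.
 */
static bool
uhc_pair_in_tightened_set(unsigned char lead, unsigned char trail)
{
    if (lead < 0x81 || lead > 0xFE)
        return false;

    return (trail >= 0x41 && trail <= 0x5A) ||
           (trail >= 0x61 && trail <= 0x7A) ||
           (trail >= 0x81 && trail <= 0xFE);
}

/*
 * Walk all 65,536 two-byte values and count how many the predicate
 * accepts.  A mapping-based check would additionally test each mapped
 * sequence for membership; that part is omitted here.
 */
static size_t
uhc_count_tightened_set(void)
{
    size_t      count = 0;

    for (unsigned lead = 0; lead <= 0xFF; lead++)
        for (unsigned trail = 0; trail <= 0xFF; trail++)
            if (uhc_pair_in_tightened_set((unsigned char) lead,
                                          (unsigned char) trail))
                count++;

    return count;
}
```

By plain arithmetic the accept set holds 126 lead values times 178 trail values, or 22,428 pairs, so it is comfortably larger than the 17,237 mapped sequences it has to cover.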

> Two optional test cases would close the last coverage gaps in
> uhc.sql (neither blocks commit):

Folded both into 0001:

- accept, upper lead boundary:

    SELECT encode(convert_to(convert_from('\xfea1', 'UHC'),
                             'UTF8'), 'hex');
    -> ee819e

0xFE now appears as a lead byte, so the lead upper bound is
exercised directly rather than only as a trail byte.

- reject, NUL trail:

    SELECT convert_from('\x8100', 'UHC');
    -> ERROR: invalid byte sequence for encoding "UHC": 0x81 0x00

NUL is the one trail byte the pre-patch verifier already rejected.

Both produce identical output before and after 0002, so they sit in
the baseline (0001), and 0002's expected diff is still exactly the
eight message-format changes. Full regression passes.

v2 attached.

Regards,
DoGeon Yoo

Attachments:
  v2-0001-Add-regression-test-for-UHC-encoding-baseline-capture.patch (application/octet-stream, 9.0 KB)
  v2-0002-Tighten-pg_uhc_verifychar-to-enforce-CP949-lead-trail-byt.patch (application/octet-stream, 5.4 KB)
