| From: | dogeon yoo <ehrjs023(at)gmail(dot)com> |
|---|---|
| To: | assam258(at)gmail(dot)com |
| Cc: | ishii(at)postgresql(dot)org, thomas(dot)munro(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Tighten pg_uhc_verifychar() to enforce CP949 lead/trail byte ranges |
| Date: | 2026-07-02 00:10:17 |
| Message-ID: | CAFVBZ_EXEVgzw+EL-x7XK=N-XzeEfz7MRO0HCBsxLff=nE=Rkg@mail.gmail.com |
| Lists: | pgsql-hackers |
On Wed, Jul 1, 2026 at 4:49 PM Henson Choi <assam258(at)gmail(dot)com> wrote:
> I checked this exhaustively rather than by spot check. Scanning the
> full two-byte space against PostgreSQL's own uhc_to_utf8.map (17,237
> mapped sequences), the tightened accept set (lead 0x81-0xFE; trail
> 0x41-0x5A, 0x61-0x7A, 0x81-0xFE) is a strict superset of every
> mapped sequence -- zero real mappings fall in the newly-rejected
> ranges.
Thanks for the exhaustive review -- that is exactly the check that
matters here. I reproduced it on live builds as well: scanning the
full two-byte space through convert_from() on both the old and the
new verifier gives the same 17,237 decodable sequences with
byte-identical outputs, and none of them falls outside the
tightened ranges.
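For readers following along, the accept set quoted above can be written as a small predicate. This is only a sketch and not the code in 0002: the names uhc_pair_ok() and count_accepted_pairs() are hypothetical, and it checks a bare lead/trail pair without the length handling a real verifier needs.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not the function from the patch): returns true
 * when a two-byte sequence falls inside the tightened accept set,
 * i.e. lead 0x81-0xFE and trail 0x41-0x5A, 0x61-0x7A or 0x81-0xFE.
 */
static bool
uhc_pair_ok(unsigned char lead, unsigned char trail)
{
	if (lead < 0x81 || lead > 0xFE)
		return false;

	return (trail >= 0x41 && trail <= 0x5A) ||
		(trail >= 0x61 && trail <= 0x7A) ||
		(trail >= 0x81 && trail <= 0xFE);
}

/*
 * Hypothetical helper: walk the full two-byte space and count how many
 * pairs the predicate above accepts.
 */
static int
count_accepted_pairs(void)
{
	int			n = 0;

	for (int lead = 0; lead <= 0xFF; lead++)
	{
		for (int trail = 0; trail <= 0xFF; trail++)
		{
			if (uhc_pair_ok((unsigned char) lead, (unsigned char) trail))
				n++;
		}
	}
	return n;
}
```

Under these assumptions the accept set has 126 lead values times 178 trail values, so count_accepted_pairs() returns 22,428, which leaves room above the 17,237 mapped sequences. At the edges, uhc_pair_ok(0xFE, 0xA1) is true and uhc_pair_ok(0x81, 0x00) is false.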
> Two optional test cases would close the last coverage gaps in
> uhc.sql (neither blocks commit):
Folded both into 0001:
- accept, upper lead boundary:
    SELECT encode(convert_to(convert_from('\xfea1', 'UHC'),
                             'UTF8'), 'hex');
    -> ee819e
  0xFE now appears as a lead byte, so the lead upper bound is
  exercised directly rather than only as a trail byte.
- reject, NUL trail:
    SELECT convert_from('\x8100', 'UHC');
    -> ERROR: invalid byte sequence for encoding "UHC": 0x81 0x00
  NUL is the one trail byte the pre-patch verifier already rejected.
Both produce identical output before and after 0002, so they sit in
the baseline (0001), and 0002's expected diff is still exactly the
eight message-format changes. Full regression passes.
v2 attached.
Regards,
DoGeon Yoo
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-Add-regression-test-for-UHC-encoding-baseline-capture.patch | application/octet-stream | 9.0 KB |
| v2-0002-Tighten-pg_uhc_verifychar-to-enforce-CP949-lead-trail-byt.patch | application/octet-stream | 5.4 KB |