Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8

From:	Zhongpu Chen <chenloveit(at)gmail(dot)com>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8
Date:	2026-05-02 02:39:26
Message-ID:	CA+1gyq+LF_91g_i0WXeKK6JGF8viaqaF213S-9Arq=SG=4GAaA@mail.gmail.com
Lists:	pgsql-hackers

The issue is not specific to E'\x..' literals. An ordinary COPY FROM data
file with ENCODING 'EUC_CN' can also create text rows that later cannot be
retrieved with SELECT when client_encoding is UTF8.

This suggests that input validation for EUC_CN checks only byte structure,
while the EUC_CN-to-UTF8 conversion is stricter: it also requires a
mapping-table entry for each character.
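
To illustrate the gap between the two kinds of validation, here is a minimal Python sketch. It is not PostgreSQL code. The structural check is an assumed simplification (ASCII bytes, or a high-bit lead byte followed by a trail byte in 0xA1..0xFE), and Python's built-in gb2312 codec stands in for the euc_cn_to_utf8 mapping table, so its coverage may differ from PostgreSQL's. The helper names are hypothetical.

```python
def is_structurally_valid_euc_cn(data: bytes) -> bool:
    """Byte-structure check only (assumed simplification, not PostgreSQL's code)."""
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            # Plain ASCII byte: always accepted.
            i += 1
            continue
        # High-bit lead byte: needs one trail byte in 0xA1..0xFE.
        if i + 1 >= len(data) or not (0xA1 <= data[i + 1] <= 0xFE):
            return False
        i += 2
    return True


def is_convertible_to_utf8(data: bytes) -> bool:
    """Mapping-table check, using Python's gb2312 codec as a stand-in."""
    try:
        data.decode("gb2312")
    except UnicodeDecodeError:
        return False
    return True


# 0xD6D0 is an assigned character; 0xA2A3 is the sequence from the report.
for sample in (b"\xd6\xd0", b"\xa2\xa3"):
    print(
        sample.hex(),
        "structural:", is_structurally_valid_euc_cn(sample),
        "convertible:", is_convertible_to_utf8(sample),
    )
```

In this sketch 0xA2A3 passes the structural check but fails the mapping check, which is the same shape as the behavior described above. A stricter input validator would apply something like the second check at input time.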

On Sat, May 2, 2026 at 10:31 AM Zhongpu Chen <chenloveit(at)gmail(dot)com> wrote:

> See the related bug report
> https://www.postgresql.org/message-id/CA%2B1gyqL7uiQhfLcYWpHNUKQgHjQc7sOPthSTiaxLDZzcrGFYSg%40mail.gmail.com
>
> Currently PostgreSQL accepts structurally well-formed EUC_CN byte
> sequences such as 0xA2A3 into text columns. The value round-trips when
> client_encoding is EUC_CN, but fails when client_encoding is UTF8 because
> euc_cn_to_utf8 has no mapping.
>
> If this behavior is intentional for compatibility, the documentation
> should explicitly say that validation for some legacy encodings is
> byte-structure validation, not mapping-table validation.
> If it is not intentional, stricter validation could reject unassigned byte
> positions at input time.
>
> --
> Zhongpu Chen
>

--
Zhongpu Chen

In response to

Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8 at 2026-05-02 02:31:12 from Zhongpu Chen

Responses

Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8 at 2026-05-02 03:28:31 from David G. Johnston
