Re: Character with byte sequence 0xa2 0xa3 in encoding "EUC_CN" has no equivalent in encoding "UTF8"

From:	Zhongpu Chen <chenloveit(at)gmail(dot)com>
To:	Junwang Zhao <zhjwpku(at)gmail(dot)com>
Cc:	pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: Character with byte sequence 0xa2 0xa3 in encoding "EUC_CN" has no equivalent in encoding "UTF8"
Date:	2026-05-01 16:09:19
Message-ID:	CA+1gyqJMtuTofZDy+CeomGGhsFGXw6JrdyAhqvnLii44oKePGg@mail.gmail.com
Lists:	pgsql-bugs

```
demo_euc_cn_db=# SET client_encoding TO 'EUC_CN';
SET
demo_euc_cn_db=# SELECT * FROM t WHERE id = 1;
id | s
----+----
1 | ��
(1 row)
```

Since 0xA2A3 is invalid in EUC-CN, it cannot be mapped to any meaningful
character. Currently, EUC_CN validation accepts any 2-byte sequence within
A1-FE, but this coarse-grained approach is flawed.
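
To illustrate the gap outside the server, here is a minimal Python sketch. The helper names `passes_range_check` and `has_gb2312_mapping` are hypothetical, and the range check is only an assumption modelled on the behaviour described above (each byte accepted if it lies in the usual EUC range 0xA1-0xFE); it is not PostgreSQL's actual code. Python's `gb2312` codec stands in for a real mapping table.

```python
# Sketch only: passes_range_check() is an assumed stand-in for a coarse
# per-byte range test, not PostgreSQL's actual validation routine.

def passes_range_check(seq: bytes) -> bool:
    """Accept any 2-byte sequence whose bytes both lie in 0xA1..0xFE."""
    return len(seq) == 2 and all(0xA1 <= b <= 0xFE for b in seq)


def has_gb2312_mapping(seq: bytes) -> bool:
    """Return True if Python's gb2312 codec can decode the sequence."""
    try:
        seq.decode("gb2312")
        return True
    except UnicodeDecodeError:
        return False


# Compare an assigned code (0xB0A1) with the one from this report (0xA2A3).
for seq in (b"\xb0\xa1", b"\xa2\xa3"):
    print(seq.hex(), passes_range_check(seq), has_gb2312_mapping(seq))
```

Both sequences pass the range test, but only the first has a character mapping, which is the situation that lets the INSERT succeed while a later conversion to UTF8 fails.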

On Fri, May 1, 2026 at 11:07 PM Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:

> On Fri, May 1, 2026 at 9:59 PM Zhongpu Chen <chenloveit(at)gmail(dot)com> wrote:
> >
> > ## Description
> >
> > The legacy encodings allow some invalid bytes, which will cause errors during SELECT operations.
> >
> > ## How to reproduce
> >
> > ```shell
> > createdb -E EUC_CN -T template0 --locale=C demo_euc_cn_db
> > ```
> >
> > ```sql
> > demo_euc_cn_db=# CREATE TABLE t(id int, s varchar(10));
> >
> > demo_euc_cn_db=# INSERT INTO t VALUES(1, E'\xA2\xA3');
> > INSERT 0 1
> > demo_euc_cn_db=# SELECT * FROM t WHERE id = 1;
> > ERROR: character with byte sequence 0xa2 0xa3 in encoding "EUC_CN" has no equivalent in encoding "UTF8"
> > ```
>
> Can you try the following statement before select?
> SET client_encoding TO 'EUC_CN';
>
> >
> > --
> > Zhongpu Chen
>
>
>
> --
> Regards
> Junwang Zhao
>

--
Zhongpu Chen

In response to

Re: Character with byte sequence 0xa2 0xa3 in encoding "EUC_CN" has no equivalent in encoding "UTF8" at 2026-05-01 15:07:16 from Junwang Zhao

Responses

Re: Character with byte sequence 0xa2 0xa3 in encoding "EUC_CN" has no equivalent in encoding "UTF8" at 2026-05-02 01:25:20 from Junwang Zhao
