Re: Patch: add conversion from pg_wchar to multibyte

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Patch: add conversion from pg_wchar to multibyte
Date:	2012-07-03 00:55:56
Message-ID:	6653.1341276956@sss.pgh.pa.us
Lists:	pgsql-hackers

I wrote:
> Some inspection of pg_wchar.h suggests that the IS_LCPRV1 and IS_LCPRV2
> cases are unused: the file doesn't define any encoding labels that match
> the byte values they accept, nor do the comments suggest that Emacs has
> any such labels either.

Scratch that --- I was misled by the fond illusion that our code
wouldn't use magic hex literals for encoding labels. Stuff like this:

/* 0x9d means LCPRV2 */
if (c1 == LC_CNS11643_1 || c1 == LC_CNS11643_2 || c1 == 0x9d)

seems to me to be well below the minimum acceptable quality standards
for Postgres code.
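
For illustration only, here is a minimal sketch of how that test could read with the literal given a name. The name LCPRV2_B is an assumption made up for this example, and the numeric values of LC_CNS11643_1 and LC_CNS11643_2 are placeholders; the authoritative definitions are whatever pg_wchar.h says. The helper function is likewise hypothetical, not code from the tree.

```c
#include <stdbool.h>

/* Placeholder values for illustration; consult pg_wchar.h for the real ones. */
#define LC_CNS11643_1	0x95
#define LC_CNS11643_2	0x96

/* Hypothetical symbolic name for the 0x9d prefix byte (the "LCPRV2" case). */
#define LCPRV2_B		0x9d

/*
 * Hypothetical helper: true if c1 is one of the leading bytes accepted by
 * the quoted condition.  Same logic as the original, minus the magic number.
 */
static bool
is_cns_leading_byte(unsigned char c1)
{
	return c1 == LC_CNS11643_1 || c1 == LC_CNS11643_2 || c1 == LCPRV2_B;
}
```

With a name in place, a grep for the symbol finds every use of the label, which a bare 0x9d does not allow.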

Having said that, grepping through src/backend/utils/mb/conversion_procs/
reveals no sign that 0x9a, 0x9b, or 0x9c are used anywhere with the
meanings that the IS_LCPRV1 and IS_LCPRV2 macros assign to them.
Furthermore, AFAICS the 0x9d case is only used in euc_tw_and_big5/,
with the following byte being one of the LC_CNS11643_[3-7] constants.

Given that these constants are treading on encoding ID namespace that
Emacs upstream might someday decide to assign, I think we'd be well
advised to *not* start installing any code that thinks that 9a-9c
mean something.

regards, tom lane

In response to

Re: Patch: add conversion from pg_wchar to multibyte at 2012-07-03 00:12:33 from Tom Lane
