From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tatsuo Ishii <ishii(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Errors in our encoding conversion tables
Date: 2015-12-02 17:15:51
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Nov 27, 2015 at 8:54 PM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>> In short, there are a number of reasons we cannot simply import the
>> consortium's mapping regarding SJIS (and EUC_JP).

> I haven't seen a response to this point, but it seems important.

I'll defer to Tatsuo-san concerning whether the Far Eastern conversions
should act the way they do. However, I still think the Cyrillic and
Latin-2 conversions are broken. There is no reason to question the
Unicode consortium's mappings in those cases AFAIK, and even if somebody
wants to, our current tables fail to round-trip some characters, which
is surely wrong. (See the "inconsistent reverse conversion" complaints
in the test output in <32464(dot)1448742262(at)sss(dot)pgh(dot)pa(dot)us>.)
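
For illustration only, the round-trip property at issue can be sketched as follows. This is not the test script behind the output referenced above; the function name and the two toy tables are hypothetical, with a few ISO 8859-2 (Latin-2) code points used as sample data and one reverse entry made deliberately wrong. The idea is simply that every code converted to Unicode must convert back to the same code.

```python
def find_round_trip_failures(to_utf, from_utf):
    """Return (code, ucs, back) triples where code -> ucs -> back != code.

    to_utf   maps a legacy-encoding code to a Unicode code point.
    from_utf maps a Unicode code point back to a legacy-encoding code.
    back is None when the reverse table has no entry for ucs at all.
    """
    failures = []
    for code, ucs in sorted(to_utf.items()):
        back = from_utf.get(ucs)
        if back != code:
            failures.append((code, ucs, back))
    return failures


# Hypothetical toy tables (assumption: three Latin-2 entries as sample data).
to_utf = {0xA1: 0x0104, 0xA2: 0x02D8, 0xA3: 0x0141}
# The last reverse entry is deliberately wrong so the check has something to find.
from_utf = {0x0104: 0xA1, 0x02D8: 0xA2, 0x0141: 0xA5}

failures = find_round_trip_failures(to_utf, from_utf)
for code, ucs, back in failures:
    shown = "missing" if back is None else f"0x{back:02X}"
    print(f"inconsistent reverse conversion: 0x{code:02X} -> U+{ucs:04X} -> {shown}")
```

A check of this shape is independent of which mapping is considered authoritative: it only asks whether the forward and reverse tables agree with each other.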

Regardless of that, it's dismaying that we have files in our tree that
claim to produce our mapping tables from authoritative sources, when in
fact those tables were not produced in that way. This is a documentation
failure even if you consider the actual conversion behavior valid.

regards, tom lane
