Re: Errors in our encoding conversion tables

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Tatsuo Ishii <ishii(at)postgreSQL(dot)org>
Subject: Re: Errors in our encoding conversion tables
Date: 2015-11-28 20:24:22
Message-ID: 32464.1448742262@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> There's a discussion over at
> http://www.postgresql.org/message-id/flat/2sa(dot)Dhu5(dot)1hk1yrpTNFy(dot)1MLOlb(at)seznam(dot)cz
> of an apparent error in our WIN1250 -> LATIN2 conversion.

Attached is an updated patch (against today's HEAD) showing proposed
changes to bring cyrillic_and_mic.c and latin2_and_win1250.c into sync
with the Unicode Consortium's conversion data.

In addition, I've attached the C program I used to generate the proposed
new conversion tables from the Unicode/*.map files, a simple SQL script
to print out the conversion behavior for the affected conversions, and
a diff of the script's output between 9.5 and the proposed patch.
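
As a rough illustration of the general idea only (this is not the attached
buildmap.c, which is written in C and reads the Unicode/*.map files), a
direct byte-to-byte table can be derived by composing "source encoding to
Unicode" with "Unicode to target encoding". The sketch below assumes that
Python's built-in cp1250 and iso8859_2 codecs are an acceptable stand-in for
the map files; the function name derive_table is hypothetical.

```python
def derive_table(src_codec, dst_codec):
    """Map each high-half byte of src_codec to its dst_codec byte via Unicode.

    Bytes that are undefined in the source, or whose character does not
    exist in the target, are simply left out of the table.
    """
    table = {}
    for b in range(0x80, 0x100):
        try:
            ch = bytes([b]).decode(src_codec)   # source byte -> Unicode
            out = ch.encode(dst_codec)          # Unicode -> target byte
        except UnicodeError:
            continue                            # no round trip through Unicode
        table[b] = out[0]
    return table


# Assumption: Python codec names standing in for WIN1250 and LATIN2.
win1250_to_latin2 = derive_table("cp1250", "iso8859_2")
```

Any translation in a hand-maintained table that has no counterpart in a
table derived this way is a candidate for one with no basis in the Unicode
data.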

While the changes in the WIN1250 <-> LATIN2 conversions just amount to
removal of some translations that seem to have no basis in reality, the
changes in the Cyrillic mappings are quite a bit more extensive. It would
be good if we could get those checked by some native Russian speakers.

regards, tom lane

Attachment                               Content-Type  Size
encoding-conversion-corrections-2.patch  text/x-diff   16.4 KB
buildmap.c                               text/x-c      3.2 KB
checkconv.sql                            text/plain    2.8 KB
diffs9.5vspatch                          text/x-diff   59.7 KB
