Re: 8.3 can't convert cyrillic text from 'iso-8859-5' to other cyrillic 8-bit encoding

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Sergey Burladyan" <eshkinkot(at)gmail(dot)com>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: 8.3 can't convert cyrillic text from 'iso-8859-5' to other cyrillic 8-bit encoding
Date: 2008-03-19 10:21:41
Message-ID: 47E0E935.1020801@enterprisedb.com
Lists: pgsql-bugs

Sergey Burladyan wrote:
> src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
> does not have the Cyrillic letter 'IO' in the ISO-8859-5 to mule internal code
> translation table (function iso2mic(const unsigned char *l, unsigned char *p,
> int len)). This is a bug, because the letter is widely used: it is a basic
> letter like A, B or C in English :) and it exists in all Russian Cyrillic
> encodings (koi8-r, iso-8859-5, windows-1251, cp866).
> For example, in Russian, the words for 'all', 'hedgehog', 'Christmas tree' and
> many others must be written with it.
> 
> Here is a patch to add it to the ISO-8859-5 to mule internal code translation
> table. I don't know whether this is OK, or whether it breaks any internal
> rule or code?

You'd need to modify the mic->ISO-8859-5 translation table as well, for 
converting in the other direction.
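
To illustrate why both directions matter, here is a minimal Python sketch. The two dictionaries are hypothetical miniature stand-ins, not the actual arrays in cyrillic_and_mic.c, and KOI8-R bytes stand in for the MULE_INTERNAL payload. The byte values are the standard positions of the capital and small 'IO' letter in ISO-8859-5 (0xA1, 0xF1) and KOI8-R (0xB3, 0xA3).

```python
# Hypothetical miniature tables keyed by byte value; NOT PostgreSQL's real arrays.
# ISO-8859-5 0xA1/0xF1 and KOI8-R 0xB3/0xA3 are the capital/small 'IO' letter.
ISO_TO_KOI = {0xA1: 0xB3, 0xF1: 0xA3}

# Deriving the reverse table from the forward one keeps both directions in sync.
KOI_TO_ISO = {koi: iso for iso, koi in ISO_TO_KOI.items()}


def translate(data: bytes, table: dict) -> bytes:
    """Translate high-half bytes through `table`; ASCII passes through unchanged."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)
        elif b in table:
            out.append(table[b])
        else:
            raise ValueError(f"byte 0x{b:02X} has no equivalent in the target encoding")
    return bytes(out)


original = bytes([0xA1, 0xF1])
round_trip = translate(translate(original, ISO_TO_KOI), KOI_TO_ISO)
```

If only the forward table had the two entries, the second translate() call would raise, which is the failure a one-directional patch would leave behind.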

> By the way, as I understand it, you are using the koi8-r encoding for the
> internal representation of Cyrillic charsets. This has another problem: the
> second "widely" used char is <U2116> NUMERO SIGN (many accountants and
> managers use it :) in the Cyrillic Windows world), and it exists in the
> windows-1251, cp866 and iso-8859-5 encodings, but not in koi8-r...

Hmm. We use KOI8-R (or rather, MULE_INTERNAL with KOI8-R) as an 
intermediate encoding, because there's no direct conversion table 
between ISO-8859-5 and the other Cyrillic encodings. Ideally there would 
be. Another possibility would be to use UTF-8 as the intermediate 
encoding; that'd probably be much slower, but UTF-8 should have all the 
characters needed.
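
As a rough illustration of the pivot idea, the sketch below uses Python's built-in codecs, with Python's Unicode str standing in for the UTF-8 intermediate step. It says nothing about how PostgreSQL would implement or perform this; the helper name convert() is made up for the example.

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Convert between two 8-bit encodings by pivoting through Unicode."""
    return data.decode(src).encode(dst)


# The capital and small 'IO' letter (U+0401, U+0451) survive the pivot.
yo = "\u0401\u0451".encode("iso8859_5")
yo_koi = convert(yo, "iso8859_5", "koi8_r")
yo_win = convert(yo, "iso8859_5", "cp1251")

# NUMERO SIGN (U+2116) converts to windows-1251 but has no slot in KOI8-R,
# so any intermediate step that goes through KOI8-R must lose it.
numero = "\u2116".encode("iso8859_5")
numero_win = convert(numero, "iso8859_5", "cp1251")
try:
    convert(numero, "iso8859_5", "koi8_r")
    numero_in_koi8r = True
except UnicodeEncodeError:
    numero_in_koi8r = False
```

With a Unicode pivot, a character fails to convert only when the target encoding itself lacks it, not because the intermediate encoding does.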

Are there any other characters like "YO" that are missing, that exist in 
all the encodings? Looking at the character set table for KOI8-R, it 
looks like the "YO" is in an odd place in the table, compared to all 
other Cyrillic characters. Perhaps that's why it was missed.
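
One way to approach that question is to enumerate the Cyrillic characters shared by all four encodings and see where each sits in KOI8-R. The sketch below does this with Python's codec tables. It only compares the encodings' repertoires and does not inspect PostgreSQL's conversion tables, so it can suggest candidates but cannot confirm what is actually missing there. The helper name high_half() is made up for the example.

```python
CODECS = ["koi8_r", "iso8859_5", "cp1251", "cp866"]


def high_half(codec: str) -> dict:
    """Map each decodable character in 0x80-0xFF to its byte value."""
    table = {}
    for b in range(0x80, 0x100):
        try:
            table[bytes([b]).decode(codec)] = b
        except UnicodeDecodeError:
            pass  # byte is unassigned in this code page
    return table


tables = {codec: high_half(codec) for codec in CODECS}

# Characters present in every encoding, restricted to the Cyrillic block.
common = set.intersection(*(set(t) for t in tables.values()))
cyrillic = sorted(ch for ch in common if "\u0400" <= ch <= "\u04ff")

# Shared Cyrillic letters that KOI8-R places outside its main 0xC0-0xFF run.
outliers = {ch for ch in cyrillic if tables["koi8_r"][ch] < 0xC0}

for ch in sorted(outliers):
    print(f"U+{ord(ch):04X} at KOI8-R byte 0x{tables['koi8_r'][ch]:02X}")
```

Any character reported as an outlier is a candidate for the same kind of omission, since a table built by walking only the contiguous letter range would skip it.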

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
