
Re: 8.3 can't convert cyrillic text from 'iso-8859-5' to other cyrillic 8-bit encoding

From:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To:	"Sergey Burladyan" <eshkinkot(at)gmail(dot)com>
Cc:	<pgsql-bugs(at)postgresql(dot)org>
Subject:	Re: 8.3 can't convert cyrillic text from 'iso-8859-5' to other cyrillic 8-bit encoding
Date:	2008-03-20 11:18:44
Message-ID:	47E24814.7060501@enterprisedb.com
Lists:	pgsql-bugs

Sergey Burladyan wrote:
> Thursday 20 March 2008 01:16:34 Heikki Linnakangas:
>> Here's a patch that does the conversion in the other direction as well.
>> As I'm not too familiar with cyrillic, can you double-check that this
>> works? I tested it using the convert() function between different
>> encodings, and it seems ok to me.
>
> Yes, I tested it with a function like this and it works now :)

Ok, patch applied.

>> Hmm. We use KOI8-R (or rather, MULE_INTERNAL with KOI8-R) as an
>> intermediate encoding, because there's no direct conversion table
>> between ISO-8859-5 and the other cyrillic encodings. Ideally there would
>> be. Another possibility would be to use UTF-8 as the intermediate
>> encoding; that'd probably be much slower, but UTF-8 should have all the
>> characters needed.
> I think that UTF-8 is too complex for translating one 8-bit charset to another
> 8-bit charset, but the other solution is many, many translation tables... hard question %)

Yeah. It's probably not worth the effort to change/test it. Apparently
there aren't many people using these conversion functions, as the bug has
been there since 7.3 and you're the first one to notice.
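
As a rough illustration of the "UTF-8 as the intermediate encoding" idea
discussed above, here is a minimal Python sketch. It uses Python's own codec
tables, not PostgreSQL's conversion code, and the function name
convert_via_unicode is hypothetical. It only shows that a Unicode pivot
carries "YO" between ISO-8859-5 and Windows-1251 without a direct table.

```python
def convert_via_unicode(data: bytes, src: str, dst: str) -> bytes:
    """Convert between two 8-bit encodings by pivoting through Unicode.

    Hypothetical helper for illustration: decode the source bytes to a
    Unicode string, then encode that string in the target encoding.
    """
    return data.decode(src).encode(dst)


# CYRILLIC CAPITAL/SMALL LETTER IO ("YO") in ISO-8859-5 is 0xA1 / 0xF1.
yo_iso = bytes([0xA1, 0xF1])

# In Windows-1251 the same two letters are 0xA8 / 0xB8.
yo_1251 = convert_via_unicode(yo_iso, "iso8859_5", "cp1251")

# Converting back should give the original bytes again.
round_trip = convert_via_unicode(yo_1251, "cp1251", "iso8859_5")
```

The trade-off is the one mentioned above: every byte goes through two table
lookups instead of one, in exchange for not maintaining a table per pair of
encodings.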

>> Are there any other characters like "YO" that are missing, that exist in
>> all the encodings?
> If we are talking about alphabet letters, the answer is no, only "YO" was missing.
> If we are talking about any character, there is 'NO-BREAK SPACE' (U+00A0); it exists in
> 1251, 866, koi8-r and iso, but I do not think that it is widely used...

Ok, good.
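
One way to double-check which characters exist in all four encodings is to
intersect their non-ASCII repertoires. The sketch below does that with
Python's codec tables, on the assumption that they match the tables of
interest here. The names ENCODINGS, high_half and common are made up for the
example, and it says nothing about what PostgreSQL's own tables contain.

```python
# Python codec names for ISO-8859-5, Windows-1251, CP866 and KOI8-R.
ENCODINGS = ["iso8859_5", "cp1251", "cp866", "koi8_r"]


def high_half(encoding):
    """Return the set of characters an encoding maps bytes 0x80-0xFF to."""
    chars = set()
    for byte in range(0x80, 0x100):
        # Bytes that are undefined in the encoding decode to an empty string.
        chars.update(bytes([byte]).decode(encoding, errors="ignore"))
    return chars


# Characters representable in every one of the four encodings.
common = set.intersection(*(high_half(e) for e in ENCODINGS))

has_yo = "\u0401" in common and "\u0451" in common  # capital and small IO
has_nbsp = "\u00a0" in common                       # NO-BREAK SPACE
```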

Thanks for the report and the patch!

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Re: 8.3 can't convert cyrillic text from 'iso-8859-5' to other cyrillic 8-bit encoding at 2008-03-20 03:33:03 from Sergey Burladyan
