Re: [PATCHES] encoding names

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: zakkr(at)zf(dot)jcu(dot)cz
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] encoding names
Date: 2001-08-19 02:02:57
Message-ID: 20010819110257J.t-ishii@sra.co.jp
Lists: pgsql-hackers pgsql-patches

> Hi,
>
> attached is a patch with:
>
> - new encoding-name code with better performance (binary search
> instead of a for() loop, and it avoids some needless searching)
>
> - synonyms for encodings are now possible (for example ISO-8859-1,
> Latin1, l1)
>
> - Peter's idea about "encoding name cleaning" is implemented
> (characters other than [A-Za-z0-9] are irrelevant -- 'ISO-8859-1' is
> the same as 'iso8859_1' or iso-8-8-5-9-1 :-)
>
> - routines for this are shared between FE and BE (no more defining
> encoding names separately in FE and BE)
>
> - the prefix PG_ is added to the encoding identifier macros; something
> like 'ALT' is pretty dirty in source code, PG_ALT is better.
>
> (Note: the patch adds a new file mb/encname.c and removes mb/common.c)
>
> Karel
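
For readers following along, the name cleaning and binary search described
in the quoted text can be sketched roughly as below. This is not the code
from the patch: the type enc_alias, the functions clean_name() and
lookup_encoding(), the ENC_* ids and the table contents are all made up
for illustration. It assumes the alias table is stored pre-cleaned and
sorted in strcmp() order.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoding ids, for illustration only. */
enum { ENC_LATIN1 = 1, ENC_KOI8 = 2 };

typedef struct
{
    const char *name;       /* cleaned form: lower case, [a-z0-9] only */
    int         encoding;
} enc_alias;

/* Must stay sorted by name in strcmp() order, because bsearch() needs it. */
static const enc_alias alias_tbl[] = {
    {"iso88591", ENC_LATIN1},
    {"koi8", ENC_KOI8},
    {"l1", ENC_LATIN1},
    {"latin1", ENC_LATIN1},
};

/*
 * Copy src to dst, dropping every character outside [A-Za-z0-9] and
 * folding upper case to lower case. Returns 0 on success, -1 if the
 * cleaned name does not fit into dst.
 */
static int
clean_name(const char *src, char *dst, size_t dstlen)
{
    size_t n = 0;

    assert(dstlen > 0);
    for (; *src; src++)
    {
        char c = *src;

        if (c >= 'A' && c <= 'Z')
            c = (char) (c - 'A' + 'a');
        if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
        {
            if (n + 1 >= dstlen)
                return -1;
            dst[n++] = c;
        }
    }
    dst[n] = '\0';
    return 0;
}

/* bsearch() comparator: key is a cleaned string, elem is a table entry. */
static int
cmp_alias(const void *key, const void *elem)
{
    return strcmp((const char *) key, ((const enc_alias *) elem)->name);
}

/* Returns the encoding id for any spelling of a known alias, else -1. */
int
lookup_encoding(const char *name)
{
    char             key[64];
    const enc_alias *hit;

    if (clean_name(name, key, sizeof key) != 0)
        return -1;
    hit = bsearch(key, alias_tbl,
                  sizeof alias_tbl / sizeof alias_tbl[0],
                  sizeof alias_tbl[0], cmp_alias);
    return hit ? hit->encoding : -1;
}
```

With this sketch, lookup_encoding("ISO-8859-1"), lookup_encoding("iso8859_1")
and lookup_encoding("iso-8-8-5-9-1") all clean to the same key "iso88591"
and so resolve to the same id.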

Thanks for the patches, but...

1) There is a compiler error if --enable-unicode-conversion is not
enabled

2) The patches break createdb. createdb should raise an error if a
client-only encoding such as SJIS is specified.

3) I don't like the following ugliness. Why not change all occurrences
of SQL_ASCII in the sources instead?

/*
 * A lot of PG stuff use 'SQL_ASCII' without prefix (dirty...)
 */
#define SQL_ASCII	PG_SQL_ASCII

4) The "official" encoding names are inconsistent. Here are my suggested
changes (referring to http://www.iana.org/assignments/character-sets,
following Peter's suggestion):

ALT -> IBM866
KOI8 -> KOI8_R
UNICODE -> UTF_8 (Peter's suggestion)

Also, I'm wondering why it is windows-1251 and not windows_1251, or
ISO_8859_1 and not ISO-8859-1. There seems to be some confusion about
the usage of "_" and "-".

pg_enc2name pg_enc2name_tbl[] =
{
	{ "SQL_ASCII",		PG_SQL_ASCII },
	{ "EUC_JP",		PG_EUC_JP },
	{ "EUC_CN",		PG_EUC_CN },
	{ "EUC_KR",		PG_EUC_KR },
	{ "EUC_TW",		PG_EUC_TW },
	{ "UNICODE",		PG_UNICODE },
	{ "MULE_INTERNAL",	PG_MULE_INTERNAL },
	{ "ISO_8859_1",		PG_LATIN1 },
	{ "ISO_8859_2",		PG_LATIN2 },
	{ "ISO_8859_3",		PG_LATIN3 },
	{ "ISO_8859_4",		PG_LATIN4 },
	{ "ISO_8859_5",		PG_LATIN5 },
	{ "KOI8",		PG_KOI8 },
	{ "window-1251",	PG_WIN1251 },
	{ "ALT",		PG_ALT },
	{ "Shift_JIS",		PG_SJIS },
	{ "Big5",		PG_BIG5 },
	{ "window-1250",	PG_WIN1251 }
};
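
To make point 2 concrete, here is a rough sketch of the kind of check
createdb could rely on. It is only an illustration under my own
assumptions: the enc_info struct, its client_only flag, the function
check_database_encoding() and the three-entry table are hypothetical and
are not taken from the patch or from the existing sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: an encoding name plus a client-only flag. */
typedef struct
{
    const char *name;
    int         client_only;    /* 1 = usable by clients only */
} enc_info;

/* Illustrative subset only; SJIS is the client-only example from above. */
static const enc_info enc_tbl[] = {
    {"SQL_ASCII", 0},
    {"EUC_JP", 0},
    {"Shift_JIS", 1},
};

/*
 * Returns 1 if name may be used as a database encoding,
 * 0 if it is known but client-only (createdb should raise an error),
 * -1 if the name is not in the table at all.
 */
int
check_database_encoding(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof enc_tbl / sizeof enc_tbl[0]; i++)
    {
        if (strcmp(enc_tbl[i].name, name) == 0)
            return enc_tbl[i].client_only ? 0 : 1;
    }
    return -1;
}
```

Keeping the flag next to the name would let the FE and BE share one table
while the backend alone decides what to reject.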
