Re: encoding names v2.

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
Cc: pgsql-patches <pgsql-patches(at)postgreSQL(dot)org>
Subject: Re: encoding names v2.
Date: 2001-08-22 19:38:03
Message-ID: Pine.LNX.4.30.0108222124120.679-100000@peter.localdomain
Lists: pgsql-patches

Okay, here is some bad news: I just looked up the predefined character
set names in the SQL99 standard, and here is the list:

SQL_CHARACTER
GRAPHIC_IRV or ASCII_GRAPHIC
LATIN1 <==== !!!
ISO8BIT or ASCII_FULL
UTF16
UTF8
UCS2
SQL_TEXT
SQL_IDENTIFIER

So perhaps we should keep the LATIN1 thing after all? I don't like it,
but the rules...

Comments?

Karel Zak writes:

> - getdatabaseencoding() is compatible with old versions, but
> it is marked as deprecated in a code comment.
>
> - getdbencoding() is a new function that returns the correct encoding names

See my other message about this. I don't think this is a good choice of
names.

> - all encoding names use '-'. I hope this will never clash with
> some operator. Encoding names must be used as quoted strings.

For SQL compliance we will need to access charset names as identifiers in
the future. So the name normalization should take effect wherever a
charset name is expected. I suppose this is what you did.
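To illustrate what "normalization wherever a charset name is expected" could mean, here is a minimal C sketch. It assumes one possible rule (drop every non-alphanumeric character and fold to lower case), which the thread does not actually specify; normalize_charset_name and charset_names_match are hypothetical names, not existing PostgreSQL functions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (assumed rule, not PostgreSQL's): reduce a charset
 * name to a canonical key by dropping non-alphanumeric characters and
 * folding letters to lower case, so "LATIN1", "latin-1" and "Latin_1"
 * all produce "latin1". Writes at most dstlen - 1 chars plus a NUL.
 */
static void
normalize_charset_name(const char *src, char *dst, size_t dstlen)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL && dstlen > 0);
    for (; *src != '\0' && n + 1 < dstlen; src++)
    {
        unsigned char c = (unsigned char) *src;

        if (isalnum(c))
            dst[n++] = (char) tolower(c);
    }
    dst[n] = '\0';
}

/* Hypothetical: nonzero if two names are equal after normalization. */
static int
charset_names_match(const char *a, const char *b)
{
    char na[64];
    char nb[64];

    normalize_charset_name(a, na, sizeof na);
    normalize_charset_name(b, nb, sizeof nb);
    return strcmp(na, nb) == 0;
}
```

Under a rule like this, whether the canonical spelling uses '-' or '_' stops mattering for lookups, since both separators are ignored when names are compared.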

> Only SQL_ASCII uses '_', because I see that JDBC has hardcoded
> "pg_encoding_to_char(1) = 'SQL_ASCII'" :-(((

This is okay, look at the list above for precedent.

> - the ./configure.in:
> * use the new encoding names for --enable-multibyte too
> * define MULTIBYTE that holds the default encoding id

Where is this needed?

> * define MULTIBYTE_NAME that holds the default encoding name (needed
> for initdb)

Can you rename this to something like DEFAULT_CHARACTER_SET? There is
really nothing "multibyte" here.

> - 'initdb' checks whether the default template encoding is valid for the backend DB.
>
> In the old code this was heavily hardcoded in initdb. I added a '-b'
> option to pg_encoding that checks whether an encoding is valid for the
> backend DB (i.e., the encoding is not client-only). That's better than
> if [ $MULTIBYTEID -gt 31 ]
> ^^^^^^
> in scripts.

Good.
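The benefit of a '-b' check is that the "backend or client-only" rule lives next to the encoding table instead of in a numeric comparison in a shell script. A hypothetical C sketch of such a lookup follows; the table is an illustrative subset with assumed flags, encoding_is_backend_ok is an invented name, and the thread does not describe how the real '-b' option reports its result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset only; names and flags are assumptions here. */
struct encoding_entry
{
    const char *name;
    int         backend_ok;     /* 0 = client-only encoding */
};

static const struct encoding_entry encodings[] = {
    {"SQL_ASCII", 1},
    {"EUC_JP", 1},
    {"UNICODE", 1},
    {"LATIN1", 1},
    {"KOI8", 1},
    {"SJIS", 0},
    {"BIG5", 0},
};

/*
 * Hypothetical lookup by exact name. Returns 1 if the encoding may be
 * used as a database (backend) encoding, 0 if it is client-only, and
 * -1 if the name is unknown. Using a per-entry flag avoids assuming that
 * client-only encodings sit above a fixed id, as "-gt 31" did.
 */
static int
encoding_is_backend_ok(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof encodings / sizeof encodings[0]; i++)
    {
        if (strcmp(encodings[i].name, name) == 0)
            return encodings[i].backend_ok;
    }
    return -1;
}
```

A command-line wrapper could map these three results to distinct exit statuses, so a script only tests the status instead of comparing encoding ids.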

> src/utils/mb/Unicode/KOI8_to_utf8.map --> src/utils/mb/Unicode/KOI8R_to_utf8.map
> src/utils/mb/Unicode/WIN_to_utf8.map --> src/utils/mb/Unicode/WIN1251_to_utf8.map
> src/utils/mb/Unicode/utf8_to_KOI8.map --> src/utils/mb/Unicode/utf8_to_KOI8R.map
> src/utils/mb/Unicode/utf8_to_WIN.map --> src/utils/mb/Unicode/utf8_to_WIN1251.map

Can you introduce some uniform capitalization (e.g., all lower case)?
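One way to keep the map file names uniform is to derive them from the encoding name instead of typing them by hand. The C sketch below assumes the all-lower-case convention suggested above (e.g. koi8r_to_utf8.map); map_file_name is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<enc>_to_utf8.map" (to_utf8 != 0) or
 * "utf8_to_<enc>.map" with the encoding name folded to lower case.
 * Names longer than 31 characters are truncated. Returns 0 on success,
 * -1 if buf is too small to hold the whole file name.
 */
static int
map_file_name(const char *encoding, int to_utf8, char *buf, size_t buflen)
{
    char   lower[32];
    size_t i;
    int    len;

    assert(encoding != NULL && buf != NULL);
    for (i = 0; encoding[i] != '\0' && i + 1 < sizeof lower; i++)
        lower[i] = (char) tolower((unsigned char) encoding[i]);
    lower[i] = '\0';

    if (to_utf8)
        len = snprintf(buf, buflen, "%s_to_utf8.map", lower);
    else
        len = snprintf(buf, buflen, "utf8_to_%s.map", lower);

    /* snprintf returns the length it wanted to write; check for overflow */
    return (len < 0 || (size_t) len >= buflen) ? -1 : 0;
}
```

For example, "KOI8R" with to_utf8 set yields koi8r_to_utf8.map, and "WIN1251" in the other direction yields utf8_to_win1251.map.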

> Thanks for all the suggestions.
>
> New comments?

Don't worry, we'll get there. ;-)

--
Peter Eisentraut peter_e(at)gmx(dot)net http://funkturm.homeip.net/~peter
