Re: UTF8 national character data type support WIP patch and list of open issues.

From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Cc: "Arulappan, Arul Shaji" <arul(at)fast(dot)au(dot)fujitsu(dot)com>, "Greg Stark" <stark(at)mit(dot)edu>, "Tatsuo Ishii" <ishii(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Boguk, Maksym" <Maksym(dot)Boguk(at)au(dot)fujitsu(dot)com>, "Heikki Linnakangas" <hlinnakangas(at)vmware(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UTF8 national character data type support WIP patch and list of open issues.
Date: 2013-11-09 07:24:55
Message-ID: 673E261C589440E3B0D8FDF9A11B1181@maumau
Lists: pgsql-hackers

From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
> On Tue, Nov 5, 2013 at 5:15 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> On 11/5/13, 1:04 AM, Arulappan, Arul Shaji wrote:
>>> Implements NCHAR/NVARCHAR as distinct data types, not as synonyms
>>
>> If, per SQL standard, NCHAR(x) is equivalent to CHAR(x) CHARACTER SET
>> "cs", then for some "cs", NCHAR(x) must be the same as CHAR(x).
>> Therefore, an implementation as separate data types is wrong.
>
> Since the point doesn't seem to be getting through, let me try to be
> more clear: we're not going to accept any form of this patch. A patch
> that makes some progress toward actually coping with multiple
> encodings in the same database would be very much worth considering,
> but adding compatible syntax with incompatible semantics is not of
> interest to the PostgreSQL project. We have had this debate on many
> other topics in the past and will no doubt have it again in the
> future, but the outcome is always the same.

I don't see any semantics that are incompatible with the SQL standard if we
proceed as follows:

- In the first step, "cs" is the database encoding, which is used for
char/varchar/text.
- In the second (or final) step, where multiple encodings per database are
supported, "cs" is the national character encoding, which is specified with
CREATE DATABASE ... NATIONAL CHARACTER ENCODING cs. If the NATIONAL
CHARACTER ENCODING clause is omitted, "cs" is the database encoding, as in
step 1.

Let me repeat myself: I think the biggest and most immediate issue is that
PostgreSQL does not support national character types, at least officially.
"Officially" means described in the manual. So I have no strong objection
to the current (hidden) implementation of nchar types in PostgreSQL, which
are just synonyms, as long as that support is officially documented.
Serious users don't want to depend on hidden features.

However, is the current synonym approach really free of problems? Won't it
cause trouble in the future? If we treat nchar as char, we lose the fact
that the user requested nchar. Can we discard that fact so easily and
produce an irreversible result, as described below?

--------------------------------------------------
Maybe so. I guess the distinct type for NCHAR is for future extension and
user friendliness. As one user, I expect to get "national character"
instead of "char character set xxx" as output of psql \d and pg_dump when I
specified "national character" in DDL. In addition, that makes it easy to
use the pg_dump output for importing data to other DBMSs for some reason.
--------------------------------------------------
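
To make the information loss concrete, here is a minimal sketch against the
current synonym implementation. The table and column names are made up for
illustration. I expect the query to report character(10) and
character varying(20), with no trace of the "national" spelling, because
the parser maps the national forms to the ordinary character types before
anything is stored in the catalog.

```sql
-- Hypothetical table; the names are for illustration only.
CREATE TABLE nchar_demo (
    name  NATIONAL CHARACTER(10),
    note  NCHAR VARYING(20)
);

-- Ask the catalog which types were actually recorded for the columns.
-- With the synonym approach, only the plain character types are stored,
-- so the original "national" request cannot be recovered afterwards.
SELECT attname,
       format_type(atttypid, atttypmod) AS stored_type
FROM pg_attribute
WHERE attrelid = 'nchar_demo'::regclass
  AND attnum > 0
ORDER BY attnum;
```

psql \d and pg_dump read the same catalog data, so they cannot show
"national character" either.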

Regards
MauMau
