Re: UTF8 national character data type support WIP patch and list of open issues.

From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Boguk, Maksym" <maksymb(at)fast(dot)au(dot)fujitsu(dot)com>, "Heikki Linnakangas" <hlinnakangas(at)vmware(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UTF8 national character data type support WIP patch and list of open issues.
Date: 2013-09-19 22:42:19
Message-ID: 37B76474BB3149FD841373E12E355851@maumau
Lists: pgsql-hackers

From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
> That may be what's important to you, but it's not what's important to
> me.

National character type support may be important to some potential users of
PostgreSQL, and therefore to PostgreSQL's popularity, even if it is not
important to me personally. That's why national character support is listed
in the PostgreSQL TODO wiki. We might be losing potential users just because
their selection criteria include national character support.

> I am not keen to introduce support for nchar and nvarchar as
> differently-named types with identical semantics.

Similar examples already exist:

- varchar and text: the only difference is whether there is an explicit
length limit
- numeric and decimal
- int and int4, smallint and int2, bigint and int8
- real/double precision and float

In addition, the SQL standard itself admits:

"The <key word>s NATIONAL CHARACTER are used to specify the character type
with an implementation-defined character set. Special syntax (N'string') is
provided for representing literals in that character set.
...
"NATIONAL CHARACTER" is equivalent to the corresponding <character string
type> with a specification of "CHARACTER SET CSN", where "CSN" is an
implementation-defined <character set name>."

"A <national character string literal> is equivalent to a <character string
literal> with the "N" replaced by "<introducer><character set
specification>", where "<character set specification>" is an
implementation-defined <character set name>."

> And I think it's an
> even worse idea to introduce them now, making them work one way, and
> then later change the behavior in a backward-incompatible fashion.

I understand your concern. The incompatibility can be avoided if we define
the behavior as follows. How about this?

- NCHAR can be used with any database encoding.

- At first, NCHAR is exactly the same as CHAR. That is, the
"implementation-defined character set" described in the SQL standard is the
database character set.

- In the future, the character set for NCHAR can be selected at database
creation, like Oracle's CREATE DATABASE .... NATIONAL CHARACTER SET
AL16UTF16. The default is the database character set.

Could you tell me what kind of specification we should implement if we
officially support national character types?

Regards
MauMau
