Re: Proposal - Support for National Characters functionality

From: "Arulappan, Arul Shaji" <arul(at)fast(dot)au(dot)fujitsu(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal - Support for National Characters functionality
Date: 2013-07-30 07:03:11
Message-ID: 3AFB102B67FAEE48874E0607386DF4210DC329E0@SYDExchTmp.au.fjanz.com
Lists: pgsql-hackers

> -----Original Message-----
> From: Tatsuo Ishii [mailto:ishii(at)postgresql(dot)org]
>
>
> Also I don't understand why you need UTF-16 support as a database encoding
> because UTF-8 and UTF-16 are logically equivalent, they are just different
> representations (encodings) of Unicode. That means if we already support UTF-8
> (I'm sure we already do), there's no particular reason we need to add UTF-16
> support.
>
> Maybe you just want to support UTF-16 as a client encoding?
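Tatsuo's point that UTF-8 and UTF-16 are two byte-level encodings of the same Unicode code points can be checked with a short Python sketch. The helper names below are illustrative only, not part of any PostgreSQL API, and UTF-16 little-endian without a byte-order mark is assumed.

```python
# Illustrative helpers (hypothetical names, not a PostgreSQL API).
def utf8_to_utf16(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16LE (no byte-order mark)."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16LE bytes as UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


# 'naïve 日本語 🐘': Latin-1 range, CJK, and one character outside the BMP.
sample = "na\u00efve \u65e5\u672c\u8a9e \U0001F418"

as_utf8 = sample.encode("utf-8")
as_utf16 = utf8_to_utf16(as_utf8)

print(len(sample), len(as_utf8), len(as_utf16))  # → 11 21 24
print(utf16_to_utf8(as_utf16) == as_utf8)        # → True
```

Both byte strings carry the same 11 code points; only the layout differs (UTF-16 needs a surrogate pair for U+1F418), and the round trip loses nothing.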

Given below is a design draft for this functionality:

Core new functionality (new code):
1) Create and register independent NCHAR/NVARCHAR/NTEXT data types.

2) Add a new GUC, nchar_collation, that tells the database which default
collation to use for the new data types.

3) Create encoding conversion subroutines to convert strings between the
database encoding and UTF8 (from national strings to regular strings and
back).
PostgreSQL already has all the required support (it is used for
conversion between the database encoding and client_encoding), so the
amount of new code needed here will be minimal.

4) Because every character in a non-UTF8 encoding can be represented in
UTF8 (but not the reverse), comparisons between N* types and the regular
string types inside the database will be performed in UTF8 form. To
achieve this, new IMPLICIT casts from the regular types to the national
types may need to be created:
CHAR -> NCHAR
VARCHAR -> NVARCHAR
TEXT -> NTEXT

Casts in the reverse direction will also be available, but only as
EXPLICIT casts, because they can fail when a national string cannot be
represented in the database encoding.

All of these casts will use the subroutines created in 3).

Casting and conversion between N* types will follow the same rules and
mechanics as casting and conversion between the regular string types
(CHAR(N)/VARCHAR(N)/TEXT).

5) Comparisons between NATIONAL string values will be performed by
specialized, UTF8-optimized functions that respect the nchar_collation
setting.

6) Client input/output of NATIONAL strings: NATIONAL strings will respect
the client_encoding setting, and their values will be transparently
converted to and from the client_encoding when sent to or received from
the client (the same mechanism used for the regular string types).
Consequently, mixed encodings in client input/output will not be
supported.

7) Create a set of regression tests for these new data types.
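Core items 3) to 6) can be modelled in a few lines of Python to make the cast directions and the comparison path concrete. This is a hypothetical sketch, not PostgreSQL code: the function names, the LATIN1 database encoding, and the restriction of nchar_collation to plain code-point ("C") order are assumptions made for illustration, and Python codec names stand in for PostgreSQL encoding names.

```python
# Hypothetical model of core items 3) to 6); NOT PostgreSQL code.
# Assumptions: the database encoding is LATIN1, N* values are held as
# UTF-8 bytes, only a "C" (code-point order) nchar_collation is modelled,
# and Python codec names stand in for PostgreSQL encoding names.

DB_ENCODING = "latin-1"


# 3) Conversion subroutines between the database encoding and UTF-8.
def db_to_utf8(data: bytes) -> bytes:
    """Regular string (database encoding) -> national string (UTF-8)."""
    return data.decode(DB_ENCODING).encode("utf-8")


def utf8_to_db(data: bytes) -> bytes:
    """National string (UTF-8) -> regular string (database encoding).

    Raises UnicodeEncodeError when a character has no equivalent in the
    database encoding.
    """
    return data.decode("utf-8").encode(DB_ENCODING)


# 4) Regular -> national never fails, so it can be IMPLICIT;
#    national -> regular may fail, so it is EXPLICIT only.
def implicit_cast_to_national(regular: bytes) -> bytes:
    return db_to_utf8(regular)


def explicit_cast_to_regular(national: bytes) -> bytes:
    return utf8_to_db(national)


# 5) Comparison on raw UTF-8 bytes: for valid UTF-8, bytewise order equals
#    code-point order, so a "C" collation needs no decoding.
def compare_utf8(a: bytes, b: bytes) -> int:
    return (a > b) - (a < b)


def compare_mixed(national: bytes, regular: bytes) -> int:
    """N* vs regular value: the regular side goes through the implicit cast."""
    return compare_utf8(national, implicit_cast_to_national(regular))


# 6) Output: every value, regular or national, is converted to the single
#    client_encoding; there is no per-column (mixed) encoding.
def send_row(row, client_encoding: str):
    """row holds ("regular", bytes) or ("national", bytes) pairs."""
    out = []
    for kind, value in row:
        source = DB_ENCODING if kind == "regular" else "utf-8"
        out.append(value.decode(source).encode(client_encoding))
    return out


regular = "caf\u00e9".encode(DB_ENCODING)   # 'café' as CHAR/TEXT
national = "caf\u00e9".encode("utf-8")      # 'café' as NCHAR/NTEXT
print(compare_mixed(national, regular))    # → 0

try:
    explicit_cast_to_regular("\u65e5\u672c".encode("utf-8"))
except UnicodeEncodeError:
    print("cast failed: not representable in LATIN1")
```

Because send_row converts every column with the same client_encoding, a national value that the client encoding cannot represent raises an error instead of being sent in some other encoding, which is the "no mixed encoding" behaviour described in 6).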

Additional changes:
1) ECPG support for the new types
2) Support for the new data types in the database drivers

Rgds,
Arul Shaji

> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
