Re: UTF8 national character data type support WIP patch and list of open issues.

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Boguk, Maksym" <maksymb(at)fast(dot)au(dot)fujitsu(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UTF8 national character data type support WIP patch and list of open issues.
Date: 2013-09-18 13:16:11
Message-ID: CA+TgmobWv_96dSfSHv2ZJWSXAD=QTiBEKw-LT359sGArYe+AXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 16, 2013 at 8:49 AM, MauMau <maumau307(at)gmail(dot)com> wrote:
> 2. NCHAR/NVARCHAR columns can be used in non-UTF-8 databases and always
> contain Unicode data.
...
> 3. Store strings in UTF-16 encoding in NCHAR/NVARCHAR columns.
> Fixed-width encoding may allow faster string manipulation as described in
> Oracle's manual. But I'm not sure about this, because UTF-16 is not a real
> fixed-width encoding due to supplementary characters.

It seems to me that these two points here are the real core of your
proposal. The rest is just syntactic sugar.

Let me start with the second one: I don't think there's likely to be
any benefit in using UTF-16 as the internal encoding. In fact, I
think it's likely to make things quite a bit more complicated, because
we have a lot of code that assumes that server encodings have certain
properties that UTF-16 doesn't - specifically, that any byte with the
high-bit clear represents the corresponding ASCII character.

As to the first one, if we're going to go to the (substantial) trouble
of building infrastructure to allow a database to store data in
multiple encodings, why limit it to storing UTF-8 in non-UTF-8
databases? What about storing SHIFT-JIS in UTF-8 databases, or
Windows-yourfavoriteM$codepagehere in UTF-8 databases, or any other
combination you might care to name?

Whether we go that way or not, I think storing data in one encoding in
a database with a different encoding is going to be pretty tricky and
require far-reaching changes. You haven't mentioned any of those
issues or discussed how you would solve them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-09-18 13:19:27 Re: psql should show disabled internal triggers
Previous Message Bernd Helmle 2013-09-18 13:15:55 Re: psql should show disabled internal triggers