Re: UTF8 national character data type support WIP patch and list of open issues.

From: Valentine Gogichashvili <valgog(at)gmail(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>, ishii(at)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Boguk, Maksym" <maksymb(at)fast(dot)au(dot)fujitsu(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UTF8 national character data type support WIP patch and list of open issues.
Date: 2013-09-20 01:20:02
Message-ID: CAP93muULVtyd-HQd=h2VOWzaPUrf2Z9efqXDJvmV0Xx3Auj16Q@mail.gmail.com
Lists: pgsql-hackers

Hi,

>> That may be what's important to you, but it's not what's important to
>> me.
>
> National character types support may be important to some potential users
> of PostgreSQL and the popularity of PostgreSQL, not me. That's why
> national character support is listed in the PostgreSQL TODO wiki. We might
> be losing potential users just because their selection criteria includes
> national character support.
>
>
The whole NCHAR concept appeared as a hack for systems that did not support
national characters from the beginning. It would not be needed if all text
had been stored in Unicode (UTF) from the start, and if the idea of a
character were the same as the idea of a rune rather than a byte.
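
To make the rune-versus-byte distinction concrete, here is a minimal Python
sketch. It is only an illustration and not part of the proposal; the sample
string and variable names are made up for the example.

```python
# Illustration only: counting "characters" as code points versus as bytes.
text = "日本語"                     # three characters (code points)
utf8_bytes = text.encode("utf-8")  # the same text as stored bytes in UTF-8

char_count = len(text)        # length in characters (runes)
byte_count = len(utf8_bytes)  # length in bytes; each of these takes 3 in UTF-8

print(char_count, byte_count)
```

A type whose length is counted in bytes and one whose length is counted in
characters would disagree about how long this string is.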

PostgreSQL has very powerful facilities for storing text in any kind of
encoding. So maybe it makes sense to add ENCODING as another column
property, the same way COLLATION was added?

It would make it possible to have a database that talks to the clients in
UTF8 and stores text and varchar data in whichever encoding is most
appropriate for the situation.

It will make it impossible (or complicated) for the database to have a
non-UTF8 default encoding (I wonder who would need that in this case), as
conversions from the broader charsets into the default database encoding
will not be possible.
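
The conversion problem can be shown with a small Python sketch. It assumes
nothing about PostgreSQL internals; `can_store` is a hypothetical helper
that just asks whether a string is representable in a given target encoding.

```python
def can_store(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the target encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        # At least one character has no representation in the narrower charset.
        return False


# A narrow default encoding cannot hold text coming from a broader charset.
japanese_in_latin1 = can_store("日本語", "latin-1")
german_in_latin1 = can_store("Grüße", "latin-1")
japanese_in_utf8 = can_store("日本語", "utf-8")
print(japanese_in_latin1, german_in_latin1, japanese_in_utf8)
```

With UTF8 as the default encoding, every column encoding converts into it
without loss, which is why a non-UTF8 default becomes the awkward case.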

One could define an additional DATABASE property like LC_ENCODING that
would serve as the default for the ENCODING property of a column, the way
LC_COLLATE does for the COLLATE property of a column.

Text operations should work automatically, since all strings will be
converted to the database encoding in memory.
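
As a rough model of that idea, the Python sketch below stores values in a
per-column encoding and converts them on read. The `EncodedColumn` class is
entirely hypothetical, and Python's `str` merely stands in for the in-memory
database encoding; this is not how PostgreSQL would implement it.

```python
class EncodedColumn:
    """Hypothetical column that keeps its data in its own storage encoding."""

    def __init__(self, encoding: str):
        self.encoding = encoding
        self.rows = []  # raw bytes, as they would sit on disk

    def insert(self, value: str) -> None:
        # Convert from the in-memory form to the column's storage encoding.
        self.rows.append(value.encode(self.encoding))

    def read(self) -> list:
        # Convert back to the in-memory form before any text operation runs.
        return [raw.decode(self.encoding) for raw in self.rows]


col = EncodedColumn("euc_jp")
col.insert("日本語")
stored = col.rows[0]     # bytes in the column encoding
values = col.read()      # strings in the common in-memory form
starts_with = values[0].startswith("日")  # text operation sees no encoding
print(len(stored), values, starts_with)
```

The text operation at the end never needs to know which encoding the column
uses on disk, which is the property described above.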

This approach will also open up the possibility of implementing custom
ENCODINGs for column data storage, like snappy compression or even BSON,
gobs or protobufs, for much more compact type storage.

Regards,

-- Valentine Gogichashvili
