Re: UTF8 national character data type support WIP patch and list of open issues.

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: kleptog(at)svana(dot)org
Cc: ishii(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us, maumau307(at)gmail(dot)com, laurenz(dot)albe(at)wien(dot)gv(dot)at, robertmhaas(at)gmail(dot)com, peter_e(at)gmx(dot)net, arul(at)fast(dot)au(dot)fujitsu(dot)com, stark(at)mit(dot)edu, Maksym(dot)Boguk(at)au(dot)fujitsu(dot)com, hlinnakangas(at)vmware(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: UTF8 national character data type support WIP patch and list of open issues.
Date: 2013-11-13 23:03:14
Message-ID: 20131114.080314.585382177189563942.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

> Isn't this essentially what the MULE internal encoding is?

No. MULE is not powerful enough, and it is overly complicated for
dealing with different encodings (character sets).

>> Currently there's no such universal encoding in existence, so I
>> think the only way is to invent one ourselves.
>
> This sounds like a terrible idea. In the future people are only going
> to want more advanced text functions, regular expressions, and
> indexing, and making encodings that don't exist anywhere else seems
> like a way to make a lot of work for little benefit.

That is probably a misunderstanding. We would not need to modify
existing text-handling modules such as text functions, regular
expressions, indexing and so on. We would just convert from the
"universal" encoding X to the original encoding before calling them.
That conversion is easy and fast because it only requires skipping
the "encoding identifier" and "encoding length" parts.

Basically, encoding X should be used only in the lower-layer modules
of PostgreSQL; higher-layer modules, such as those living in
src/backend/utils/adt, should not be aware of it.

> A better idea, it seems to me, is to (if postgres is configured
> properly) embed the non-round-trippable characters in the custom
> character part of the Unicode character set. In other words, adjust
> the mapping tables on demand and voila.

Using Unicode incurs encoding-conversion overhead because it requires
mapping-table lookups. That would be a significant handicap for large
amounts of data, and it is exactly what I want to avoid in the first
place.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
