Re: Re: Re: [GENERAL] can postgresql supported utf8mb4 character sets?

From: Arjen Nienhuis <a(dot)g(dot)nienhuis(at)gmail(dot)com>
To: lsliang <lsliang(at)pconline(dot)com(dot)cn>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Re: Re: [GENERAL] can postgresql supported utf8mb4 character sets?
Date: 2015-03-07 08:18:51
Message-ID: CAG6W84JPmOJP5F2eKtL-LvvVA63fuyGVV521hwQns7nk1_BLHw@mail.gmail.com
Lists: pgsql-general

On Fri, Mar 6, 2015 at 3:55 AM, lsliang <lsliang(at)pconline(dot)com(dot)cn> wrote:

> 2015-03-06
>
> ------------------------------
> *From:* Adrian Klaver
> *Sent:* 2015-03-05 21:31:39
> *To:* lsliang; pgsql-general
> *Cc:*
> *Subject:* Re: [GENERAL] can postgresql supported utf8mb4 character sets?
>
> On 03/05/2015 01:45 AM, lsliang wrote:
> > Can PostgreSQL support the utf8mb4 character set?
> > Mobile apps today support 4-byte characters, but utf8 can only
> > support characters of 1 to 3 bytes.
> The docs would seem to indicate otherwise:
> http://www.postgresql.org/docs/9.3/interactive/multibyte.html
> http://en.wikipedia.org/wiki/UTF-8
> > If I load a string that contains a 4-byte character into the
> > database, it fails.
> Have you actually tried to load strings in to Postgres?
> If so and it failed what was the method you used and what was the error?
> > MySQL has supported the utf8mb4 character set since 5.5.3,
> > but I can't find any information about PostgreSQL.
> > Thanks
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> Thanks for your help.
>
> PostgreSQL can support 4-byte characters:
>
> test=> select * from utf8mb4_test ;
>
> ERROR: character with byte sequence 0xf0 0x9f 0x98 0x84 in encoding "UTF8" has no equivalent in encoding "GB18030"
> test=> \encoding utf8
> test=> select * from utf8mb4_test ;
> content
> ---------
> 😄
> 😄
>
> pcauto=>

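UTF-8 support works fine. The 3-byte limit is something MySQL invented.
But it only works if your client encoding is UTF-8. In your first query
the client encoding (presumably matching your terminal) was GB18030, not
UTF-8, which is why the emoji could not be converted. After \encoding utf8
the same query worked.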
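The byte counts behind this are easy to check outside the database. A minimal Python sketch (the helper names `utf8_width` and `fits_utf8mb3` are illustrative, not from any library) shows that standard UTF-8 uses up to 4 bytes per code point, and that only code points up to U+FFFF fit in MySQL's 3-byte "utf8":

```python
# Sketch: UTF-8 byte width per code point, and whether each one would fit
# MySQL's legacy 3-byte "utf8" (utf8mb3), which only covers U+0000..U+FFFF.
GLYPHS = ["A", "馬", "\U00010040", "\U0001F604"]  # A, 馬, 𐁀, 😄


def utf8_width(ch: str) -> int:
    """Bytes needed to store the single code point `ch` in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(ch: str) -> bool:
    """True if `ch` lies in the Basic Multilingual Plane (at most 3 UTF-8 bytes)."""
    return ord(ch) <= 0xFFFF


for g in GLYPHS:
    print(f"U+{ord(g):04X}: {utf8_width(g)} bytes, fits utf8mb3: {fits_utf8mb3(g)}")
```

The two checks agree for every code point: anything above U+FFFF needs the fourth byte that utf8mb3 cannot store.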

create table test (glyph text);
insert into test values ('A'), ('馬'), ('𐁀'), ('😄'), ('🇪🇸');

select glyph, convert_to(glyph, 'utf-8'), length(glyph) FROM test;
 glyph |     convert_to     | length
-------+--------------------+--------
 A     | \x41               |      1
 馬    | \xe9a6ac           |      1
 𐁀     | \xf0908180         |      1
 😄    | \xf09f9884         |      1
 🇪🇸    | \xf09f87aaf09f87b8 |      2
(5 rows)
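To see where the `convert_to` and `length` columns come from, the same rows can be reproduced in Python. This assumes, as PostgreSQL documents for `length()`, that the length is counted in characters (code points) rather than bytes; the flag is two regional-indicator code points, which is why its length is 2. `describe` is a hypothetical helper:

```python
# Rebuild the convert_to/length columns from the SQL example above.
ROWS = [
    "A",
    "馬",
    "\U00010040",            # 𐁀
    "\U0001F604",            # 😄
    "\U0001F1EA\U0001F1F8",  # 🇪🇸: regional indicators E + S
]


def describe(glyph: str) -> tuple[str, int]:
    """Return (UTF-8 bytes in psql's bytea hex style, number of code points)."""
    return "\\x" + glyph.encode("utf-8").hex(), len(glyph)


for g in ROWS:
    hex_bytes, n = describe(g)
    print(f"{g} | {hex_bytes} | {n}")
```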

What doesn't work is GB18030:

select glyph, convert_to(glyph, 'GB18030'), length(glyph) FROM test;
ERROR: character with byte sequence 0xf0 0x90 0x81 0x80 in encoding "UTF8" has no equivalent in encoding "GB18030"

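I think that is a bug: GB18030 can encode every Unicode code point,
including those above U+FFFF such as 𐁀 (U+10040) and 😄 (U+1F604).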
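GB18030 defines four-byte codes for all characters outside the BMP, and they follow a simple linear formula. A minimal Python sketch of that formula (the function name is hypothetical; this is not PostgreSQL's conversion code) reproduces the mapping and cross-checks it against Python's built-in gb18030 codec:

```python
def gb18030_supplementary(ch: str) -> bytes:
    """Encode one code point in U+10000..U+10FFFF as GB18030 (hypothetical helper).

    GB18030 maps the supplementary planes linearly onto the four-byte
    range 0x90308130..0xE3329A35, so no lookup table is needed.
    """
    cp = ord(ch)
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is outside the supplementary planes")
    # Mixed-radix digits: byte 4 has 10 values, byte 3 has 126, byte 2 has 10.
    b1, rest = divmod(cp - 0x10000, 10 * 126 * 10)
    b2, rest = divmod(rest, 126 * 10)
    b3, b4 = divmod(rest, 10)
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


for g in ["\U00010040", "\U0001F604"]:  # 𐁀, 😄
    ours = gb18030_supplementary(g)
    # Cross-check against Python's built-in gb18030 codec.
    assert ours == g.encode("gb18030")
    print(f"U+{ord(g):04X} -> {ours.hex()}")
```

Because the mapping is purely arithmetic, a converter only needs this formula (plus its inverse) to handle the supplementary planes.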

Gr. Arjen
