Re: Unicode vs SQL_ASCII DBs

From: "John Sidney-Woollett" <johnsw(at)wardbrook(dot)com>
To: "Kris Jurka" <books(at)ejurka(dot)com>
Cc: "John Sidney-Woollett" <johnsw(at)wardbrook(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Unicode vs SQL_ASCII DBs
Date: 2004-02-02 09:57:21
Message-ID: 1415.192.168.0.64.1075715841.squirrel@mercury.wardbrook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Kris, thanks for you feedback. Can you give me any further info on the
questions below?

Kris Jurka said:
>> 3) If I want accented characters to sort correctly, must I select
>> UNICODE
>> (or the appropriate ISO 8859 char set) over SQL_ASCII?
>
> You are confusing encoding with locale. Locales determines the correct
> sort order and you must choose an encoding that works with your locale.

Except that in my test, the two differently encoded databases were in the
same 7.4.1 cluster with the same locale, yet they sorted the *same* data
differently - implying the encoding is a factor.

Any idea why would that be?

here is output from pg_controldata:

pg_control version number: 72
Catalog version number: 200310211
Database cluster state: in production
pg_control last modified: Mon 02 Feb 2004 11:21:29 GMT
Current log file ID: 0
Next log file segment: 2
Latest checkpoint location: 0/124B958
Prior checkpoint location: 0/1149DFC
Latest checkpoint's REDO location: 0/124B958
Latest checkpoint's UNDO location: 0/0
Latest checkpoint's StartUpID: 16
Latest checkpoint's NextXID: 527327
Latest checkpoint's NextOID: 26472
Time of latest checkpoint: Mon 02 Feb 2004 11:21:27 GMT
Database block size: 8192
Blocks per segment of large relation: 131072
Maximum length of identifiers: 64
Maximum number of function arguments: 32
Date/time type storage: floating-point numbers
Maximum length of locale name: 128
LC_COLLATE: en_GB.UTF-8
LC_CTYPE: en_GB.UTF-8

and

Name | Owner | Encoding
---------------+----------+-----------
johntest | postgres | UNICODE
johntest2 | postgres | SQL_ASCII
template0 | postgres | SQL_ASCII
template1 | postgres | SQL_ASCII

> Other things to note:
>
> LOWER()/UPPER() only work correctly in a single byte encoding (not
> unicode)

Are there any other gotchas that I need to be aware of with a UNICODE
encoded database?

I save mention by Tom Lane of a bug: [QUOTE] The bug turns out not to be
Fedora-specific at all. I believe it will happen on any platform if you
are using both a multibyte database encoding (such as Unicode) *and* a
non-C locale. PG 7.4 has a more restricted form of the bug --- it's not
locale specific but does still require a multibyte encoding. [END QUOTE]

I basically need "english" sorting, and accented character support without
any JDBC access/conversion problems. Do you think that my current DB
locale (en_GB.UTF-8) and UNICODE encoded database the best solution? Or
can you suggest something better?

Thanks

John Sidney-Woollett

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Kris Jurka 2004-02-02 10:27:43 Re: Unicode vs SQL_ASCII DBs
Previous Message Bernd Helmle 2004-02-02 09:46:47 Re: Search across multiple sources