Re: Unicode + LC_COLLATE

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	johnsw(at)wardbrook(dot)com
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Unicode + LC_COLLATE
Date:	2004-04-22 13:31:04
Message-ID:	25605.1082640664@sss.pgh.pa.us
Lists:	pgsql-general

"John Sidney-Woollett" <johnsw(at)wardbrook(dot)com> writes:
> Does anyone know what the effect of --lc-collate=C --encoding=UNICODE will
> be for sorts (and indexes?) when a multibyte unicode character is
> encountered?

C locale basically means "sort by the byte sequence values". It'll do
something self-consistent, but maybe not what you'd like for UTF8
characters.
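To illustrate what "sort by the byte sequence values" implies for UTF8 data, here is a small Python sketch. It is not PostgreSQL code; it only imitates byte-wise comparison by sorting on the encoded bytes, and the word list and the helper name `c_locale_key` are made up for the example.

```python
# Imitate C-locale ordering: compare strings by their UTF-8 byte sequences.
# The sample words and the helper name are hypothetical, chosen for illustration.
words = ["zebra", "apple", "Zulu", "éclair", "eclair"]

def c_locale_key(s: str) -> bytes:
    """Return the UTF-8 bytes of s, so sorting compares raw byte values."""
    return s.encode("utf-8")

byte_sorted = sorted(words, key=c_locale_key)
print(byte_sorted)
# ['Zulu', 'apple', 'eclair', 'zebra', 'éclair']
```

The order is self-consistent, but uppercase sorts before all lowercase, and "éclair" lands after "zebra" because its first byte (0xC3) is larger than any ASCII byte.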

> Our database is UNICODE with LC_COLLATE=en_US.iso885915.

Does that sort rationally at all? I should think you'd need to specify
an LC_COLLATE setting that's designed for UTF8 encoding, not 8859-15.

If you only ever store characters that are in 7-bit ASCII then none of
this will affect you, and you can get away with broken combinations of
encoding and locale. But if you'd like to sort characters outside the
minimal ASCII set then you need to get it right ...
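A rough way to see why pure 7-bit ASCII data is unaffected: ASCII characters have identical bytes in UTF8 and ISO 8859-15, while anything outside ASCII does not. The Python sketch below only demonstrates the encoding mismatch; it does not reproduce what the C library's collation routines actually do with a mismatched locale, and the sample strings are arbitrary.

```python
# Compare how the same text is represented in UTF-8 and ISO 8859-15.
ascii_text = "plain"
accented = "é"

# ASCII-only text: both encodings produce the same bytes.
same_for_ascii = ascii_text.encode("utf-8") == ascii_text.encode("iso8859_15")

# Non-ASCII text: two bytes in UTF-8, one byte in ISO 8859-15.
utf8_bytes = accented.encode("utf-8")
latin9_bytes = accented.encode("iso8859_15")

# Reading UTF-8 bytes as if they were ISO 8859-15 yields two unrelated characters.
misread = utf8_bytes.decode("iso8859_15")

print(same_for_ascii, utf8_bytes, latin9_bytes, misread)
```

A locale built for 8859-15 would be interpreting each byte of a multibyte UTF8 character as a separate character, which is why the combination only appears to work while the data stays within ASCII.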

regards, tom lane

In response to

Unicode + LC_COLLATE at 2004-04-22 11:17:17 from John Sidney-Woollett

Responses

Re: Unicode + LC_COLLATE at 2004-04-22 13:26:58 from John Sidney-Woollett
