Re: Question regarding UTF-8 data and "C" collation on definition of field of table

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dionisis Kontominas <dkontominas(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Question regarding UTF-8 data and "C" collation on definition of field of table
Date: 2023-02-06 00:19:01
Message-ID: 2556580.1675642741@sss.pgh.pa.us
Lists: pgsql-general

Dionisis Kontominas <dkontominas(at)gmail(dot)com> writes:
> I suppose that affects the outcome of ORDER BY clauses on the field,
> along with the content of the indexes. Is this right?

Yeah.
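To make the ORDER BY effect concrete, here is a minimal Python sketch (Python is used purely for illustration). It assumes that "C" collation compares strings byte by byte over their UTF-8 encoding, so uppercase sorts before lowercase and accented letters land after all of ASCII. The word list is made up for the example.

```python
# Minimal sketch, assuming "C" collation orders text by the raw bytes
# of its UTF-8 encoding (plain byte comparison, no linguistic rules).
words = ["apple", "Zebra", "éclair", "banana", "Apple"]

def c_collation_key(s: str) -> bytes:
    # Compare strings by their UTF-8 bytes, as "C" collation does.
    return s.encode("utf-8")

c_order = sorted(words, key=c_collation_key)
print(c_order)  # → ['Apple', 'Zebra', 'apple', 'banana', 'éclair']

# UTF-8 byte order coincides with Unicode code point order,
# so Python's default str ordering gives the same result.
assert c_order == sorted(words)
```

An index built on the column under a given collation stores entries in that same order, which is why changing the collation affects the indexes as well as query results.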

> Assuming there is a requirement to store UTF-8 text from multiple
> languages in the same field, and the database encoding is UTF8 (which
> I suppose is the right choice; please verify), what do you think the
> Collation and Ctype values should be for the database to behave
> correctly?

Um ... so define "correct". If you have a mishmash of languages in the
same column, it's likely that they have conflicting rules about sorting,
and there may be no ordering that's not surprising to somebody.
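As an illustration of such a conflict, the Python sketch below uses two hypothetical, heavily simplified sort rules. The first is roughly German dictionary order, where "ä" and "ö" sort like "a" and "o". The second is Swedish order, where "å", "ä" and "ö" are separate letters after "z". Real collations (glibc locales, ICU) are far more elaborate. The point is only that the same column contents get two different "correct" orders.

```python
# Hypothetical, deliberately simplified sort rules for two languages.
# Real collations (glibc locales, ICU) handle far more cases than this.

# Roughly German dictionary order: umlauts sort like their base letters.
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u"})

def german_key(word: str) -> tuple[str, str]:
    # Fold umlauts for the primary comparison; break ties on the original word.
    return (word.translate(GERMAN_FOLD), word)

# Swedish order: å, ä, ö are separate letters that come after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def swedish_key(word: str) -> list[int]:
    # Position of each lowercase letter in the Swedish alphabet.
    return [SWEDISH_ALPHABET.index(ch) for ch in word]

words = ["zebra", "öl", "apfel", "äpfel", "ost"]
german_order = sorted(words, key=german_key)
swedish_order = sorted(words, key=swedish_key)
print(german_order)   # → ['apfel', 'äpfel', 'öl', 'ost', 'zebra']
print(swedish_order)  # → ['apfel', 'ost', 'zebra', 'äpfel', 'öl']
```

Neither result is wrong for its own language, but a single column collation can only produce one of them.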

If there's a predominant language in the data, selecting a collation
matching that seems like your best bet. Otherwise, maybe you should
just shrug your shoulders and stick with C collation. It's likely
to be faster than any alternative.

regards, tom lane
