Re: [HACKERS] Can ICU be used for a database's default sort order?

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Peter Eisentraut" <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: "Andrey Borodin" <x4mmm(at)yandex-team(dot)ru>,"Dmitry Dolgov" <9erthalion6(at)gmail(dot)com>,"Michael Paquier" <michael(at)paquier(dot)xyz>,"Thomas Munro" <thomas(dot)munro(at)enterprisedb(dot)com>,pg(at)bowt(dot)ie,"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>,"PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Can ICU be used for a database's default sort order?
Date: 2018-12-12 14:57:50
Message-ID: a7b0c9de-58a3-4ad5-8b69-207e18b6b2e4@manitou-mail.org
Lists: pgsql-hackers

Peter Eisentraut wrote:

> Another issue is that we'd need to carefully divide up the role of the
> "default" collation and the "default" provider. The default collation
> is the collation defined for the database, the default provider means to
> use the libc non-locale_t enabled API functions. Right now these are
> always the same, but if the database-global locale is ICU, then the
> default collation would use the ICU provider.

I think one related issue that the patch works around by using a libc locale
as a proxy is knowing what to put into libc's LC_CTYPE and LC_COLLATE.
In fact I've been wondering if that's the main reason for the interface
implemented by the patch.

Otherwise, how should these libc locale categories be initialized for ICU
databases?
For instance in the existing FTS code, lowerstr_with_len() in
tsearch/ts_locale.c calls tolower() or towlower() to fold a string to
lower case when normalizing lexemes. This requires LC_CTYPE to be set
to something compatible with the database encoding, at the very
least. Even if that code looks like it might need to be changed for
ICU anyway (or just to be collation-aware according to the TODO marks?),
what about comparable calls in extensions?
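
To illustrate the dependency, here is a minimal sketch (not the actual
ts_locale.c code) of byte-wise case folding with tolower(). The helper
names fold_lower_bytes() and fold_lower_in_locale() are hypothetical and
it assumes a single-byte encoding; the point is only that the result is
decided by whatever LC_CTYPE the process currently has, not by any
collation attached to the data:

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper, not PostgreSQL code: fold a buffer in a single-byte
 * encoding to lower case in place. tolower() consults the process-wide
 * LC_CTYPE setting, so the same bytes can fold differently depending on
 * how the process was initialized.
 */
static void
fold_lower_bytes(char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] = (char) tolower((unsigned char) buf[i]);
}

/*
 * Hypothetical helper: set LC_CTYPE for the whole process, then fold.
 * Returns 0 on success, or -1 if the requested locale is not available.
 */
static int
fold_lower_in_locale(const char *lc_ctype, char *buf, size_t len)
{
    if (setlocale(LC_CTYPE, lc_ctype) == NULL)
        return -1;
    fold_lower_bytes(buf, len);
    return 0;
}
```

With LC_CTYPE set to "C", only the ASCII letters A-Z are folded and a byte
such as 0xC9 (É in LATIN1) is left alone; under a LATIN1 locale, if one is
installed, the same byte would normally fold to 0xE9. An extension making
calls like this in an ICU database would get whichever behaviour the
backend's libc LC_CTYPE happens to imply.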

In the case that we don't touch libc's LC_COLLATE/LC_CTYPE in backends,
extension code would have them inherited from the postmaster? Does that
sound acceptable? If not, maybe ICU databases should have these as
settable options, in addition to their ICU locale?

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
