Re: LOCALE C.UTF-8 on EDB Windows v17 server

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Dominique Devienne" <ddevienne(at)gmail(dot)com>
Cc: "Laurenz Albe" <laurenz(dot)albe(at)cybertec(dot)at>,pgsql-general(at)postgresql(dot)org
Subject: Re: LOCALE C.UTF-8 on EDB Windows v17 server
Date: 2025-06-05 20:57:24
Message-ID: aded3b3d-4849-4bf4-a072-eecf84f5cd4e@manitou-mail.org
Lists: pgsql-general

Dominique Devienne wrote:

> So you're saying datcollate and datctype from pg_database are
> irrelevant to PostgreSQL itself, and only extensions might be affected?

Almost. One exception that still exists in v18, as far as I can see [1],
is that the default full text search parser still uses libc functions
such as iswdigit(), iswpunct(), and iswspace(), which depend on LC_CTYPE.

So you could see differences in tsvector contents between OSes
in a database that uses the builtin provider, unless LC_CTYPE=C.
But in that case the parsing is suboptimal, since the parser does not
recognize fancy Unicode punctuation or space characters as such.
Personally, I would still take care to set LC_CTYPE to a reasonable
UTF-8 locale with v17 or v18.
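
To illustrate the LC_CTYPE dependency outside of PostgreSQL, here is a
minimal standalone sketch (not code from wparser_def.c) that calls the
same libc predicates under two LC_CTYPE settings. The helper names
classify() and classify_in(), the sample code points, and the locale name
"en_US.UTF-8" are my own assumptions for the demonstration; that locale
may not be installed everywhere, and the answers for the non-ASCII
characters can differ between C libraries, which is exactly the point.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: classify one wide character with the libc
 * predicates named above. The answer depends on the current LC_CTYPE. */
const char *classify(wint_t c)
{
    if (iswdigit(c)) return "digit";
    if (iswspace(c)) return "space";
    if (iswpunct(c)) return "punct";
    return "other";
}

/* Hypothetical helper: switch LC_CTYPE to locale_name, then classify c.
 * Returns NULL when setlocale() rejects the locale name. */
const char *classify_in(const char *locale_name, wint_t c)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return NULL;
    return classify(c);
}

#ifdef DEMO_MAIN
int main(void)
{
    /* '7', ',', U+00A0 NO-BREAK SPACE, U+2014 EM DASH,
     * U+3000 IDEOGRAPHIC SPACE */
    const wint_t samples[] = { L'7', L',', 0x00A0, 0x2014, 0x3000 };
    /* "en_US.UTF-8" is an assumed locale name; adjust for your system. */
    const char *locales[] = { "C", "en_US.UTF-8" };

    /* The "C" locale is always available. */
    assert(classify_in("C", L'7') != NULL);

    for (size_t l = 0; l < sizeof locales / sizeof locales[0]; l++) {
        for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
            const char *kind = classify_in(locales[l], samples[i]);
            printf("%-12s U+%04X -> %s\n", locales[l],
                   (unsigned) samples[i],
                   kind ? kind : "(locale not available)");
        }
    }
    return 0;
}
#endif
```

Compiled with -DDEMO_MAIN, this prints one line per locale and code
point. The ASCII samples should classify the same way everywhere; the
interesting rows are the non-ASCII ones, where the result is up to the
platform's libc and the selected locale.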

[1]
https://doxygen.postgresql.org/wparser__def_8c.html#a420ea398a8a11db92412a2af7bf45e40

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
