make tsearch use the database default locale

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: make tsearch use the database default locale
Date: 2025-10-07 22:49:55
Message-ID: 0151ad01239e2cc7b3139644358cf8f7b9622ff7.camel@j-davis.com
Lists: pgsql-hackers

tsvector and tsquery are not collatable types, but they do need locale
information to parse the original text. Making them collatable types
would not help, because a COLLATE clause would typically be applied
after the parsing is done.

Previously, tsearch used the database CTYPE for parsing, but that's not
good because it creates an unnecessary dependency on libc even when the
user has requested another provider.

This patch series allows tsearch to use the database default locale for
parsing. If the database collation is libc, there's no change.

Motivation:

(a) it reduces the dependence on setlocale(), which is not thread-
safe;
(b) if a user is using the builtin or ICU providers, understanding
the effects of LC_CTYPE can be very confusing;
(c) it would allow us to test more of the tsearch parsing behavior.
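Point (a) is about process-wide state. Plain isalpha() consults whatever LC_CTYPE setlocale() last installed. The general POSIX.1-2008 alternative is to pass an explicit locale_t object to the classification function. The minimal sketch below illustrates that technique and is not code from the patch. The helper names sketch_make_ctype_locale() and sketch_isalpha_in() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdbool.h>

/*
 * Hypothetical helper: build a locale object that covers only LC_CTYPE
 * for the given locale name (e.g. "C"). Returns (locale_t) 0 if the
 * locale is unavailable. setlocale() is never called, so the
 * process-wide locale is left untouched.
 */
static locale_t
sketch_make_ctype_locale(const char *name)
{
	return newlocale(LC_CTYPE_MASK, name, (locale_t) 0);
}

/*
 * Hypothetical helper: classify a byte against an explicit locale
 * object instead of the global LC_CTYPE.
 */
static bool
sketch_isalpha_in(locale_t loc, unsigned char c)
{
	assert(loc != (locale_t) 0);
	return isalpha_l(c, loc) != 0;
}
```

A caller creates the locale object once, classifies with it without reading or modifying global locale state, and releases it with freelocale(). In the "C" locale, sketch_isalpha_in() accepts 'a' but rejects '1' and the Latin-1 byte 0xE9.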

Notes:

* The behavior should be exactly the same as before if the database
locale provider is libc. If the database locale provider is builtin or
ICU, then there will be some differences in tsearch parsing behavior.

* Most of the patches are straightforward, but v1-0005 might need extra
attention. There are quite a few cases there with subtle distinctions,
and I might have missed something. For example, in the "C" locale,
tsearch treats non-ASCII characters as alpha, even though the libc
functions do not (I preserved this behavior).

* This introduces redundancy between the isxyz() character-classification
functions in regc_pg_locale.c and similar functions in pg_locale.c. It
would be easy enough to refactor to eliminate the redundancy, but that
might have performance implications, so I haven't done it yet.
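The "C" locale special case mentioned for v1-0005 can be sketched as follows, assuming the input has already been decoded to a uint32_t code point. The function names and the provider_isalpha_fn callback, which stands in for the database default locale's provider, are hypothetical; the actual patch may structure this differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the database default locale's provider. */
typedef bool (*provider_isalpha_fn) (uint32_t cp);

/* ASCII-only letter test, matching isalpha() in the "C" locale. */
static bool
ascii_isalpha(uint32_t cp)
{
	return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z');
}

/*
 * Hypothetical tsearch-style "is alpha" check.
 *
 * In the "C" locale, ASCII follows the usual rules, but every non-ASCII
 * code point is treated as alpha, even though libc's C-locale isalpha()
 * would reject it. Otherwise, defer to the provider callback.
 */
static bool
sketch_ts_isalpha(uint32_t cp, bool locale_is_c,
				  provider_isalpha_fn provider_isalpha)
{
	if (locale_is_c)
		return cp > 0x7F || ascii_isalpha(cp);

	assert(provider_isalpha != NULL);
	return provider_isalpha(cp);
}
```

For example, sketch_ts_isalpha(0xE9, true, NULL) returns true for "é", while the same code point with locale_is_c set to false and ascii_isalpha as the stand-in provider returns false.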

Regards,
Jeff Davis

Attachment Content-Type Size
v1-0001-Rename-static-functions-pg_wc_xyz-to-regc_wc_xyz.patch text/x-patch 10.9 KB
v1-0002-Add-pg_wc_xyz-exported-functions.patch text/x-patch 19.2 KB
v1-0003-Add-pg_wc_isxdigit-useful-for-tsearch.patch text/x-patch 5.9 KB
v1-0004-Add-pg_database_locale-to-retrieve-database-defau.patch text/x-patch 1.4 KB
v1-0005-tsearch-use-database-default-collation-for-parsin.patch text/x-patch 7.1 KB
v1-0006-Remove-obsolete-global-database_ctype_is_c.patch text/x-patch 2.2 KB
