Re: CREATE DATABASE command for non-libc providers

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CREATE DATABASE command for non-libc providers
Date: 2025-06-10 21:44:54
Message-ID: 352244c2-0f8a-4c8c-9ade-e39718e7e306@manitou-mail.org
Lists: pgsql-hackers

Jeff Davis wrote:

> Even if it's not a collatable type, it should use the database
> collation rather than going straight to libc. Again, is that something
> that can ever be fixed or are we just stuck with libc semantics for
> full text search permanently, even if you initialize the cluster with a
> different provider?

ISTM that what backend/tsearch/wparser_def.c needs is comparable
to what backend/regex/regc_pg_locale.c already does with the
PG_Locale_Strategy, and the pg_wc_isxxxx functions.

Looking at git history, the current invocations of is[w]digit(),
is[w]alpha()... in the FTS parser have been modernized a bit by
ed87e1980706 (2017), but essentially this code dates back to the
original integration of FTS in core by 140d4ebcb46e (2007). These calls
are made through the p_is##type macro-expanded functions:

/*
 * In C locale with a multibyte encoding, any non-ASCII symbol is considered
 * an alpha character, but not a member of other char classes.
 */
p_iswhat(alnum, 1)
p_iswhat(alpha, 1)
p_iswhat(digit, 0)
p_iswhat(lower, 0)
p_iswhat(print, 0)
p_iswhat(punct, 0)
p_iswhat(space, 0)
p_iswhat(upper, 0)
p_iswhat(xdigit, 0)

That's why, in a database with the builtin or ICU provider and lc_ctype=C,
the FTS parser is not Unicode-aware. I may be missing something, but I don't
see a technical reason why this code could not be taught to call the
equivalent functions of the current collation provider, following the same
principles as the regex code.
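
To make the idea a bit more concrete, here is a rough standalone sketch of
what dispatching on a strategy could look like. Everything in it is
hypothetical: the Sketch* names, the callback struct and the toy_* functions
are made up for illustration and are not PostgreSQL's actual API. The toy_*
functions only stand in for whatever the builtin or ICU provider would
really answer, and the C branch mirrors the second argument of p_iswhat()
(1 for alpha, 0 for digit).

```c
#include <stdbool.h>

/* Hypothetical names throughout; none of this is PostgreSQL's real API. */
typedef enum
{
	SKETCH_CTYPE_C,			/* ASCII rules plus a fixed non-ASCII default */
	SKETCH_CTYPE_PROVIDER	/* delegate to the collation provider */
} SketchStrategy;

typedef bool (*sketch_class_fn) (unsigned int cp);

typedef struct
{
	SketchStrategy strategy;
	sketch_class_fn isalpha_fn;	/* used only with SKETCH_CTYPE_PROVIDER */
	sketch_class_fn isdigit_fn;	/* used only with SKETCH_CTYPE_PROVIDER */
} SketchLocale;

static bool
ascii_isalpha(unsigned int cp)
{
	return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z');
}

static bool
ascii_isdigit(unsigned int cp)
{
	return cp >= '0' && cp <= '9';
}

/* Toy provider: knows Latin-1 letters (U+00C0..U+00FF minus U+00D7, U+00F7). */
static bool
toy_isalpha(unsigned int cp)
{
	if (cp <= 0x7f)
		return ascii_isalpha(cp);
	return cp >= 0x00C0 && cp <= 0x00FF && cp != 0x00D7 && cp != 0x00F7;
}

/* Toy provider: knows ASCII digits and Arabic-Indic digits U+0660..U+0669. */
static bool
toy_isdigit(unsigned int cp)
{
	return ascii_isdigit(cp) || (cp >= 0x0660 && cp <= 0x0669);
}

static bool
sketch_isalpha(const SketchLocale *loc, unsigned int cp)
{
	if (loc->strategy == SKETCH_CTYPE_PROVIDER)
		return loc->isalpha_fn(cp);
	if (cp > 0x7f)
		return true;		/* same default as p_iswhat(alpha, 1) */
	return ascii_isalpha(cp);
}

static bool
sketch_isdigit(const SketchLocale *loc, unsigned int cp)
{
	if (loc->strategy == SKETCH_CTYPE_PROVIDER)
		return loc->isdigit_fn(cp);
	if (cp > 0x7f)
		return false;		/* same default as p_iswhat(digit, 0) */
	return ascii_isdigit(cp);
}
```

With the C strategy, U+00D7 (the multiplication sign) counts as alpha and
U+0663 is not a digit. With the provider strategy, the toy callbacks give
the opposite answer for both, which is the kind of difference a
Unicode-aware parser would see.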

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
