| From: | "Daniel Verite" <daniel(at)manitou-mail(dot)org> |
|---|---|
| To: | "Jeff Davis" <pgsql(at)j-davis(dot)com> |
| Cc: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: CREATE DATABASE command for non-libc providers |
| Date: | 2025-06-10 21:44:54 |
| Message-ID: | 352244c2-0f8a-4c8c-9ade-e39718e7e306@manitou-mail.org |
| Lists: | pgsql-hackers |
Jeff Davis wrote:
> Even if it's not a collatable type, it should use the database
> collation rather than going straight to libc. Again, is that something
> that can ever be fixed or are we just stuck with libc semantics for
> full text search permanently, even if you initialize the cluster with a
> different provider?
ISTM that what backend/tsearch/wparser_def.c needs is comparable
to what backend/regex/regc_pg_locale.c already does with the
PG_Locale_Strategy, and the pg_wc_isxxxx functions.
Looking at git history, the current invocations of is[w]digit(),
is[w]alpha()... in the FTS parser have been modernized a bit by
ed87e1980706 (2017), but essentially this code dates back to the
original integration of FTS in core by 140d4ebcb46e (2007). These
calls are made through the p_is##type macro-expanded functions:
/*
* In C locale with a multibyte encoding, any non-ASCII symbol is considered
* an alpha character, but not a member of other char classes.
*/
p_iswhat(alnum, 1)
p_iswhat(alpha, 1)
p_iswhat(digit, 0)
p_iswhat(lower, 0)
p_iswhat(print, 0)
p_iswhat(punct, 0)
p_iswhat(space, 0)
p_iswhat(upper, 0)
p_iswhat(xdigit, 0)
That's why in a database with the builtin or ICU provider and lc_ctype=C,
the FTS parser is not Unicode-aware. I may be missing something, but I don't see a
technical reason why this code could not be taught to call the equivalent
functions of the current collation provider, following the same principles
as the regex code.
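To illustrate the idea, here is a minimal standalone sketch of a
strategy-based dispatch for one character class. It is not the actual
code of regc_pg_locale.c or wparser_def.c: SketchStrategy,
sketch_isalpha() and toy_provider_isalpha() are hypothetical names, and
the "table" branch is only a toy stand-in (Latin-1 range) for whatever
classification functions a non-libc provider would offer.

```c
#include <assert.h>
#include <stdbool.h>
#include <wctype.h>

/* Hypothetical strategy tag, loosely modeled on the idea behind
 * PG_Locale_Strategy; these are not the real enum values. */
typedef enum
{
    SKETCH_STRATEGY_C,      /* C locale: ASCII-only rules */
    SKETCH_STRATEGY_LIBC,   /* defer to libc's isw*() functions */
    SKETCH_STRATEGY_TABLE   /* stand-in for a provider with its own tables */
} SketchStrategy;

static bool
ascii_isalpha(unsigned int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Toy stand-in for a provider's Unicode-aware classification.
 * Assumption: only ASCII and the Latin-1 Supplement are covered. */
static bool
toy_provider_isalpha(unsigned int c)
{
    if (c < 0x80)
        return ascii_isalpha(c);
    /* U+00C0..U+00FF are letters except U+00D7 (multiplication sign)
     * and U+00F7 (division sign). */
    return c >= 0xC0 && c <= 0xFF && c != 0xD7 && c != 0xF7;
}

/* One classification entry point that dispatches on the strategy,
 * instead of always calling libc. */
static bool
sketch_isalpha(SketchStrategy strategy, unsigned int c)
{
    switch (strategy)
    {
        case SKETCH_STRATEGY_C:
            /* Mirrors the behavior described in the quoted comment:
             * any non-ASCII symbol counts as alpha. */
            if (c >= 0x80)
                return true;
            return ascii_isalpha(c);
        case SKETCH_STRATEGY_LIBC:
            /* Result depends on the process locale set via setlocale(). */
            return iswalpha((wint_t) c) != 0;
        case SKETCH_STRATEGY_TABLE:
            return toy_provider_isalpha(c);
    }
    assert(0 && "unknown strategy");
    return false;
}
```

With this sketch, sketch_isalpha(SKETCH_STRATEGY_C, 0x00D7) reports the
multiplication sign as alphabetic, whereas the SKETCH_STRATEGY_TABLE
branch rejects it and still accepts U+00E9. That is the kind of
difference a provider-aware FTS parser would presumably expose.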
Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/