Re: Initcap works differently with different locale providers

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Oleg Tselebrovskiy <o(dot)tselebrovskiy(at)postgrespro(dot)ru>, pgsql-docs(at)lists(dot)postgresql(dot)org
Subject: Re: Initcap works differently with different locale providers
Date: 2025-08-16 21:29:52
Message-ID: CAPpHfdvbib54J8NGcqr=FfrhLeyMFj20AuV1SaBQ_SGme9JnuQ@mail.gmail.com
Lists: pgsql-docs

On Wed, Jul 30, 2025 at 10:58 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Mon, 2025-07-28 at 13:20 +0300, Alexander Korotkov wrote:
> > I can confirm initcap works with libc and libicu as you stated. The
> > documentation patch looks good to me. I’ve written a commit message.
> > The REL_12_STABLE branch is not relevant anymore as it’s out of
> > support. I’m going to push this if no objections.
>
> Apologies for the late review.
>
> First, it doesn't mention the "builtin" provider, which uses the same
> word break rules as libc.
>
> Second, word boundaries can be complex, and I'm wondering if we should
> not be so precise about what ICU does or doesn't do. For instance, ICU
> has options like U_TITLECASE_ADJUST_TO_CASED,
> U_TITLECASE_NO_BREAK_ADJUSTMENT, etc.[1], and I'm not sure exactly
> which one of those we use.

I think none of these options are used. They apply to
ucasemap_toTitle() [1], which reads them from its UCaseMap argument,
whereas we call u_strToTitle() [2], which has no options parameter.
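
For comparison, the libc/builtin word rule mentioned above is simple
enough to sketch. The following is only an illustration in Python, assuming
the rule that a word is a run of alphanumeric characters; initcap_alnum is
a hypothetical helper, not PostgreSQL code, and Python's str.isalnum() is
only a stand-in for the provider's own character classification.

```python
def initcap_alnum(text: str) -> str:
    """Uppercase the first character of each alphanumeric run, lowercase the rest.

    Assumption: a "word" is a maximal run of alphanumeric characters, so any
    other character (space, period, apostrophe, hyphen) ends the current word.
    """
    out = []
    in_word = False  # True while we are inside an alphanumeric run
    for ch in text:
        if ch.isalnum():
            out.append(ch.lower() if in_word else ch.upper())
            in_word = True
        else:
            out.append(ch)  # separators are copied through unchanged
            in_word = False
    return "".join(out)


print(initcap_alnum("hello world"))    # Hello World
print(initcap_alnum("first.second"))   # First.Second
print(initcap_alnum("it's 2nd-hand"))  # It'S 2nd-Hand
```

Under this rule every non-alphanumeric character starts a new word. With
ICU, word boundaries come from the break iterator instead, so results for
inputs like these can differ, which is what the documentation patch is about.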

Links
1. https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/ucasemap_8h.html#aa49d8b403bd91c52f127fe80679bac11
2. https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/ustring_8h.html#a47602e2c2012d77ee91908b9bbfdc063

------
Regards,
Alexander Korotkov
Supabase
